Muddiest point?

Grammar of Graphics

  1. Address research questions: build intuition and note preliminary observations or conclusions related to each research question. Also note observations that prompt you to refine your research questions or add new ones to investigate.

Glyphs and Data

In archaeology, a GLYPH is a symbol or “mark” used to impart meaning:

HieroGLYPH, Mayan GLYPH

Data Glyph

A data glyph is also a “mark” on a graph.

The features of a data glyph encode the values of variables.

  • Some are very simple, e.g. a dot:
  • Some combine different elements, e.g. a pointrange:
  • Some are more complex, e.g. a violin:

See: http://docs.ggplot2.org/current/
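The three glyphs named above can be drawn with ggplot2 roughly as follows. This is a minimal sketch: the data frames `Summary` and `Raw` and all of their values are made up for illustration, and only `geom_point()`, `geom_pointrange()`, and `geom_violin()` come from ggplot2 itself.

```r
library(ggplot2)

# Made-up summary values, one row per group (hypothetical, for illustration only)
Summary <- data.frame(
  group = c("a", "b", "c"),
  mid   = c(2, 3, 5),
  low   = c(1, 2, 3),
  high  = c(3, 5, 6)
)

# Simulated raw values: a violin needs many observations per group
set.seed(1)
Raw <- data.frame(group = rep(c("a", "b", "c"), each = 50), value = rnorm(150))

# A dot: one point per row
dots <- ggplot(Summary, aes(x = group, y = mid)) +
  geom_point()

# A pointrange: a point plus a vertical line from ymin to ymax
ranges <- ggplot(Summary, aes(x = group, y = mid, ymin = low, ymax = high)) +
  geom_pointrange()

# A violin: a mirrored density estimate of the values in each group
violins <- ggplot(Raw, aes(x = group, y = value)) +
  geom_violin()
```

Printing `dots`, `ranges`, or `violins` in the console draws the corresponding plot.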

“Aesthetic”?

Q: In general, what’s your intuition about the meaning of aesthetic?

Data Glyph Properties: Aesthetics

Aesthetics are visual properties of a glyph.

Some Graphics Components

glyph

  • The basic graphical unit often corresponding to one case.
  • Other terms used include mark and symbol.

aesthetic

  • A visual property of a glyph, such as position, size, shape, or color.
  • May be mapped to data values, e.g. sex is mapped to color.
  • May be set to a constant unrelated to the data, e.g. color is black.

scale

  • A mapping that translates data values into aesthetics. For example:
    • A scatter plot of health risks may identify cigarette smoking: blue represents “No” and red represents “Yes”.
    • A printed map of campus uses a scaled representation of distance: a centimeter on the printed map represents 100 meters of distance on campus.

frame

  • The position scale describing how data are mapped to the coordinate system in use.
  • Quite often, the frame defines the x-axis and y-axis of a 2-dimensional Cartesian plane.

guide

  • An indication for the human viewer of the scale. This allows the viewer to translate aesthetics back into data values.
  • For example:
    • a legend makes explicit the meaning of red and blue points on the chart
    • a 1 cm length printed on a map informs the reader that it corresponds to 100 meters on campus
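The glossary distinguishes an aesthetic that is mapped to data from one that is set to a constant. In ggplot2 that difference shows up as whether the aesthetic appears inside or outside `aes()`. The sketch below uses a simulated stand-in for the `Tmp` data used elsewhere in these slides; its values are invented, not real measurements.

```r
library(ggplot2)

# Hypothetical stand-in for `Tmp` (simulated values, not real measurements)
set.seed(1)
Tmp <- data.frame(
  sbp    = round(rnorm(60, mean = 125, sd = 15)),
  dbp    = round(rnorm(60, mean = 80, sd = 10)),
  smoker = rep(c("never", "former", "current"), length.out = 60),
  sex    = rep(c("female", "male"), length.out = 60)
)

# Mapped: color is inside aes(), so it varies with the data (sex)
mapped <- ggplot(Tmp, aes(x = sbp, y = dbp)) +
  geom_point(aes(color = sex))

# Set: color is outside aes(), so every glyph gets the same constant
fixed <- ggplot(Tmp, aes(x = sbp, y = dbp)) +
  geom_point(color = "black")
```

Only the mapped version gets a legend, because only there does color carry information about a variable.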

Scales

A scale is the relationship between a variable’s value and the value of the aesthetic that the variable is mapped to.

  • Systolic Blood Pressure (SBP) has units of mmHg (millimeters of mercury)
  • Position on the x-axis measured in distance, e.g. inches.

The conversion from SBP to position is a scale.

  • Smoker is “never”, “former”, “current”
  • Color is red, green, blue, …

The conversion from Smoker to color is a scale.
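Both conversions can be written out explicitly with ggplot2 scale functions. This is a sketch under assumptions: the data are a simulated stand-in for `Tmp`, the axis limits are arbitrary, and the pairing of each smoker level with a particular color is just one possible choice.

```r
library(ggplot2)

# Hypothetical stand-in for `Tmp` (simulated values, not real measurements)
set.seed(1)
Tmp <- data.frame(
  sbp    = round(rnorm(60, mean = 125, sd = 15)),
  dbp    = round(rnorm(60, mean = 80, sd = 10)),
  smoker = rep(c("never", "former", "current"), length.out = 60),
  sex    = rep(c("female", "male"), length.out = 60)
)

scaled <- ggplot(Tmp, aes(x = sbp, y = dbp, color = smoker)) +
  geom_point() +
  # SBP (mmHg) to x position: the limits fix which values span the axis
  scale_x_continuous(limits = c(80, 200)) +
  # Smoker level to color: an explicit lookup (this assignment is arbitrary)
  scale_color_manual(
    values = c(never = "blue", former = "green", current = "red")
  )
```

If no scale is given, ggplot2 supplies default scales, so the explicit calls are only needed to override those defaults.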

Guides

Guide: an indication to a human viewer of what the scale is.

  • Axis ticks and numbers

  • Legends

  • Labels on faceted graphics
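All three kinds of guide can be adjusted in ggplot2. The sketch below is illustrative only: the data are simulated, and the label text assumes that `sbp` and `dbp` are systolic and diastolic blood pressure in mmHg.

```r
library(ggplot2)

# Hypothetical stand-in for `Tmp` (simulated values, not real measurements)
set.seed(1)
Tmp <- data.frame(
  sbp    = round(rnorm(60, mean = 125, sd = 15)),
  dbp    = round(rnorm(60, mean = 80, sd = 10)),
  smoker = rep(c("never", "former", "current"), length.out = 60),
  sex    = rep(c("female", "male"), length.out = 60)
)

guided <- ggplot(Tmp, aes(x = sbp, y = dbp, color = smoker)) +
  geom_point() +
  facet_grid(~ sex) +                      # facet strips label each panel
  labs(
    x     = "Systolic blood pressure (mmHg)",   # axis guide titles
    y     = "Diastolic blood pressure (mmHg)",
    color = "Smoking status"                    # legend title
  ) +
  theme(legend.position = "bottom")        # where the legend guide is drawn
```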

Facets

ggplot(data = Tmp, aes(x = sbp, y = dbp, color = smoker)) +
  geom_point() +
  facet_grid( ~ sex)

Designing Graphics

Graphics are designed by the human expert (you!) in order to reveal information that’s present in the data.

Design choices

  • What kind of glyph, e.g. scatter, density, bar, … many others
  • What variables constitute the frame. And some details:
    • axis limits
    • logarithmic axes, etc.
  • What variables should be mapped to other aesthetics of the glyph.
  • Whether to facet and with what variable.

More details, …, e.g. setting of aesthetics to constants
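Each design choice listed above corresponds to a piece of ggplot2 code. The following sketch makes one arbitrary set of choices on simulated stand-in data; the particular limits, the log axis, and the constant color are assumptions picked only to show where each choice goes.

```r
library(ggplot2)

# Hypothetical stand-in for `Tmp` (simulated values, not real measurements)
set.seed(1)
Tmp <- data.frame(
  sbp    = round(rnorm(60, mean = 125, sd = 15)),
  dbp    = round(rnorm(60, mean = 80, sd = 10)),
  smoker = rep(c("never", "former", "current"), length.out = 60),
  sex    = rep(c("female", "male"), length.out = 60)
)

designed <- ggplot(Tmp, aes(x = sbp, y = dbp)) +   # frame: sbp and dbp
  geom_point(color = "steelblue", alpha = 0.6) +   # glyph: points, aesthetics set to constants
  scale_y_log10() +                                # logarithmic y-axis
  coord_cartesian(xlim = c(90, 170)) +             # axis limits (zooms without dropping data)
  facet_wrap(~ smoker)                             # facet by smoker
```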

Good and Bad Graphics

Remember …

Graphics are designed by the human expert (you!) in order to reveal information that’s present in the data.

Critique this graphic…

p + geom_point(aes(color = sex, size = smoker), alpha = .85) 

Perception and Comparison

In roughly descending order of human ability to compare nearby objects:

  1. Position
  2. Length
  3. Area
  4. Angle
  5. Shape (but only a very few different shapes)
  6. Color

Color is the most difficult…

For more, see:

Cleveland W. (1985). The elements of graphing data. Bell Telephone Laboratories: Murray Hill, NJ.

Glyph-Ready Data

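Glyph-ready data has this form:

  • There is one row (case) for each glyph to be drawn.
  • The variables in that row are mapped to aesthetics of the glyph.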

Mapping of data to aesthetics

   sbp is mapped to x position      
   dbp is mapped to y position    
smoker is mapped to color
   sex is mapped to shape

Scales determine the details of the translation from variable to aesthetic.
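The four mappings listed above translate directly into a single `aes()` call. The sketch uses a simulated stand-in for `Tmp` (invented values) and checks that the plot draws one glyph per row of data.

```r
library(ggplot2)

# Hypothetical stand-in for `Tmp` (simulated values, not real measurements)
set.seed(1)
Tmp <- data.frame(
  sbp    = round(rnorm(60, mean = 125, sd = 15)),
  dbp    = round(rnorm(60, mean = 80, sd = 10)),
  smoker = rep(c("never", "former", "current"), length.out = 60),
  sex    = rep(c("female", "male"), length.out = 60)
)

# Each row of Tmp becomes one point; each variable drives one aesthetic
glyph_plot <- ggplot(Tmp, aes(x = sbp, y = dbp, color = smoker, shape = sex)) +
  geom_point()

# One glyph per row of glyph-ready data
nrow(layer_data(glyph_plot)) == nrow(Tmp)
```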

Layers – building up complex plots

Each layer may have its own data, glyphs, aesthetic mapping, etc.

ggplot(data = Tmp, aes(x = sbp, y = dbp, colour = sex)) +
  geom_point() +
  geom_smooth(se = FALSE) 
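A layer can also bring its own data. In the sketch below, which again relies on a simulated stand-in for `Tmp`, a second point layer draws one large cross per sex from a small summary table; the name `Means` is made up for this example.

```r
library(ggplot2)

# Hypothetical stand-in for `Tmp` (simulated values, not real measurements)
set.seed(1)
Tmp <- data.frame(
  sbp    = round(rnorm(60, mean = 125, sd = 15)),
  dbp    = round(rnorm(60, mean = 80, sd = 10)),
  smoker = rep(c("never", "former", "current"), length.out = 60),
  sex    = rep(c("female", "male"), length.out = 60)
)

# Summary table for the second layer: mean sbp and dbp for each sex
Means <- aggregate(cbind(sbp, dbp) ~ sex, data = Tmp, FUN = mean)

layered <- ggplot(Tmp, aes(x = sbp, y = dbp, colour = sex)) +
  geom_point(alpha = 0.5) +                      # layer 1: one point per person
  geom_point(data = Means, size = 5, shape = 4)  # layer 2: one cross per sex
```

The second layer inherits the aesthetic mapping from `ggplot()`, which works here because `Means` has columns with the same names.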

Stats: Data Transformations

ggplot(data = Tmp, aes(x = sbp)) +
  geom_histogram(binwidth = 10)

head(Tmp, 4)
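The histogram is a useful reminder that the bars are not drawn from the raw rows: a stat first bins `sbp` and counts the cases in each bin. One way to inspect that transformed data is ggplot2's `layer_data()`, sketched here on a simulated stand-in for `Tmp`.

```r
library(ggplot2)

# Hypothetical stand-in for `Tmp` (simulated values, not real measurements)
set.seed(1)
Tmp <- data.frame(
  sbp    = round(rnorm(60, mean = 125, sd = 15)),
  dbp    = round(rnorm(60, mean = 80, sd = 10)),
  smoker = rep(c("never", "former", "current"), length.out = 60),
  sex    = rep(c("female", "male"), length.out = 60)
)

hist_plot <- ggplot(data = Tmp, aes(x = sbp)) +
  geom_histogram(binwidth = 10)

# The binned table the stat hands to the bar glyphs: one row per bin
Bins <- layer_data(hist_plot)
Bins[, c("xmin", "xmax", "count")]

# Every case lands in exactly one bin
sum(Bins$count) == nrow(Tmp)
```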

Challenge Task

Source: “College, the Great Unleveler”, New York Times, 03-01-2014

Left Panel Questions:

  • Left panel: What variables make up the frame?
  • Left panel: What are the guides?
  • Left panel: What are the glyphs and what do they represent?
  • Left panel: Sketch a few rows of the glyph-ready data.

Right Panel Questions:

  • Right panel: What are the glyphs and what do they represent?
  • Right panel: Describe three aesthetics mapped to the glyphs.
  • Right panel: Sketch a few rows of the glyph-ready data.
  • Right panel: Make a rough sketch of a stacked bar chart showing the same information.

Partial Solution to Challenge Task

Left Panel Questions:

  • Left panel frame: Fraction of family income to pay for one year of college, and year.
  • Left panel guides:
    • Labels for the different quintiles of family income.
    • A line scaling the axis, showing where the fraction of family income is 100%.
    • Text to label the extent of the horizontal axis, from 1971 to 2010.
  • Left panel glyphs:
    • Points representing fraction of income
    • Lines connecting 1971 result to 2011 result
  • Left panel: Sketch a few rows of the glyph-ready data.

Right Panel Questions:

  • Right panel glyphs: segments of the circles
  • Right panel three aesthetics: color, length, size (i.e., radius of circle)
  • Right panel: Sketch a few rows of the glyph-ready data.
  • Right panel: Make a rough sketch of a stacked bar chart showing the same information.

Path to success

  1. Eye-training

    • recognize and describe glyphs, aesthetics, scales, etc.
    • identify data required for a plot
  2. Data wrangling

    • get data into glyph-ready format (dplyr, tidyr, tidyverse)
    • (we’ll start doing this next week!)
  3. Graphics construction

    • Newbies: match variables to aesthetics interactively:
      • esquisse package using esquisser( )
      • mosaic package using mplot( )
      • BOTH generate ggplot2 syntax
    • Pros: learn to write ggplot2 code directly

Revisiting The Big Mac Index

https://github.com/rfordatascience/tidytuesday/tree/master/data/2020/2020-12-22

# These data are available from the `tidytuesdayR` package 
# Install package from CRAN via: install.packages("tidytuesdayR")

library(tidytuesdayR)

TidyTuesData <- tidytuesdayR::tt_load(2020, week = 52)
#> Downloading file 1 of 1: `big-mac.csv`

BigMac <- TidyTuesData[["big-mac"]]

Big Mac Price vs GDP per Capita?

Let’s use the esquisse package to explore the data!

In the console:

library(esquisse)
esquisser(BigMac)

# use `esquisser( )` to draft a plot, then paste the generated R code here!

Here’s an example:

  • Using the graph, what can you say about the following?
    • Frame
    • Glyph
    • Aesthetic
    • Scale
    • Guide
    • Facet
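For practice with these questions, here is a hand-written sketch of a price versus GDP scatter plot. It is not output from esquisse. The column names `gdp_dollar` and `dollar_price` are assumed to match the TidyTuesday Big Mac data, and the rows of `BigMacDemo` are invented placeholders so that the block runs without a download.

```r
library(ggplot2)

# Invented placeholder rows (NOT real Big Mac data); column names are assumed
# to match those in the TidyTuesday `big-mac.csv` file
BigMacDemo <- data.frame(
  name         = c("Country A", "Country B", "Country C", "Country D"),
  gdp_dollar   = c(5000, 15000, 40000, 60000),
  dollar_price = c(2.1, 3.0, 4.5, 5.6)
)

bigmac_plot <- ggplot(BigMacDemo, aes(x = gdp_dollar, y = dollar_price)) +
  geom_point() +                      # glyph: one point per row
  scale_x_log10() +                   # scale: GDP on a logarithmic axis
  labs(x = "GDP per capita (USD)",    # guides: axis titles
       y = "Big Mac price (USD)")
```

Swapping `BigMacDemo` for `BigMac` should give the real picture, provided the column names match.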

Value of Big Mac around the world

# using `mWorldMap( )` from `mosaic` package
library(mosaic)

# `key` argument takes the ID variable; `fill` takes the measured variable 
mWorldMap(BigMac, key = "iso_a3", fill = "dollar_price")
#> Mapping API still under development and may change in future releases.
#> Warning: 33 items were not translated

