Muddiest point?
Grammar of Graphics
- Lots of new terminology this week
- Can you describe each of these terms?
- Frame
- Glyph
- Aesthetic
- Scale
- Guide
- Facet
Grammar of Graphics
- Wilkinson’s 2005 book The Grammar of Graphics (2nd Ed.)
provides the principles and philosophy
- the ggplot2 R package implements this framework
- Goal: flexible tools for building rich, intuitive
graphics
- Data Visualization is critical to our goals for high quality
Exploratory Data Analysis (EDA)
- Examine the data source: variable types, coding,
missingness, summary statistics/plots, who/what/when/where/why/how data
were collected
- Discover features that may influence modeling
decisions: investigate potential outliers, consider
recoding variables (e.g., numeric data that’s functionally dichotomous),
evaluate correlation structure (e.g., autocorrelation, hierarchy,
spatial/temporal proximity)
- Address research questions: build intuition and
note preliminary observations/conclusions related to each research
question. Also, note observations that prompt you to refine your
research questions or add new questions to investigate

Glyphs and Data
In archaeology, a GLYPH is a symbol or “mark” used to impart
meaning:
“Aesthetic”?
Q: In general, what’s your intuition about the meaning of
aesthetic?
- How might you describe the aesthetic of a favorite coffee shop or
restaurant?
Data Glyph Properties: Aesthetics
Aesthetics are visual properties of a glyph.
- Aesthetics for points: location (x and y), shape, color, size,
transparency
- Each glyph has its own set of aesthetics.


Some Graphics Components
glyph
- The basic graphical unit often corresponding to one case.
- Other terms used include mark and symbol.
aesthetic
- a visual property of a glyph such as position, size, shape, color,
etc.
- may be mapped based on data values:
sex is mapped to color
- may be set to particular non-data related values:
color is black
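The difference between mapping and setting an aesthetic can be sketched in ggplot2 as follows. The `people` data frame and its values are hypothetical, made up only for illustration. The one rule that matters is that mapped aesthetics go inside `aes()` and set aesthetics go outside it.

```r
library(ggplot2)

# Hypothetical toy data (an assumption for illustration, not the course data)
people <- data.frame(
  sbp = c(110, 125, 140, 118),
  dbp = c(70, 82, 95, 76),
  sex = c("female", "male", "male", "female")
)

# Mapped: color depends on a data value, so it goes inside aes()
p_mapped <- ggplot(people, aes(x = sbp, y = dbp)) +
  geom_point(aes(color = sex))

# Set: color is a constant unrelated to the data, so it goes outside aes()
p_set <- ggplot(people, aes(x = sbp, y = dbp)) +
  geom_point(color = "black")
```

Printing `p_mapped` should draw a legend for sex, while `p_set` draws none because no variable is mapped to color.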
scale
- A mapping that translates data values into aesthetics. For
example,
- A scatter plot of health risks may identify cigarette smoking:
- blue represents “No”
- red represents “Yes”
- A printed map of campus uses a scaled representation of distance
- a centimeter on the printed map represents 100 meters of distance on
campus
frame
- The position scale describing how data are mapped to the coordinate
system in use
- Quite often, the frame defines the x-axis and y-axis of a
2-dimensional cartesian plane
guide
- An indication for the human viewer of the scale. This allows the
viewer to translate aesthetics back into data values.
- For example,
- a legend makes explicit the meaning of Red & Blue points on the
chart
- a 1 cm length printed on a map to inform the reader that it
corresponds to 100 meters on campus
Scales
The relationship between the variable value and the value of the
aesthetic the variable is mapped to.
- Systolic Blood Pressure (SBP) has units of mmHg (millimeters of
mercury)
- Position on the x-axis measured in distance, e.g. inches.
The conversion from SBP to position is a scale.
- Smoker is “never”, “former”, “current”
- Color is red, green, blue, …
The conversion from Smoker to color is a scale.
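A minimal sketch of both scales in ggplot2, assuming a small hypothetical `health` data frame. The position scales are left at their defaults, and the color scale is spelled out with `scale_color_manual()` so the conversion from Smoker to color is explicit.

```r
library(ggplot2)

# Hypothetical toy data (an assumption for illustration)
health <- data.frame(
  sbp    = c(110, 125, 140),
  dbp    = c(70, 82, 95),
  smoker = c("never", "former", "current")
)

p_scale <- ggplot(health, aes(x = sbp, y = dbp, color = smoker)) +
  geom_point() +
  # The scale: each level of smoker is translated to one color
  scale_color_manual(
    values = c(never = "blue", former = "green", current = "red")
  )
```

ggplot2 also draws the matching guide (a legend) for this scale automatically.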
Guides
Guide: an indication to a human viewer of what the scale is.


- Labels on faceted graphics

Facets
ggplot(data = Tmp, aes(x = sbp, y = dbp, color = smoker)) +
geom_point() +
facet_grid( ~ sex)

- x is determined by sbp and sex
- basically a separate frame for each sex
- uses same x and y twice (or once for each facet)
Designing Graphics
Graphics are designed by the human expert (you!) in order to reveal
information that’s present in the data.
Design choices
- What kind of glyph, e.g. scatter, density, bar, … many others
- What variables constitute the frame. And some details:
- axis limits
- logarithmic axes, etc.
- What variables should be mapped to other aesthetics of the
glyph.
- Whether to facet and with what variable.
More details, …, e.g. setting of aesthetics to constants
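Several of these design choices can be sketched in one ggplot2 call. The data frame `bp_small` and the specific limits are hypothetical choices for illustration only.

```r
library(ggplot2)

# Hypothetical toy data (an assumption for illustration)
bp_small <- data.frame(
  sbp = c(100, 120, 150),
  dbp = c(65, 80, 95),
  sex = c("female", "male", "female")
)

p_design <- ggplot(bp_small, aes(x = sbp, y = dbp)) +  # frame: sbp and dbp
  geom_point(color = "black", size = 3) +              # glyph: points; aesthetics set to constants
  scale_x_log10() +                                    # logarithmic x-axis
  scale_y_continuous(limits = c(60, 100)) +            # explicit y-axis limits
  facet_grid(~ sex)                                    # facet by sex
```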
Good and Bad Graphics
Remember …
Graphics are designed by the human expert (you!) in order to reveal
information that’s present in the data.
- choices depend on what information you want to convey.
- practice reading graphics and critique which ways of arranging things
are better or worse.
- A basic principle is that a graphic is about comparison.
Good graphics:
- make it easy for people to perceive things that are similar and
things that are different.
- put the things to be compared in proximity to one another (e.g.,
“side-by-side”)
Critique this graphic…
- What sort of comparisons might you want to make?
- Do you find it easy or hard to make those comparisons?
- How might this graph be improved?
p + geom_point(aes(color = sex, size = smoker), alpha = .85)

Perception and Comparison
In roughly descending order of human ability to compare nearby
objects:
- Position
- Length
- Area
- Angle
- Shape (but only a very few different shapes)
- Color
Color is the most difficult…
- color gradients: we’re better at comparing these
- discrete colors: must be carefully selected.
- lots of people are color blind (1 in 12 men; 1 in 200
women)
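One possible way to choose discrete colors with care is a palette designed with color vision deficiency in mind. The sketch below uses ggplot2's `scale_color_viridis_d()`. The data frame is hypothetical, and viridis is only one reasonable option, not a recommendation from the source.

```r
library(ggplot2)

# Hypothetical toy data (an assumption for illustration)
smoke <- data.frame(
  sbp    = c(110, 125, 140),
  dbp    = c(70, 82, 95),
  smoker = c("never", "former", "current")
)

p_cb <- ggplot(smoke, aes(x = sbp, y = dbp, color = smoker)) +
  geom_point(size = 3) +
  # Discrete viridis palette: one distinct color per level of smoker
  scale_color_viridis_d()
```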
For more, see:
Cleveland W. (1985). The elements of graphing data. Bell
Telephone Laboratories: Murray Hill, NJ.
Glyph-Ready Data
Glyph-ready data has this form:
- There is one row for each glyph to be drawn.
- The variables in that row are mapped to aesthetics of the glyph
(including position)
Glyph-ready data
Mapping of data to aesthetics
sbp is mapped to x position
dbp is mapped to y position
smoker is mapped to color
sex is mapped to shape
Scales determine the details of the translation from
variable to aesthetic
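As a rough illustration, here is a small glyph-ready data frame and the mapping listed above. The rows are hypothetical values, not the course data. Each row becomes exactly one point.

```r
library(ggplot2)

# Hypothetical glyph-ready data: one row per point to be drawn
glyph_ready <- data.frame(
  sbp    = c(110, 125, 140, 118),
  dbp    = c(70, 82, 95, 76),
  smoker = c("never", "former", "current", "never"),
  sex    = c("female", "male", "male", "female")
)

p_glyphs <- ggplot(
  glyph_ready,
  aes(x = sbp, y = dbp, color = smoker, shape = sex)
) +
  geom_point()

# The number of glyphs drawn matches the number of rows in the data
n_glyphs <- nrow(layer_data(p_glyphs, 1))
```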
Layers – building up complex plots
Each layer may have its own data, glyphs, aesthetic mapping, etc.
ggplot(data = Tmp, aes(x = sbp, y = dbp, colour = sex)) +
geom_point() +
geom_smooth(se = FALSE)

- one layer has points
- another layer has the curves
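A layer can also draw from its own data frame. In the sketch below, which uses hypothetical values, the first layer plots every case and the second plots one large cross per sex at the group means. `aggregate()` from base R is just one way to build the summary table.

```r
library(ggplot2)

# Hypothetical toy data (an assumption for illustration)
bp <- data.frame(
  sbp = c(110, 125, 140, 118, 132, 121),
  dbp = c(70, 82, 95, 76, 88, 79),
  sex = c("female", "male", "male", "female", "male", "female")
)

# Second data frame: one row per sex holding the mean sbp and dbp
bp_means <- aggregate(cbind(sbp, dbp) ~ sex, data = bp, FUN = mean)

p_layers <- ggplot(bp, aes(x = sbp, y = dbp, colour = sex)) +
  geom_point() +                                    # layer 1: one point per case
  geom_point(data = bp_means, size = 5, shape = 4)  # layer 2: its own data, one cross per sex
```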
Challenge Task
Source: “College, the Great Unleveler”, New York Times,
03-01-2014

Left Panel Questions:
- Left panel: What variables make up the frame?
- Left panel: What are the guides?
- Left panel: What are the glyphs and what do they
represent?
- Left panel: Sketch a few rows of the glyph-ready data.
Right Panel Questions:
- Right panel: What are the glyphs and what do they
represent?
- Right panel: Describe three aesthetics mapped to
the glyphs.
- Right panel: Sketch a few rows of the glyph-ready data.
- Right panel: Make a rough sketch of a stacked bar chart showing the
same information.
Partial Solution to Challenge Task
Left Panel Questions:
- Left panel frame: Fraction of family income to pay
for one year of college, and year.
- Left panel guides:
- Labels for the different quintiles of family income.
- A line scaling the axis, showing where the fraction of family income
is 100%.
- Text to label the extent of the horizontal axis, from 1971 to
2010.
- Left panel glyphs:
- Points representing fraction of income
- Lines connecting 1971 result to 2011 result
- Left panel: Sketch a few rows of the glyph-ready data.
Right Panel Questions:
- Right panel glyphs: segments of the circles
- Right panel three aesthetics: color, length, size
(i.e., radius of circle)
- Right panel: Sketch a few rows of the glyph-ready data.
- Right panel: Make a rough sketch of a stacked bar chart showing the
same information.
Path to success
Eye-training
- recognize and describe glyphs, aesthetics, scales, etc.
- identify data required for a plot
Data wrangling
- get data into glyph-ready format (dplyr, tidyr, tidyverse)
- (we’ll start doing this next week!)
Graphics construction
- Newbies: match variables to aesthetics interactively:
- esquisse package using esquisser( )
- mosaic package using mplot( )
- BOTH generate ggplot2 syntax
- Pros: learn to write ggplot2 code directly
Big Mac Price vs GDP per Capita?
Let’s use the esquisse package to explore the data!
In the console: esquisser(BigMac)
library(esquisse)
# use `esquisser( )` to draft a plot and then generate R code to put here!
Here’s an example:

- Using the graph, what can you say about the following?
- Frame
- Glyph
- Aesthetic
- Scale
- Guide
- Facet
Value of Big Mac around the world
# using `mWorldMap( )` from `mosaic` package
library(mosaic)
# `key` argument takes the ID variable; `fill` takes the measured variable
mWorldMap(BigMac, key = "iso_a3", fill = "dollar_price")
Mapping API still under development and may change in future releases.
Warning: 33 items were not translated

- Using the graph, what can you say about the following?
- Frame
- Glyph
- Aesthetic
- Scale
- Guide
- Facet
