---
title: "Week 4: Graphics, Glyphs, Frames, and Scales"
subtitle: "Data Computing Chapters 5 & 6"
author: "Prof Beckman"
date: ""
output: 
    slidy_presentation: default
    html_notebook: default

---

```{r include=FALSE}
library(mosaic)
library(NHANES)
library(dcData)
library(tidyverse)
library(esquisse)

```

## Muddiest point?



## Grammar of Graphics

- Lots of new terminology this week
- Can you describe each of these terms?
    - Frame
    - Glyph
    - Aesthetic
    - Scale
    - Guide
    - Facet


## Grammar of Graphics

- Wilkinson's 2005 book *The Grammar of Graphics (2nd Ed.)* provides the principles and philosophy 
    - the `ggplot2` R package implements this framework
    - **Goal**: flexible tools for building rich, intuitive graphics

- Data Visualization is critical to our goals for high quality Exploratory Data Analysis (EDA)
    1. **Examine the data source:** variable types, coding, missingness, summary statistics/plots, who/what/when/where/why/how data were collected
    2. **Discover features that may influence modeling decisions:** investigate potential outliers, consider recoding variables (e.g., numeric data that's functionally dichotomous), evaluate correlation structure (e.g., autocorrelation, hierarchy, spatial/temporal proximity)
    3. **Address research questions:** build intuition and note preliminary observations/conclusions related to each research question.  Also, note observations that prompt you to refine your research questions or add new questions to investigate

![](grammarGraphics.png)


## Glyphs and Data

In archaeology, a GLYPH is a symbol or "mark" used to impart meaning:

HieroGLYPH | Mayan GLYPH
---------------|----------------:
![Hieroglyph](Images/hand.jpg) | ![Mayan glyph](Images/mayan-glyph.png) 

## Data Glyph


### A data glyph is also a "mark" on a graph, e.g. 

![](Images/geom_rect.png) ![](Images/geom_segment.png) ![](Images/geom_text.png) ![](Images/geom_crossbar.png) ![](Images/geom_path.png) ![](Images/geom_line.png) ![](Images/geom_pointrange.png) ![](Images/geom_ribbon.png) ![](Images/geom_point.png) ![](Images/geom_polygon.png) ![](Images/geom_histogram.png) ![](Images/geom_dotplot.png) ![](Images/geom_freqpoly.png) ![](Images/geom_density.png) ![](Images/geom_violin.png) 

The features of a data glyph encode the values of variables. 

- Some are very simple, e.g. a dot: ![](Images/geom_point.png)
- Some combine different elements, e.g. a pointrange: ![](Images/geom_pointrange.png)
- Some are more complex, e.g. a violin: ![](Images/geom_violin.png)

See: *<http://docs.ggplot2.org/current/>*


## "Aesthetic"?

Q: In general, what's your intuition about the meaning of aesthetic?

- How might you describe the aesthetic of a favorite coffee shop or restaurant?


## Data Glyph Properties: Aesthetics

Aesthetics are **visual properties** of a glyph.

  * Aesthetics for points: location (x and y), shape, color, size, transparency
  * Each glyph has its own set of aesthetics.

```{r echo=FALSE, fig.keep='all', out.width="50%", include=FALSE}
set.seed(102)
n <- 30
Tmp <- data.frame(
  sbp =  round(runif(n, min = 80, max = 180)),
  dbp = round(runif(n, min = 40, max = 110)),
  group = sample(c("Tr","Ctl"), size = n, replace = TRUE),
  react = sample( c("Low", "Sev", "Mod"), size = n, replace = TRUE)
)
Tmp <- Tmp %>% mutate(dbp = pmin(sbp, dbp)) 
p <- ggplot(Tmp, aes(x = sbp, y = dbp)) + xlab("Systolic BP") + ylab("Diastolic BP")
p + geom_point(aes(color = group, size = react)) 
p + geom_point(size = 5, aes(shape = group, color = react))
```

```{r echo=FALSE, fig.keep='all', out.width="50%"}
set.seed(102)
require(NHANES)

n <- 75
Tmp <- 
  NHANES %>%
  mutate(
    smoker = derivedFactor(
      never = Smoke100 == "No",
      former = SmokeNow == "No",
      current = SmokeNow == "Yes",
      .ordered = TRUE
    ),
    sbp = BPSysAve,
    dbp = BPDiaAve,
    sex = Gender
  ) %>%
  select( sbp, dbp, sex, smoker ) %>%
  sample_n(n) %>%
  filter(complete.cases(.)) %>% 
  data.frame()


p <- ggplot(Tmp, aes(x = sbp, y = dbp)) + 
  xlab("Systolic BP") + ylab("Diastolic BP")
p + geom_point(aes(color = sex, size = smoker), alpha = .8) 
p + geom_point(size = 5, aes(shape = sex, color = smoker), alpha = .8)
```



## Some Graphics Components

#### **glyph**

- The basic graphical unit often corresponding to one case.
- Other terms used include *mark* and *symbol*. 

#### **aesthetic**

- a visual property of a glyph such as position, size, shape, color, etc.  
- may be **mapped** based on data values: `sex is mapped to color` 
- may be **set** to particular non-data related values: `color is black`
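
The difference between *mapping* and *setting* an aesthetic can be sketched in `ggplot2` as follows. The data frame `demo_bp` and its values are hypothetical, invented only for illustration; the point is where `color` appears relative to `aes()`.

```{r}
library(ggplot2)

# Hypothetical toy data, invented for illustration only
demo_bp <- data.frame(
  sbp = c(110, 125, 140, 152),
  dbp = c(70, 82, 90, 95),
  sex = c("female", "male", "female", "male")
)

# Mapped: color is inside aes(), so it varies with the data and gets a legend
mapped_plot <- ggplot(demo_bp, aes(x = sbp, y = dbp)) +
  geom_point(aes(color = sex))

# Set: color is outside aes(), so every point is black and no legend is drawn
set_plot <- ggplot(demo_bp, aes(x = sbp, y = dbp)) +
  geom_point(color = "black")

mapped_plot
set_plot
```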

#### **scale**

* A mapping that translates data values into aesthetics.  For example,
* A scatter plot of health risks may identify cigarette smoking:  
    - <font color="blue">blue</font> represents "No"
    - <font color="red">red</font> represents "Yes"
* A printed map of campus uses a scaled representation of distance
    - a centimeter on the printed map represents 100 meters of distance on campus

#### **frame**

- The position scale describing how data are mapped to the coordinate system in use
- Quite often, the frame defines the x-axis and y-axis of a 2-dimensional Cartesian plane


#### **guide**

* An indication for the human viewer of the scale.  This allows the viewer to translate aesthetics back into data values.
* For example, 
    - a legend makes explicit the meaning of Red & Blue points on the chart
    - a 1 cm length printed on a map to inform the reader that it corresponds to 100 meters on campus


## Scales

#### The relationship between the variable value and the value of the aesthetic the variable is mapped to.

* Systolic Blood Pressure (SBP) has units of mmHg (millimeters of mercury)
* Position on the x-axis measured in distance, e.g. inches.

The conversion from SBP to position is a *scale*.

* Smoker is "never", "former", "current"
* Color is red, green, blue, ...

The conversion from Smoker to color is a *scale*.
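
As a rough sketch, both conversions can be written out explicitly with `ggplot2` scale functions. The data frame `demo_smoke`, the axis limits, and the particular colors are assumptions made for illustration, not values taken from the slides.

```{r}
library(ggplot2)

# Hypothetical toy data, invented for illustration only
demo_smoke <- data.frame(
  sbp = c(105, 118, 131, 144, 157, 170),
  dbp = c(68, 74, 80, 86, 92, 98),
  smoker = factor(rep(c("never", "former", "current"), times = 2),
                  levels = c("never", "former", "current"))
)

scale_plot <- ggplot(demo_smoke, aes(x = sbp, y = dbp, color = smoker)) +
  geom_point(size = 3) +
  # position scale: the range of SBP values (mmHg) spread across the x-axis
  scale_x_continuous(limits = c(100, 180)) +
  # color scale: which color stands for each smoker level (arbitrary choices)
  scale_color_manual(values = c(never = "blue", former = "darkgreen", current = "red"))

scale_plot
```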


## Guides

#### Guide: an indication to a human viewer of what the scale is.

* Axis ticks and numbers

![](Images/x-axis-scale.png)

* Legends

![](Images/color-scale.png) 
![](Images/shape-scale.png) 


* Labels on faceted graphics

![](Images/facet-scale.png)
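
Guides can also be adjusted in code. The sketch below assumes a hypothetical data frame `demo_guides` and uses `labs()` to retitle the axis and legend guides and `theme()` to move the legend; the label text is only an example.

```{r}
library(ggplot2)

# Hypothetical toy data, invented for illustration only
demo_guides <- data.frame(
  sbp = c(105, 118, 131, 144, 157, 170),
  dbp = c(68, 74, 80, 86, 92, 98),
  smoker = rep(c("never", "former", "current"), times = 2)
)

guide_plot <- ggplot(demo_guides, aes(x = sbp, y = dbp, color = smoker)) +
  geom_point(size = 3) +
  # titles for the axis guides and the color legend
  labs(x = "Systolic BP (mmHg)", y = "Diastolic BP (mmHg)", color = "Smoking status") +
  # place the legend below the frame
  theme(legend.position = "bottom")

guide_plot
```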

## Facets 

```{r fig.height=3}
ggplot(data = Tmp, aes(x = sbp, y = dbp, color = smoker)) +
  geom_point() +
  facet_grid( ~ sex)
```

 * x is determined by `sbp` and `sex`
 * basically a separate frame for each `sex`
 * uses same x and y twice (or once for each facet)
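
One way to see the "separate frame for each `sex`" idea is to inspect which panel each glyph lands in. This is a minimal sketch on a hypothetical data frame `demo_facet`; `layer_data()` is a `ggplot2` helper that returns the data as drawn, including a `PANEL` column.

```{r}
library(ggplot2)

# Hypothetical toy data, invented for illustration only
demo_facet <- data.frame(
  sbp = c(110, 125, 140, 152, 118, 133),
  dbp = c(70, 82, 90, 95, 76, 85),
  sex = c("female", "male", "female", "male", "female", "male")
)

facet_plot <- ggplot(demo_facet, aes(x = sbp, y = dbp)) +
  geom_point() +
  facet_grid( ~ sex)

facet_plot

# PANEL records the facet in which each glyph is drawn
table(layer_data(facet_plot)$PANEL)
```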
 

## Designing Graphics

Graphics are designed by the human expert (you!) in order to reveal information that's present in the data.

#### Design choices


* What kind of glyph, e.g. scatter, density, bar, ... many others
* What variables constitute the frame. And some details:
    - axis limits
    - logarithmic axes, etc.
* What variables should be mapped to other aesthetics of the glyph.
* Whether to facet and with what variable.

More details, ..., e.g. setting of aesthetics to constants
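
Several of these design choices can be sketched in a few lines of `ggplot2`. The data frame `demo_design`, its variable names, and the chosen limits are hypothetical and serve only to show where each choice is expressed.

```{r}
library(ggplot2)

# Hypothetical toy data spanning several orders of magnitude
demo_design <- data.frame(
  income = c(500, 2000, 8000, 32000, 64000),
  price  = c(1.5, 2.1, 3.0, 4.2, 5.0)
)

design_plot <- ggplot(demo_design, aes(x = income, y = price)) +  # frame variables
  geom_point(color = "black", size = 3) +   # glyph type; aesthetics set to constants
  scale_x_log10() +                          # logarithmic x-axis
  coord_cartesian(ylim = c(0, 6))            # y-axis limits, without dropping data

design_plot
```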


## Good and Bad Graphics

Remember ... 

> Graphics are designed by the human expert (you!) in order to reveal information that's present in the data.

- choices depend on what information you want to convey.
- practice reading graphics and critique which ways of arranging things are better or worse.
- A basic principle is that a graphic is about *comparison*.  Good graphics: 
    - make it easy for people to perceive things that are similar and things that are different.  
    - put the things to be compared in proximity to one another (e.g., "side-by-side") 



## Critique this graphic...

- What sort of comparisons might you want to make?
- Do you find it easy or hard to make those comparisons?
- How might this graph be improved?

```{r}
p + geom_point(aes(color = sex, size = smoker), alpha = .85) 
```

## Perception and Comparison

In roughly descending order of human ability to compare nearby objects:

1. Position
2. Length
3. Area
4. Angle
5. Shape (but only a very few different shapes)
6. Color

Color is the most difficult...    

  - color gradients: we're better at comparing these
  - discrete colors: must be carefully selected.
  - lots of people are color blind ([1 in 12 men; 1 in 200 women](http://www.colourblindawareness.org/))
  
For more, see:

Cleveland W. (1985). *The elements of graphing data*. Bell Telephone Laboratories: Murray Hill, NJ.
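
For the color concerns above, one possible approach (an assumption here, not the only reasonable choice) is to use a viridis palette, which was designed with common forms of color blindness in mind, and to map the same variable to shape as a backup. The data frame `demo_color` is hypothetical.

```{r}
library(ggplot2)

# Hypothetical toy data, invented for illustration only
demo_color <- data.frame(
  sbp = c(105, 118, 131, 144, 157, 170),
  dbp = c(68, 74, 80, 86, 92, 98),
  smoker = rep(c("never", "former", "current"), times = 2)
)

cb_plot <- ggplot(demo_color, aes(x = sbp, y = dbp)) +
  # map smoker to both color and shape so groups do not rely on color alone
  geom_point(aes(color = smoker, shape = smoker), size = 3) +
  # discrete viridis palette supplied by ggplot2
  scale_color_viridis_d()

cb_plot
```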


## Glyph-Ready Data

Glyph-ready data has this form:

  * There is one row for each glyph to be drawn. 
  * The variables in that row are mapped to aesthetics of the glyph (including position)


<div class="columns-2">
**Glyph-ready data**
```{r echo=FALSE}
head(Tmp,6)
```

**Mapping of data to aesthetics**

```
   sbp is mapped to x position      
   dbp is mapped to y position    
smoker is mapped to color
   sex is mapped to shape
```

Scales determine the details of each mapping of the form

`variable is mapped to aesthetic` 

```{r include=FALSE}
Tmp2 <- Tmp %>%
  rename(x = sbp, y = dbp, color = smoker, shape = sex )
head(Tmp2)
```
</div>
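
Raw data are not always glyph-ready. As a minimal sketch, suppose we want a bar chart with one bar per smoking status: the hypothetical raw data `demo_raw` has one row per person, so it is first reduced to one row per bar. The `dplyr` step is shown only as a preview of data wrangling.

```{r}
library(dplyr)
library(ggplot2)

# Hypothetical raw data: one row per person
demo_raw <- data.frame(
  smoker = c("never", "never", "former", "current", "never", "former")
)

# Glyph-ready data for a bar chart: one row per bar
demo_bars <- demo_raw %>%
  count(smoker)

demo_bars

# Each row of demo_bars is drawn as one bar
ggplot(demo_bars, aes(x = smoker, y = n)) +
  geom_col()
```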

<!--
It's as if the variables were given the name of the aesthetic.
-->



## Layers -- building up complex plots 

Each layer may have its own data, glyphs, aesthetic mapping, etc.

```{r}
ggplot(data = Tmp, aes(x = sbp, y = dbp, colour = sex)) +
  geom_point() +
  geom_smooth(se = FALSE) 
```

 * one layer has points
 * another layer has the curves
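
A layer can also bring its own data. In this sketch the point layer uses the plot's default data, while the text layer is given a second, hypothetical data frame (`demo_label`) through its `data` argument; all values are invented for illustration.

```{r}
library(ggplot2)

# Hypothetical toy data for the point layer
demo_pts <- data.frame(
  sbp = c(110, 125, 140, 160),
  dbp = c(70, 82, 90, 100)
)

# Hypothetical data used only by the text layer
demo_label <- data.frame(sbp = 160, dbp = 100, label = "highest reading")

layer_plot <- ggplot(demo_pts, aes(x = sbp, y = dbp)) +
  geom_point() +                                            # layer 1: default data
  geom_text(data = demo_label, aes(label = label),          # layer 2: its own data
            vjust = -1, hjust = 1)

layer_plot
```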

## Stats: Data Transformations

```{r}
ggplot(data = Tmp, aes(x = sbp)) +
  geom_histogram(binwidth = 10)
```

  * What are the glyphs, aesthetics, guides, etc. for this plot?
  * How is the data for this plot related to the "raw" data?
  
```{r}
head(Tmp, 4)
```
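
One way to explore the second question is to look at the data the histogram actually draws. The sketch below uses a small hypothetical data frame (`demo_hist`) and `layer_data()` from `ggplot2`; the `boundary` argument is an added assumption that makes the bin edges fall on multiples of 10.

```{r}
library(ggplot2)

# Hypothetical raw data: one row per person
demo_hist <- data.frame(sbp = c(102, 108, 113, 117, 121, 126, 134, 138, 145))

hist_plot <- ggplot(demo_hist, aes(x = sbp)) +
  geom_histogram(binwidth = 10, boundary = 100)

# The stat bins the raw rows; each row of the result is one bar (glyph)
binned <- layer_data(hist_plot)
binned[, c("xmin", "xmax", "count")]
```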

## Challenge Task

Source: “College, the Great Unleveler”, New York Times, 03-01-2014 

![](images/Challenge.png)

#### Left Panel Questions: 
- Left panel: What variables make up the **frame**?
- Left panel: What are the **guides**?
- Left panel: What are the **glyphs** and what do they represent?
- Left panel: Sketch a few rows of the glyph-ready data.

#### Right Panel Questions:
- Right panel: What are the **glyphs** and what do they represent?
- Right panel: Describe three **aesthetics** mapped to the glyphs.
- Right panel: Sketch a few rows of the glyph-ready data.
- Right panel: Make a rough sketch of a stacked bar chart showing the same information.

## Partial Solution to Challenge Task

#### Left Panel Questions: 
- Left panel **frame**: Fraction of family income to pay for one year of college, and year. 
- Left panel **guides**: 
    - Labels for the different quintiles of family income.
    - A line scaling the axis, showing where the fraction of family income is 100%.
    - Text to label the extent of the horizontal axis, from 1971 to 2010.
- Left panel **glyphs**: 
    - Points representing fraction of income
    - Lines connecting 1971 result to 2011 result 
- Left panel: Sketch a few rows of the glyph-ready data.


#### Right Panel Questions:
- Right panel **glyphs**: segments of the circles
- Right panel three **aesthetics**: color, length, size (i.e., radius of circle)
- Right panel: Sketch a few rows of the glyph-ready data.
- Right panel: Make a rough sketch of a stacked bar chart showing the same information.


## Path to success

 1. **Eye-training** 
 
    - recognize and describe glyphs, aesthetics, scales, etc.
    - identify data required for a plot
    
 2. **Data wrangling** 
 
    - get data into glyph-ready format (`dplyr`, `tidyr`, `tidyverse`)
    - (we'll start doing this next week!)
    
 3. **Graphics construction** 
    
    - Newbies: match variables to aesthetics **interactively**: 
        - `esquisse` package using `esquisser( )`  
        - `mosaic` package using `mplot( )` 
        - BOTH generate `ggplot2` syntax 
    - Pros: learn to write `ggplot2` code directly


<!--  - `DataComputing` package using `scatterGraphHelper( )`, `barGraphHelper( )`, `densityGraphHelper( )` -->


## Revisiting The Big Mac Index

<https://github.com/rfordatascience/tidytuesday/tree/master/data/2020/2020-12-22>

```{r message=FALSE}
# These data are available from the `tidytuesdayR` package 
# Install package from CRAN via: install.packages("tidytuesdayR")

library(tidytuesdayR)

TidyTuesData <- tidytuesdayR::tt_load(2020, week = 52)
BigMac <- TidyTuesData[["big-mac"]]

```


## Big Mac Price vs GDP per Capita?

Let's use the `esquisse` package to explore the data!

In the console: 

`esquisser(BigMac)`


```{r}
library(esquisse)

# use `esquisser( )` to draft a plot and then generate R code to put here!

```


### Here's an example:

```{r echo=FALSE}
ggplot(BigMac) +
 aes(x = gdp_dollar, y = dollar_price) +
 geom_point(size = 1L, colour = "#0c4c8a") +
 labs(title = "Big Mac Price Around the World vs GDP per Capita") +
 theme_minimal()
```

- Using the graph, what can you say about the following?
  - Frame  
  - Glyph
  - Aesthetic
  - Scale
  - Guide
  - Facet


## Value of Big Mac around the world

```{r}
# using `mWorldMap( )` from `mosaic` package
library(mosaic)

# `key` argument takes the ID variable; `fill` takes the measured variable 
mWorldMap(BigMac, key = "iso_a3", fill = "dollar_price")
```

- Using the graph, what can you say about the following?
  - Frame  
  - Glyph
  - Aesthetic
  - Scale
  - Guide
  - Facet





<!-- ## Activity:  -->

<!-- You're going to practice reproducing graphs using the interactive R functions introduced in the reading:   -->


<!-- #### Homework   -->

<!-- **All homework, activities, assignments etc. from now on must be submitted to Canvas as HTML files with embedded .Rmd unless otherwise stated.**   -->

<!-- - Turn in Graph Replication Activity (HTML to Canvas) -->
<!-- - DC Ch 5 & 6 Exercises (HTML to Canvas): 5.1, 5.2, 6.5, 6.6, 6.7, 6.8   -->
<!-- - DC chapter 7 reading quiz on Canvas     -->

