---
title: "Week 4: Graphics, Glyphs, Frames, and Scales"
subtitle: "Data Computing Chapters 5 & 6"
author: "Prof Beckman"
date: ""
output: 
    slidy_presentation: default
    html_notebook: default

---

```{r include=FALSE}
library(mosaic)
library(NHANES)
library(dcData)
library(tidyverse)
library(esquisse)

```

## Muddiest point?



## Grammar of Graphics

- Lots of new terminology this week
- Can you describe each of these terms?
    - Frame
    - Glyph
    - Aesthetic
    - Scale
    - Guide
    - Facet


## Grammar of Graphics

- Wilkinson's 2005 book *The Grammar of Graphics (2nd Ed.)* lays out the principles and philosophy
    - the `ggplot2` R package implements this framework
    - **Goal**: flexible tools for building rich, intuitive graphics

- Data visualization is critical to our goals for high-quality Exploratory Data Analysis (EDA)
    1. **Examine the data source:** variable types, coding, missingness, summary statistics/plots, and who/what/when/where/why/how the data were collected
    2. **Discover features that may influence modeling decisions:** investigate potential outliers, consider recoding variables (e.g., numeric data that are functionally dichotomous), and evaluate correlation structure (e.g., autocorrelation, hierarchy, spatial/temporal proximity)
    3. **Address research questions:** build intuition and note preliminary observations/conclusions related to each research question. Also, note observations that prompt you to refine your research questions or add new questions to investigate

![](grammarGraphics.png)


## Glyphs and Data

In archaeology, a GLYPH is a symbol or "mark" used to impart meaning:

HieroGLYPH | Mayan GLYPH
---------------|----------------:
![Hieroglyph](Images/hand.jpg) | ![Mayan glyph](Images/mayan-glyph.png) 

## Data Glyph


### A data glyph is also a "mark" on a graph, e.g. 

![](Images/geom_rect.png) ![](Images/geom_segment.png) ![](Images/geom_text.png) ![](Images/geom_crossbar.png) ![](Images/geom_path.png) ![](Images/geom_line.png) ![](Images/geom_pointrange.png) ![](Images/geom_ribbon.png) ![](Images/geom_point.png) ![](Images/geom_polygon.png) ![](Images/geom_histogram.png) ![](Images/geom_dotplot.png) ![](Images/geom_freqpoly.png) ![](Images/geom_density.png) ![](Images/geom_violin.png) 

The features of a data glyph encode the values of variables. 

- Some are very simple, e.g. a dot: ![](Images/geom_point.png)
- Some combine different elements, e.g. a pointrange: ![](Images/geom_pointrange.png)
- Some are more complex, e.g. a violin: ![](Images/geom_violin.png)

See: *<http://docs.ggplot2.org/current/>*
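To see that the choice of glyph is separate from the data, here is a small sketch that draws the same two variables with three of the glyphs above. It assumes ggplot2's built-in `mpg` data (highway mileage `hwy` by drive train `drv`) instead of our blood-pressure data so the chunk runs on its own; the object names are only for illustration.

```{r fig.height=3}
library(ggplot2)

# Same variables (drv on x, hwy on y), three different glyphs
dots    <- ggplot(mpg, aes(x = drv, y = hwy)) + geom_point()   # one dot per car
violins <- ggplot(mpg, aes(x = drv, y = hwy)) + geom_violin()  # one violin per drive train
ranges  <- ggplot(mpg, aes(x = drv, y = hwy)) +                # one pointrange per drive train:
  stat_summary(fun.data = mean_se, geom = "pointrange")        # mean +/- 1 standard error

dots
violins
ranges
```

The dot glyph needs one row per car, while the pointrange needs one row per group (a center plus a lower and upper end).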


## "Aesthetic"?

Q: In general, what's your intuition about the meaning of aesthetic?

- How might you describe the aesthetic of a favorite coffee shop or restaurant?


## Data Glyph Properties: Aesthetics

Aesthetics are **visual properties** of a glyph.

  * Aesthetics for points: location (x and y), shape, color, size, transparency
  * Each glyph has its own set of aesthetics.

```{r echo=FALSE, fig.keep='all', out.width="50%", include=FALSE}
set.seed(102)
n <- 30
Tmp <- data.frame(
  sbp =  round(runif(n, min = 80, max = 180)),
  dbp = round(runif(n, min = 40, max = 110)),
  group = sample(c("Tr","Ctl"), size = n, replace = TRUE),
  react = sample( c("Low", "Sev", "Mod"), size = n, replace = TRUE)
)
Tmp <- Tmp %>% mutate(dbp = pmin(sbp, dbp)) 
p <- ggplot(Tmp, aes(x = sbp, y = dbp)) + xlab("Systolic BP") + ylab("Diastolic BP")
p + geom_point(aes(color = group, size = react)) 
p + geom_point(size = 5, aes(shape = group, color = react))
```

```{r echo=FALSE, fig.keep='all', out.width="50%"}
set.seed(102)
require(NHANES)

n <- 75
Tmp <- 
  NHANES %>%
  mutate(
    smoker = derivedFactor(
      never = Smoke100 == "No",
      former = SmokeNow == "No",
      current = SmokeNow == "Yes",
      .ordered = TRUE
    ),
    sbp = BPSysAve,
    dbp = BPDiaAve,
    sex = Gender
  ) %>%
  select( sbp, dbp, sex, smoker ) %>%
  sample_n(n) %>%
  filter(complete.cases(.)) %>% 
  data.frame()


p <- ggplot(Tmp, aes(x = sbp, y = dbp)) + 
  xlab("Systolic BP") + ylab("Diastolic BP")
p + geom_point(aes(color = sex, size = smoker), alpha = .8) 
p + geom_point(size = 5, aes(shape = sex, color = smoker), alpha = .8)
```



## Some Graphics Components

#### **glyph**

- The basic graphical unit often corresponding to one case.
- Other terms used include *mark* and *symbol*. 

#### **aesthetic**

- a visual property of a glyph such as position, size, shape, color, etc.  
- may be **mapped** based on data values: `sex is mapped to color` 
- may be **set** to particular non-data related values: `color is black`
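A minimal sketch of the difference between **mapping** and **setting** an aesthetic, assuming ggplot2's built-in `mpg` data (`displ` = engine displacement, `hwy` = highway mileage, `drv` = drive train) so it runs without our `Tmp` data:

```{r fig.height=3}
library(ggplot2)

# Mapped: inside aes(), colour depends on the data value of `drv`,
# so ggplot2 builds a colour scale and draws a legend for it
mapped <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = drv))

# Set: outside aes(), every point gets the same constant colour;
# no scale and no legend are created
set_const <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "black")

mapped
set_const
```

A common slip is writing `aes(color = "black")`: that *maps* a one-level constant to colour, so ggplot2 picks a default color and adds a legend labeled "black".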

#### **scale**

* A mapping that translates data values into aesthetics. For example:
    - a scatter plot of health risks may encode cigarette smoking as color:
        - <font color="blue">blue</font> represents "No"
        - <font color="red">red</font> represents "Yes"
    - a printed map of campus uses a scaled representation of distance:
        - a centimeter on the printed map represents 100 meters of distance on campus

#### **frame**

- The position scale describing how data are mapped to the coordinate system in use
- Quite often, the frame defines the x-axis and y-axis of a two-dimensional Cartesian plane


#### **guide**

* An indication for the human viewer of the scale.  This allows the viewer to translate aesthetics back into data values.
* For example, 
    - a legend makes explicit the meaning of red and blue points on the chart
    - a 1 cm bar printed on a map tells the reader that this length corresponds to 100 meters on campus


## Scales

#### The relationship between the variable value and the value of the aesthetic the variable is mapped to.

* Systolic Blood Pressure (SBP) has units of mmHg (millimeters of mercury)
* Position on the x-axis is measured as a distance, e.g. inches.

The conversion from SBP to position is a *scale*.

* Smoker is "never", "former", "current"
* Color is red, green, blue, ...

The conversion from Smoker to color is a *scale*.
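In `ggplot2`, each of these conversions is controlled by a `scale_*()` function. A sketch, assuming the built-in `mpg` data in place of SBP and smoking status; the particular colors are arbitrary choices for illustration:

```{r fig.height=3}
library(ggplot2)

p_scales <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  # position scale: displacement (litres) -> horizontal position;
  # here we only choose where the tick marks go
  scale_x_continuous(breaks = c(2, 4, 6)) +
  # colour scale: each level of `drv` -> an explicitly chosen colour
  scale_color_manual(values = c("4" = "darkgreen", f = "orange", r = "purple"))

p_scales
```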


## Guides

#### Guide: an indication to a human viewer of what the scale is.

* Axis ticks and numbers

![](Images/x-axis-scale.png)

* Legends

![](Images/color-scale.png) 
![](Images/shape-scale.png) 


* Labels on faceted graphics

![](Images/facet-scale.png)
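Guides are generated from the scales, but their text is ours to choose. A sketch, assuming the built-in `mpg` data: `labs()` titles the axis and legend guides, the facet strips label each panel, and `theme()` moves the legend.

```{r fig.height=3}
library(ggplot2)

p_guides <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  facet_wrap(~ year) +                    # strip labels are guides for the facets
  labs(x = "Engine displacement (litres)", # axis guide titles
       y = "Highway mileage (mpg)",
       color = "Drive train") +            # legend guide title
  theme(legend.position = "bottom")

p_guides
```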

## Facets 

```{r fig.height=3}
ggplot(data = Tmp, aes(x = sbp, y = dbp, color = smoker)) +
  geom_point() +
  facet_grid( ~ sex)
```

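 * horizontal position is determined by `sbp` within a panel, and the panel itself is determined by `sex`
 * basically a separate frame for each `sex`
 * the same x and y scales are reused in every facet (here twice, once for each `sex`)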
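`facet_grid()` can also split by two variables at once, `rows ~ columns`. A sketch assuming the built-in `mpg` data, where `drv` (3 levels) defines the rows and `year` (1999, 2008) defines the columns:

```{r fig.height=4}
library(ggplot2)

# One panel per combination of drv (rows) and year (columns);
# all panels share the same x and y scales, so positions stay comparable
p_facets <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ year)

p_facets
```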
 

## Designing Graphics

Graphics are designed by the human expert (you!) in order to reveal information that's present in the data.

#### Design choices


* What kind of glyph, e.g. scatter, density, bar, ... many others
* What variables constitute the frame. And some details:
    - axis limits
    - logarithmic axes, etc.
* What variables should be mapped to other aesthetics of the glyph.
* Whether to facet and with what variable.

More details as well, e.g., setting aesthetics to constants
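A sketch of the frame details listed above, assuming the built-in `mpg` data; the specific limits and constants are arbitrary:

```{r fig.height=3}
library(ggplot2)

p_design <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "gray40", alpha = 0.6) +  # aesthetics set to constants
  scale_y_log10() +                            # logarithmic y axis
  coord_cartesian(xlim = c(1, 7))              # axis limits (zoom only)

p_design
```

`coord_cartesian()` zooms the view without dropping data; setting limits on the scale instead (e.g. `xlim()` or `scale_x_continuous(limits = ...)`) removes points that fall outside them, which also changes anything computed from the data, such as a smoother.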


## Good and Bad Graphics

Remember ... 

> Graphics are designed by the human expert (you!) in order to reveal information that's present in the data.

- Choices depend on what information you want to convey.
- Practice reading graphics and critiquing which ways of arranging things are better or worse.
- A basic principle is that a graphic is about *comparison*. Good graphics: 
    - make it easy for people to perceive things that are similar and things that are different.  
    - put the things to be compared in proximity to one another (e.g., "side-by-side") 



## Critique this graphic...

- What sort of comparisons might you want to make?
- Do you find it easy or hard to make those comparisons?
- How might this graph be improved?

```{r}
p + geom_point(aes(color = sex, size = smoker), alpha = .85) 
```

## Perception and Comparison

In roughly descending order of human ability to compare nearby objects:

1. Position
2. Length
3. Area
4. Angle
5. Shape (but only a very few different shapes)
6. Color

Color is the most difficult...    

  - color gradients: we're relatively good at reading these
  - discrete colors: must be carefully selected
  - lots of people are color blind ([1 in 12 men; 1 in 200 women](http://www.colourblindawareness.org/))
  
For more, see:

Cleveland W. (1985). *The elements of graphing data*. Bell Telephone Laboratories: Murray Hill, NJ.
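One way to act on the color advice is to pick a discrete palette designed with color-vision deficiency in mind and to encode the same variable redundantly with shape. A sketch assuming the built-in `mpg` data; `end = 0.85` just trims the palette's palest yellow:

```{r fig.height=3}
library(ggplot2)

# drv is mapped to BOTH color and shape, so the groups can still be told apart
# if the colors cannot; viridis palettes ship with ggplot2
p_color <- ggplot(mpg, aes(x = displ, y = hwy, color = drv, shape = drv)) +
  geom_point(size = 2) +
  scale_color_viridis_d(end = 0.85)

p_color
```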


## Glyph-Ready Data

Glyph-ready data has this form:

  * There is one row for each glyph to be drawn. 
  * The variables in that row are mapped to aesthetics of the glyph (including position)


<div class="columns-2">
**Glyph-ready data**
```{r echo=FALSE}
head(Tmp,6)
```

**Mapping of data to aesthetics**

```
   sbp is mapped to x position      
   dbp is mapped to y position    
smoker is mapped to color
   sex is mapped to shape
```

Scales determine details of translation from

`variable is mapped to aesthetic` 

```{r include=FALSE}
Tmp2 <- Tmp %>%
  rename(x = sbp, y = dbp, color = smoker, shape = sex )
head(Tmp2)
```
</div>

<!--
It's as if the variables were given the name of the aesthetic.
-->
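Glyph-ready data depends on the glyph. For a bar chart, the raw data (one row per case) must first be reduced to one row per bar. A sketch assuming the built-in `mpg` data:

```{r fig.height=3}
library(ggplot2)

# Raw data: one row per car. Glyph-ready data: one row per bar.
# (dplyr:: prefix avoids clashes with other packages' count() functions)
bars <- dplyr::count(mpg, drv)
bars   # columns: drv (-> x position) and n (-> bar height)

p_bars <- ggplot(bars, aes(x = drv, y = n)) +
  geom_col()   # geom_col draws the heights exactly as given; no counting step
p_bars
```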



## Layers -- building up complex plots 

Each layer may have its own data, glyphs, aesthetic mapping, etc.

```{r}
ggplot(data = Tmp, aes(x = sbp, y = dbp, colour = sex)) +
  geom_point() +
  geom_smooth(se = FALSE) 
```

 * one layer has points
 * another layer has the curves
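A layer can also bring its own data. In this sketch (assuming the built-in `mpg` data and `dplyr`), the second layer plots group means that were computed separately, while still inheriting the `x`, `y`, and `color` mapping from `ggplot()`:

```{r fig.height=3}
library(ggplot2)
library(dplyr)

# Separate data for the second layer: one row per drive train
means <- mpg %>%
  group_by(drv) %>%
  summarise(displ = mean(displ), hwy = mean(hwy))

p_layers <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(alpha = 0.4) +                      # layer 1: raw data, one point per car
  geom_point(data = means, size = 5, shape = 4)  # layer 2: its own data, one "x" per group
p_layers
```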

## Stats: Data Transformations

```{r}
ggplot(data = Tmp, aes(x = sbp)) +
  geom_histogram(binwidth = 10)
```

  * What are the glyphs, aesthetics, guides, etc. for this plot?
  * How is the data for this plot related to the "raw" data?
  
```{r}
head(Tmp, 4)
```
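One way to answer the second question: `layer_data()` returns the data a layer actually draws, after the stat has run. A sketch assuming the built-in `mpg` data in place of `Tmp`:

```{r}
library(ggplot2)

h <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 5)

# stat_bin turned the raw rows (one per car) into glyph-ready rows (one per bar):
# xmin/xmax are the bin edges and count is the bar height
binned <- layer_data(h)
head(binned[, c("xmin", "xmax", "count")])
```

The bar heights add up to the number of raw rows, because every case lands in exactly one bin.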

## Challenge Task

Source: “College, the Great Unleveler”, New York Times, 03-01-2014 

![](images/Challenge.png)

#### Left Panel Questions: 
- Left panel: What variables make up the **frame**?
- Left panel: What are the **guides**?
- Left panel: What are the **glyphs** and what do they represent?
- Left panel: Sketch a few rows of the glyph-ready data.

#### Right Panel Questions:
- Right panel: What are the **glyphs** and what do they represent?
- Right panel: Describe three **aesthetics** mapped to the glyphs.
- Right panel: Sketch a few rows of the glyph-ready data.
- Right panel: Make a rough sketch of a stacked bar chart showing the same information.

## Partial Solution to Challenge Task

#### Left Panel Questions: 
- Left panel **frame**: Fraction of family income to pay for one year of college, and year. 
- Left panel **guides**: 
    - Labels for the different quintiles of family income.
    - A line scaling the axis, showing where the fraction of family income is 100%.
    - Text to label the extent of the horizontal axis, from 1971 to 2010.
- Left panel **glyphs**: 
    - Points representing fraction of income
    - Lines connecting 1971 result to 2011 result 
- Left panel: Sketch a few rows of the glyph-ready data.


#### Right Panel Questions:
- Right panel **glyphs**: segments of the circles
- Right panel three **aesthetics**: color, length, size (i.e., radius of circle)
- Right panel: Sketch a few rows of the glyph-ready data.
- Right panel: Make a rough sketch of a stacked bar chart showing the same information.


## Path to success

 1. **Eye-training** 
 
    - recognize and describe glyphs, aesthetics, scales, etc.
    - identify data required for a plot
    
 2. **Data wrangling** 
 
    - get data into glyph-ready format (`dplyr`, `tidyr`, `tidyverse`)
    - (we'll start doing this next week!)
    
 3. **Graphics construction** 
    
    - Newbies: match variables to aesthetics **interactively**: 
        - `esquisse` package using `esquisser( )`  
        - `mosaic` package using `mplot( )` 
        - BOTH generate `ggplot2` syntax 
    - Pros: learn to write `ggplot2` code directly


<!--  - `DataComputing` package using `scatterGraphHelper( )`, `barGraphHelper( )`, `densityGraphHelper( )` -->


## Revisiting The Big Mac Index

<https://github.com/rfordatascience/tidytuesday/tree/master/data/2020/2020-12-22>

```{r message=FALSE}
# These data are available from the `tidytuesdayR` package 
# Install package from CRAN via: install.packages("tidytuesdayR")

library(tidytuesdayR)

TidyTuesData <- tidytuesdayR::tt_load(2020, week = 52)
BigMac <- TidyTuesData[["big-mac"]]

```


## Big Mac Price vs GDP per Capita?

Let's use the `esquisse` package to explore the data!

In the console: 

`esquisser(BigMac)`


```{r}
library(esquisse)

# use `esquisser( )` to draft a plot and then generate R code to put here!

```


### Here's an example:

```{r echo=FALSE}
ggplot(BigMac) +
 aes(x = gdp_dollar, y = dollar_price) +
 geom_point(size = 1L, colour = "#0c4c8a") +
 labs(title = "Big Mac Price Around the World vs GDP per Capita") +
 theme_minimal()
```

- Using the graph, what can you say about the following?
  - Frame  
  - Glyph
  - Aesthetic
  - Scale
  - Guide
  - Facet


## Value of Big Mac around the world

```{r}
# using `mWorldMap( )` from `mosaic` package
library(mosaic)

# `key` argument takes the ID variable; `fill` takes the measured variable 
mWorldMap(BigMac, key = "iso_a3", fill = "dollar_price")
```

- Using the graph, what can you say about the following?
  - Frame  
  - Glyph
  - Aesthetic
  - Scale
  - Guide
  - Facet





<!-- ## Activity:  -->

<!-- You're going to practice reproducing graphs using the interactive R functions introduced in the reading:   -->


<!-- #### Homework   -->

<!-- **All homework, activities, assignments etc. from now on must be submitted to Canvas as HTML files with embedded .Rmd unless otherwise stated.**   -->

<!-- - Turn in Graph Replication Activity (HTML to Canvas) -->
<!-- - DC Ch 5 & 6 Exercises (HTML to Canvas): 5.1, 5.2, 6.5, 6.6, 6.7, 6.8   -->
<!-- - DC chapter 7 reading quiz on Canvas     -->

