Data Computing Chapter 11
October 19, 2016
A data table is comprises cases and variables.
Each variable comprises values (or levels).
There is no hard distinction between a variable and a value. What’s a variable in one situation may be a value in another, and vice versa.
A data table
Students| who | x | y | dorm |
|---|---|---|---|
| Alice | 7 | English | Sproul |
| Lesley | 19 | Mandarin | Bigler |
| Yu | 23 | French | Snyder |
| who | dorm | key | value |
|---|---|---|---|
| Alice | Sproul | x | 7 |
| Lesley | Bigler | x | 19 |
| Yu | Snyder | x | 23 |
| Alice | Sproul | y | English |
| Lesley | Bigler | y | Mandarin |
| Yu | Snyder | y | French |
| who | x | y | dorm |
|---|---|---|---|
| Alice | 7 | English | Sproul |
| Lesley | 19 | Mandarin | Bigler |
| Yu | 23 | French | Snyder |
Syntax:
WideInput %>%
gather(key_name, value_name, ...)
The ... are the variables to be gathered together, e.g.
StudentsNarrow <- Students %>% gather(key, value, x, y)
StudentsNarrow| who | dorm | key | value |
|---|---|---|---|
| Alice | Sproul | x | 7 |
| Lesley | Bigler | x | 19 |
| Yu | Snyder | x | 23 |
| Alice | Sproul | y | English |
| Lesley | Bigler | y | Mandarin |
| Yu | Snyder | y | French |
Aside from Key and Value, all the other variables identify the case.
The gathering makes multiple rows for each row in the wide form. The variables not used for narrowing are copied into the new multiple cases.
Some operations are easy in wide format, but hard in narrow and vice versa
BabyNames| name | sex | count | year |
|---|---|---|---|
| Alice | F | 593 | 1998 |
| Alice | F | 650 | 1999 |
| Lesley | F | 695 | 1998 |
| Lesley | M | 15 | 1998 |
| Lesley | F | 682 | 1999 |
| Lesley | M | 21 | 1999 |
| Yu | F | 8 | 1998 |
| Yu | M | 9 | 1998 |
| Yu | F | 11 | 1999 |
| Yu | M | 10 | 1999 |
BabyTotals <-
BabyNames %>%
group_by(name, sex) %>%
summarise(total = sum(count))| name | sex | total |
|---|---|---|
| Alice | F | 546020 |
| Alice | M | 1926 |
| Lesley | F | 33604 |
| Lesley | M | 4784 |
| Yu | F | 354 |
| Yu | M | 340 |
Easy!
sex is the key variabletotal is the value variableBabyTotalsWide <- BabyTotals %>%
spread(sex, total)
BabyTotalsWide| name | F | M |
|---|---|---|
| Alice | 546020 | 1926 |
| Lesley | 33604 | 4784 |
| Yu | 354 | 340 |
BabyTotalsWide %>%
mutate(gender_specificity = pmax(F/(M+F), M/(M+F)))| name | F | M | gender_specificity |
|---|---|---|---|
| Alice | 546020 | 1926 | 0.9964851 |
| Lesley | 33604 | 4784 | 0.8753777 |
| Yu | 354 | 340 | 0.5100865 |
Note: SKIP “Dividends” section beginning on p. 173
Assignment is worth a total of 10 points.
ggplot to chart stock price over time for a few stocks (stocks chosen may vary)
Prices table and Actions table
SalesDifference table
Prices table and Reference table
ggplot to chart index over time for chosen stocks (stocks chosen may vary)