A data table comprises cases and variables.
Each variable comprises values (or levels).
There is no hard distinction between a variable and a value. What’s a variable in one situation may be a value in another, and vice versa.
A data table
Students
who | x | y | dorm |
---|---|---|---|
Alice | 7 | English | Sproul |
Lesley | 19 | Mandarin | Bigler |
Yu | 23 | French | Snyder |
who | dorm | key | value |
---|---|---|---|
Alice | Sproul | x | 7 |
Lesley | Bigler | x | 19 |
Yu | Snyder | x | 23 |
Alice | Sproul | y | English |
Lesley | Bigler | y | Mandarin |
Yu | Snyder | y | French |
Syntax:
WideInput %>%
gather(key_name, value_name, ...)
The ... are the variables to be gathered together, e.g.
StudentsNarrow <- Students %>% gather(key, value, x, y)
who | dorm | key | value |
---|---|---|---|
Alice | Sproul | x | 7 |
Lesley | Bigler | x | 19 |
Yu | Snyder | x | 23 |
Alice | Sproul | y | English |
Lesley | Bigler | y | Mandarin |
Yu | Snyder | y | French |
Aside from key and value, all the other variables (here who and dorm) identify the case.
Gathering produces multiple rows for each row of the wide form. The variables that are not gathered are copied into each of the new cases.
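The row multiplication described above can be checked directly. The sketch below rebuilds the Students table by hand (an assumption, since these notes do not show how the table was loaded) and assumes the tidyr package is installed.

```r
library(tidyr)

# Students rebuilt by hand from the table shown earlier (illustrative only).
Students <- data.frame(
  who  = c("Alice", "Lesley", "Yu"),
  x    = c(7, 19, 23),
  y    = c("English", "Mandarin", "French"),
  dorm = c("Sproul", "Bigler", "Snyder"),
  stringsAsFactors = FALSE
)

# Gather x and y; who and dorm are not gathered, so they are copied
# into every new row and continue to identify the case.
StudentsNarrow <- gather(Students, key, value, x, y)

nrow(Students)               # 3 rows in wide form
nrow(StudentsNarrow)         # 6 rows: one per student per gathered variable
class(StudentsNarrow$value)  # "character"
```

Because x is numeric and y is text, the single value column has to hold both, so it ends up stored as character.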
Syntax:
NarrowInput %>% spread(key, value)
Process:
StudentsNarrow %>% spread(key, value)
who | dorm | x | y |
---|---|---|---|
Alice | Sproul | 7 | English |
Lesley | Bigler | 19 | Mandarin |
Yu | Snyder | 23 | French |
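As a minimal sketch of the step above, the narrow table can be typed in by hand and spread back to wide form. The hand-built StudentsNarrow and the name StudentsWide are assumptions for illustration. The convert = TRUE argument is optional: it asks spread() to re-guess each new column's type, which is useful here because the narrow value column stores the numbers as text.

```r
library(tidyr)

# Narrow table typed in by hand to match the one shown above (illustrative only).
StudentsNarrow <- data.frame(
  who   = rep(c("Alice", "Lesley", "Yu"), times = 2),
  dorm  = rep(c("Sproul", "Bigler", "Snyder"), times = 2),
  key   = rep(c("x", "y"), each = 3),
  value = c("7", "19", "23", "English", "Mandarin", "French"),
  stringsAsFactors = FALSE
)

# Each distinct entry of key becomes its own column, filled from value.
StudentsWide <- spread(StudentsNarrow, key, value, convert = TRUE)

names(StudentsWide)   # "who" "dorm" "x" "y"
class(StudentsWide$x) # numeric type again, thanks to convert = TRUE
```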
Some operations are easy in wide format but hard in narrow, and vice versa.
BabyNames
name | sex | count | year |
---|---|---|---|
Alice | F | 593 | 1998 |
Alice | F | 650 | 1999 |
Lesley | F | 695 | 1998 |
Lesley | M | 15 | 1998 |
Lesley | F | 682 | 1999 |
Lesley | M | 21 | 1999 |
Yu | F | 8 | 1998 |
Yu | M | 9 | 1998 |
Yu | F | 11 | 1999 |
Yu | M | 10 | 1999 |
Questions:
BabyTotals <-
BabyNames %>%
group_by(name, sex) %>%
summarise(total = sum(count))
name | sex | total |
---|---|---|
Alice | F | 546020 |
Alice | M | 1926 |
Lesley | F | 33604 |
Lesley | M | 4784 |
Yu | F | 354 |
Yu | M | 340 |
Easy!
sex is the key variable
total is the value variable
BabyTotalsWide <- BabyTotals %>%
spread(sex, total)
BabyTotalsWide
name | F | M |
---|---|---|
Alice | 546020 | 1926 |
Lesley | 33604 | 4784 |
Yu | 354 | 340 |
BabyTotalsWide %>%
mutate(gender_specificity = pmax(F/(M+F), M/(M+F)))
name | F | M | gender_specificity |
---|---|---|---|
Alice | 546020 | 1926 | 0.9964851 |
Lesley | 33604 | 4784 | 0.8753777 |
Yu | 354 | 340 | 0.5100865 |
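The whole pipeline can be tried end to end on just the ten BabyNames rows shown earlier. This is a sketch under assumptions: BabyNamesExcerpt and ExcerptWide are hypothetical names, the excerpt is typed in by hand, and its totals are much smaller than those in the tables above, which appear to come from the full data set. The excerpt has no male rows for Alice, so fill = 0 is used so that the missing combination becomes a zero rather than NA.

```r
library(dplyr)
library(tidyr)

# Hand-typed excerpt: the ten BabyNames rows displayed earlier (illustrative only).
BabyNamesExcerpt <- data.frame(
  name  = c("Alice", "Alice", "Lesley", "Lesley", "Lesley",
            "Lesley", "Yu", "Yu", "Yu", "Yu"),
  sex   = c("F", "F", "F", "M", "F", "M", "F", "M", "F", "M"),
  count = c(593, 650, 695, 15, 682, 21, 8, 9, 11, 10),
  year  = c(1998, 1999, 1998, 1998, 1999, 1999, 1998, 1998, 1999, 1999),
  stringsAsFactors = FALSE
)

ExcerptWide <- BabyNamesExcerpt %>%
  group_by(name, sex) %>%
  summarise(total = sum(count)) %>%   # narrow: one row per name and sex
  ungroup() %>%
  spread(sex, total, fill = 0) %>%    # wide: one row per name, columns F and M
  mutate(gender_specificity = pmax(F / (M + F), M / (M + F)))

ExcerptWide
```

In wide form, F and M sit side by side in the same row, so the ratio is a single mutate() step. In narrow form the two totals live in different rows, which is what makes the same calculation awkward there.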
Note: SKIP Dividends section beginning on p. 173
Assignment is worth a total of 10 points.
ggplot to chart stock price over time for a few stocks (stocks chosen may vary)
Prices table and Actions table
SalesDifference table
Prices table and Reference table
ggplot to chart index over time for chosen stocks (stocks chosen may vary)