Data Computing
November 9, 2016
library(lubridate)  # provides today(), mdy(), ymd()
# Date the script was last refreshed
today()
## [1] "2016-11-09"
# Data Set Review
mdy("11-13-2016") - ymd(today())
## Time difference of 4 days
# Submit Draft for Peer Review
mdy("11-25-2016") - ymd(today())
## Time difference of 16 days
# Total Remaining
mdy("12-09-2016") - ymd(today())
## Time difference of 30 days
The offer…
Any Takers?
filter() & grepl()
mutate() & gsub()
tidyr::extract()
Date column
library(rvest)  # read_html(), html_nodes(), html_table()
library(dplyr)  # %>% and mutate()
page <- "https://en.wikipedia.org/wiki/Mile_run_world_record_progression"
XPATH <- '//*[@id="mw-content-text"]/table'
table_list <- page %>%
  read_html() %>%
  html_nodes(xpath = XPATH) %>%
  html_table(fill = TRUE)
IAAFmen <- table_list[[4]]
head(IAAFmen, 3)
##     Time Auto         Athlete   Nationality              Date
## 1 4:14.4      John Paul Jones United States    31 May 1913[5]
## 2 4:12.6         Norman Taber United States   16 July 1915[5]
## 3 4:10.4          Paavo Nurmi       Finland 23 August 1923[5]
##            Venue
## 1 Allston, Mass.
## 2 Allston, Mass.
## 3      Stockholm
Now we can use mutate() & gsub() to help us clean up the footnotes from Date:
IAAFmen %>%
  mutate(Date = gsub("\\[.\\]$", "", Date)) %>%
  head(3)
##     Time Auto         Athlete   Nationality           Date          Venue
## 1 4:14.4      John Paul Jones United States    31 May 1913 Allston, Mass.
## 2 4:12.6         Norman Taber United States   16 July 1915 Allston, Mass.
## 3 4:10.4          Paavo Nurmi       Finland 23 August 1923      Stockholm
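The pattern "\\[.\\]$" matches a footnote marker with exactly one character between the brackets, which is enough for the three rows shown. As a minimal sketch, assuming some footnote numbers could run to two or more digits, a slightly more general pattern can be tried on a small character vector before it goes into mutate(). The sample strings below are made up for illustration and are not taken from the Wikipedia table.

```r
# Hypothetical sample values in the style of the Date column
raw_dates <- c("31 May 1913[5]", "16 July 1915[12]", "23 August 1923")

# grepl() reports which entries end in a bracketed footnote number
had_footnote <- grepl("\\[[0-9]+\\]$", raw_dates)
had_footnote

# gsub() removes a trailing footnote with one or more digits;
# entries without a footnote are returned unchanged
clean_dates <- gsub("\\[[0-9]+\\]$", "", raw_dates)
clean_dates
```

The cleaned strings could then be passed to lubridate's dmy() to obtain Date values.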
The assignment is worth a total of 10 points.
Use ggplot to construct a bar chart in descending order of popularity for the street name identifiers you found.
Two data sets are provided. One includes 15,000 street addresses of registered voters in Wake County, North Carolina. The other includes over 900,000 street addresses of Medicare Service Providers. You can use either data set (or both!) for the activity.
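One possible shape for the descending bar chart is sketched below. It is only an illustration under stated assumptions: the addresses are invented rather than drawn from either data set, the identifier is assumed to be the last word of each address, and the names addresses, identifier, and counts are hypothetical. The counting uses base R, and the plot is drawn only if ggplot2 is installed.

```r
# Hypothetical addresses; the real data sets are not reproduced here
addresses <- c("12 OAK ST", "400 MAIN ST", "7 ELM RD",
               "95 PINE AVE", "18 LAKE RD", "3 HILL ST")

# Assumption: the street name identifier is the last word of the address
identifier <- sub("^.* ", "", addresses)

# Count each identifier and sort from most to least common
counts <- as.data.frame(table(identifier), stringsAsFactors = FALSE)
names(counts) <- c("identifier", "n")
counts <- counts[order(-counts$n), ]
counts

# reorder() sorts the bars by count, largest first
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts, aes(x = reorder(identifier, -n), y = n)) +
    geom_col() +
    labs(x = "Street name identifier", y = "Count")
  print(p)
}
```

Real addresses are messier than this (unit numbers, directions, missing suffixes), so the last-word assumption would need to be replaced by the patterns found in the activity.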
Note: There’s nothing to do in the “For the professional…” section at the very end except to be impressed.