Week 10 Class Notes

Announcements

FREE STUFF!!!
- Books are here
- DataCamp Access to Premium Content for 6 months
Final Project Update
- Canvas drop boxes
- Optional class feedback
  - 1-2 per week
  - first come first served
  - 5 minutes to talk about project (progress, problems, etc)
  - 5 minutes feedback from class & instructors
- Optional peer reviews
- DataCamp Alternative
  - Credit awarded for completion of assigned courses listed on DataCamp.com
  - 1 course: 10% credit
  - 2 courses: 20% credit
  - 3 courses: 30% credit
  - 4 courses: 50% credit
  - 5 courses: 70% credit
  - 6 courses: 80% credit
  - 7 courses: 85% credit
  - Note: Introduction to R course does NOT count toward Final Project alternative

Chapters 12 & 15 Fly-by

Chapter 12

rank() is pretty useful
row_number() is too

Chapter 15

There are a ton of ways to get data into R
- CSV (comma-separated-values) is a really common format
  - several functions read CSV files into R, but they are not all created equal… (DC likes read.file())
  - file.choose() It’s interactive so you can’t use it in .Rmd, but it’s a great way to get file paths
- Lots of software export to CSV, but R can handle lots of proprietary formats too (see foreign package)
- R can query relational databases like MS Access, Oracle, SAP, mySQL, etc (see rodbc package and others)
- Scraping the web (we’ll come back to that)
A word about some object classes…
- data.frame
- list
- variable types and coercion
  - factors
  - numeric
  - character
  - POSIXct (see lubridate package)

Scraping Mile Run Records from Wikipedia

Steps

Locate webpage
Identify data table(s) to scrape
Right click on the table you want, choose “Inspect Element”
Roll cursor over the HTML code (even if you don’t understand it) until you see the whole table that you want appear highlighted. Click on the row that highlights the whole table.
Right click the highlighted row >> Copy >> XPath
Edit the R code chunk below to paste the XPath with SINGLE quotes around it, and URL with quotes around it as shown.
Execute the code chunk!

library("rvest")

page_url <- "https://en.wikipedia.org/wiki/Mile_run_world_record_progression"
XPATH <- '//*[@id="mw-content-text"]/table'

table_list <- 
  page_url %>%
  read_html() %>%
  html_nodes(xpath = XPATH) %>%
  html_table(fill = TRUE)

XPATH help: Instructions on getting the xpath to an element on a web page (in Chrome).

Scraping Mile Run Records from Wikipedia

Let’s say we want to scrape Mile Run Records from Wikipedia…

Here’s the page: https://en.wikipedia.org/wiki/Mile_run_world_record_progression

Scraping Mile Run Records from Wikipedia

Using our handy template, we replace the page_url & XPATH

page <- "https://en.wikipedia.org/wiki/Mile_run_world_record_progression"
XPATH <- '//*[@id="mw-content-text"]/table'

table_list <- page %>%
  read_html() %>%
  html_nodes(xpath = XPATH) %>%
  html_table(fill = TRUE)

str(table_list)

## List of 6
##  $ :'data.frame':    11 obs. of  5 variables:
##   ..$ Time       : chr [1:11] "4:28" "4:28" "4:23" "4:22¼" ...
##   ..$ Athlete    : chr [1:11] "Charles Westhall" "Thomas Horspool" "Thomas Horspool" "Siah Albison" ...
##   ..$ Nationality: chr [1:11] " United Kingdom" " United Kingdom" " United Kingdom" " United Kingdom" ...
##   ..$ Date       : chr [1:11] "26 July 1855" "28 September 1857" "12 July 1858" "27 October 1860" ...
##   ..$ Venue      : chr [1:11] "London" "Manchester" "Manchester" "Manchester" ...
##  $ :'data.frame':    16 obs. of  5 variables:
##   ..$ Time       : chr [1:16] "4:55" "4:49" "4:46" "4:33" ...
##   ..$ Athlete    : chr [1:16] "J. Heaviside" "J. Heaviside" "Matthew Greene" "George Farran" ...
##   ..$ Nationality: chr [1:16] " United Kingdom" " United Kingdom" " United Kingdom" " United Kingdom" ...
##   ..$ Date       : chr [1:16] "1 April 1861" "27 May 1861" "27 May 1861" "23 May 1862" ...
##   ..$ Venue      : chr [1:16] "Dublin" "Dublin" "Dublin" "Dublin" ...
##  $ :'data.frame':    5 obs. of  5 variables:
##   ..$ Time       : chr [1:5] "4:52" "4:45" "4:45" "4:40" ...
##   ..$ Athlete    : chr [1:5] "Cadet Marshall" "Thomas Finch" "St. Vincent Hammick" "Gerald Surman" ...
##   ..$ Nationality: chr [1:5] " United Kingdom" " United Kingdom" " United Kingdom" " United Kingdom" ...
##   ..$ Date       : chr [1:5] "2 September 1852" "3 November 1858" "15 November 1858" "24 November 1859" ...
##   ..$ Venue      : chr [1:5] "Addiscome" "Oxford" "Oxford" "Oxford" ...
##  $ :'data.frame':    32 obs. of  6 variables:
##   ..$ Time       : chr [1:32] "4:14.4" "4:12.6" "4:10.4" "4:09.2" ...
##   ..$ Auto       : chr [1:32] "" "" "" "" ...
##   ..$ Athlete    : chr [1:32] "John Paul Jones" "Norman Taber" "Paavo Nurmi" "Jules Ladoumègue" ...
##   ..$ Nationality: chr [1:32] " United States" " United States" " Finland" " France" ...
##   ..$ Date       : chr [1:32] "31 May 1913[5]" "16 July 1915[5]" "23 August 1923[5]" "4 October 1931[5]" ...
##   ..$ Venue      : chr [1:32] "Allston, Mass." "Allston, Mass." "Stockholm" "Paris" ...
##  $ :'data.frame':    18 obs. of  5 variables:
##   ..$ Time       : chr [1:18] "6:13.2" "5:27.5" "5:24.0" "5:23.0" ...
##   ..$ Athlete    : chr [1:18] "Elizabeth Atkinson" "Ruth Christmas" "Gladys Lunn" "Gladys Lunn" ...
##   ..$ Nationality: chr [1:18] " United Kingdom" " United Kingdom" " United Kingdom" " United Kingdom" ...
##   ..$ Date       : chr [1:18] "24 June 1921" "20 August 1932" "1 June 1936" "18 July 1936" ...
##   ..$ Venue      : chr [1:18] "Manchester" "London" "Brentwood" "London" ...
##  $ :'data.frame':    13 obs. of  6 variables:
##   ..$ Time       : chr [1:13] "4:37.0" "4:36.8" "4:35.3" "4:29.5" ...
##   ..$ Auto       : chr [1:13] "" "" "" "" ...
##   ..$ Athlete    : chr [1:13] "Anne Smith" "Maria Gommers" "Ellen Tittel" "Paola Pigni" ...
##   ..$ Nationality: chr [1:13] " United Kingdom" " Netherlands" " West Germany" " Italy" ...
##   ..$ Date       : chr [1:13] "3 June 1967[6]" "14 June 1969[6]" "20 August 1971[6]" "8 August 1973[6]" ...
##   ..$ Venue      : chr [1:13] "London" "Leicester" "Sittard" "Viareggio" ...

Scraping Mile Run Records from Wikipedia

# Look at the structure (look for how many tables are in the list; verify they are "data.frame" format)
str(table_list)

## List of 6
##  $ :'data.frame':    11 obs. of  5 variables:
##   ..$ Time       : chr [1:11] "4:28" "4:28" "4:23" "4:22¼" ...
##   ..$ Athlete    : chr [1:11] "Charles Westhall" "Thomas Horspool" "Thomas Horspool" "Siah Albison" ...
##   ..$ Nationality: chr [1:11] " United Kingdom" " United Kingdom" " United Kingdom" " United Kingdom" ...
##   ..$ Date       : chr [1:11] "26 July 1855" "28 September 1857" "12 July 1858" "27 October 1860" ...
##   ..$ Venue      : chr [1:11] "London" "Manchester" "Manchester" "Manchester" ...
##  $ :'data.frame':    16 obs. of  5 variables:
##   ..$ Time       : chr [1:16] "4:55" "4:49" "4:46" "4:33" ...
##   ..$ Athlete    : chr [1:16] "J. Heaviside" "J. Heaviside" "Matthew Greene" "George Farran" ...
##   ..$ Nationality: chr [1:16] " United Kingdom" " United Kingdom" " United Kingdom" " United Kingdom" ...
##   ..$ Date       : chr [1:16] "1 April 1861" "27 May 1861" "27 May 1861" "23 May 1862" ...
##   ..$ Venue      : chr [1:16] "Dublin" "Dublin" "Dublin" "Dublin" ...
##  $ :'data.frame':    5 obs. of  5 variables:
##   ..$ Time       : chr [1:5] "4:52" "4:45" "4:45" "4:40" ...
##   ..$ Athlete    : chr [1:5] "Cadet Marshall" "Thomas Finch" "St. Vincent Hammick" "Gerald Surman" ...
##   ..$ Nationality: chr [1:5] " United Kingdom" " United Kingdom" " United Kingdom" " United Kingdom" ...
##   ..$ Date       : chr [1:5] "2 September 1852" "3 November 1858" "15 November 1858" "24 November 1859" ...
##   ..$ Venue      : chr [1:5] "Addiscome" "Oxford" "Oxford" "Oxford" ...
##  $ :'data.frame':    32 obs. of  6 variables:
##   ..$ Time       : chr [1:32] "4:14.4" "4:12.6" "4:10.4" "4:09.2" ...
##   ..$ Auto       : chr [1:32] "" "" "" "" ...
##   ..$ Athlete    : chr [1:32] "John Paul Jones" "Norman Taber" "Paavo Nurmi" "Jules Ladoumègue" ...
##   ..$ Nationality: chr [1:32] " United States" " United States" " Finland" " France" ...
##   ..$ Date       : chr [1:32] "31 May 1913[5]" "16 July 1915[5]" "23 August 1923[5]" "4 October 1931[5]" ...
##   ..$ Venue      : chr [1:32] "Allston, Mass." "Allston, Mass." "Stockholm" "Paris" ...
##  $ :'data.frame':    18 obs. of  5 variables:
##   ..$ Time       : chr [1:18] "6:13.2" "5:27.5" "5:24.0" "5:23.0" ...
##   ..$ Athlete    : chr [1:18] "Elizabeth Atkinson" "Ruth Christmas" "Gladys Lunn" "Gladys Lunn" ...
##   ..$ Nationality: chr [1:18] " United Kingdom" " United Kingdom" " United Kingdom" " United Kingdom" ...
##   ..$ Date       : chr [1:18] "24 June 1921" "20 August 1932" "1 June 1936" "18 July 1936" ...
##   ..$ Venue      : chr [1:18] "Manchester" "London" "Brentwood" "London" ...
##  $ :'data.frame':    13 obs. of  6 variables:
##   ..$ Time       : chr [1:13] "4:37.0" "4:36.8" "4:35.3" "4:29.5" ...
##   ..$ Auto       : chr [1:13] "" "" "" "" ...
##   ..$ Athlete    : chr [1:13] "Anne Smith" "Maria Gommers" "Ellen Tittel" "Paola Pigni" ...
##   ..$ Nationality: chr [1:13] " United Kingdom" " Netherlands" " West Germany" " Italy" ...
##   ..$ Date       : chr [1:13] "3 June 1967[6]" "14 June 1969[6]" "20 August 1971[6]" "8 August 1973[6]" ...
##   ..$ Venue      : chr [1:13] "London" "Leicester" "Sittard" "Viareggio" ...

# Inspect the first table in the list (IAAF Men from the Wikipedia Page)
IAAFtimes <- table_list[[4]]
tail(IAAFtimes)

	Time	Athlete	Nationality	Date	Venue
27	3:48.53	Sebastian Coe	United Kingdom	19 August 1981[5]	Zürich
28	3:48.40	Steve Ovett	United Kingdom	26 August 1981[5]	Koblenz
29	3:47.33	Sebastian Coe	United Kingdom	28 August 1981[5]	Brussels
30	3:46.32	Steve Cram	United Kingdom	27 July 1985[5]	Oslo
31	3:44.39	Noureddine Morceli	Algeria	5 September 1993[5]	Rieti
32	3:43.13	Hicham El Guerrouj	Morocco	7 July 1999[5]	Rome

source

## function (file, local = FALSE, echo = verbose, print.eval = echo, 
##     verbose = getOption("verbose"), prompt.echo = getOption("prompt"), 
##     max.deparse.length = 150, chdir = FALSE, encoding = getOption("encoding"), 
##     continue.echo = getOption("continue"), skip.echo = 0, keep.source = getOption("keep.source")) 
## {
##     envir <- if (isTRUE(local)) {
##         parent.frame()
##     }
##     else if (identical(local, FALSE)) {
##         .GlobalEnv
##     }
##     else if (is.environment(local)) {
##         local
##     }
##     else stop("'local' must be TRUE, FALSE or an environment")
##     have_encoding <- !missing(encoding) && encoding != "unknown"
##     if (!missing(echo)) {
##         if (!is.logical(echo)) 
##             stop("'echo' must be logical")
##         if (!echo && verbose) {
##             warning("'verbose' is TRUE, 'echo' not; ... coercing 'echo <- TRUE'")
##             echo <- TRUE
##         }
##     }
##     if (verbose) {
##         cat("'envir' chosen:")
##         print(envir)
##     }
##     ofile <- file
##     from_file <- FALSE
##     srcfile <- NULL
##     if (is.character(file)) {
##         if (identical(encoding, "unknown")) {
##             enc <- utils::localeToCharset()
##             encoding <- enc[length(enc)]
##         }
##         else enc <- encoding
##         if (length(enc) > 1L) {
##             encoding <- NA
##             owarn <- options(warn = 2)
##             for (e in enc) {
##                 if (is.na(e)) 
##                   next
##                 zz <- file(file, encoding = e)
##                 res <- tryCatch(readLines(zz, warn = FALSE), 
##                   error = identity)
##                 close(zz)
##                 if (!inherits(res, "error")) {
##                   encoding <- e
##                   break
##                 }
##             }
##             options(owarn)
##         }
##         if (is.na(encoding)) 
##             stop("unable to find a plausible encoding")
##         if (verbose) 
##             cat(gettextf("encoding = \"%s\" chosen", encoding), 
##                 "\n", sep = "")
##         if (file == "") {
##             file <- stdin()
##             srcfile <- "<stdin>"
##         }
##         else {
##             filename <- file
##             file <- file(filename, "r", encoding = encoding)
##             on.exit(close(file))
##             if (isTRUE(keep.source)) {
##                 lines <- readLines(file, warn = FALSE)
##                 on.exit()
##                 close(file)
##                 srcfile <- srcfilecopy(filename, lines, file.mtime(filename)[1], 
##                   isFile = TRUE)
##             }
##             else {
##                 from_file <- TRUE
##                 srcfile <- filename
##             }
##             loc <- utils::localeToCharset()[1L]
##             encoding <- if (have_encoding) 
##                 switch(loc, `UTF-8` = "UTF-8", `ISO8859-1` = "latin1", 
##                   "unknown")
##             else "unknown"
##         }
##     }
##     else {
##         lines <- readLines(file, warn = FALSE)
##         if (isTRUE(keep.source)) 
##             srcfile <- srcfilecopy(deparse(substitute(file)), 
##                 lines)
##         else srcfile <- deparse(substitute(file))
##     }
##     exprs <- if (!from_file) {
##         if (length(lines)) 
##             .Internal(parse(stdin(), n = -1, lines, "?", srcfile, 
##                 encoding))
##         else expression()
##     }
##     else .Internal(parse(file, n = -1, NULL, "?", srcfile, encoding))
##     on.exit()
##     if (from_file) 
##         close(file)
##     Ne <- length(exprs)
##     if (verbose) 
##         cat("--> parsed", Ne, "expressions; now eval(.)ing them:\n")
##     if (chdir) {
##         if (is.character(ofile)) {
##             if (grepl("^(ftp|http|file)://", ofile)) 
##                 warning("'chdir = TRUE' makes no sense for a URL")
##             else if ((path <- dirname(ofile)) != ".") {
##                 owd <- getwd()
##                 if (is.null(owd)) 
##                   stop("cannot 'chdir' as current directory is unknown")
##                 on.exit(setwd(owd), add = TRUE)
##                 setwd(path)
##             }
##         }
##         else {
##             warning("'chdir = TRUE' makes no sense for a connection")
##         }
##     }
##     if (echo) {
##         sd <- "\""
##         nos <- "[^\"]*"
##         oddsd <- paste0("^", nos, sd, "(", nos, sd, nos, sd, 
##             ")*", nos, "$")
##         trySrcLines <- function(srcfile, showfrom, showto) {
##             lines <- tryCatch(suppressWarnings(getSrcLines(srcfile, 
##                 showfrom, showto)), error = function(e) e)
##             if (inherits(lines, "error")) 
##                 character()
##             else lines
##         }
##     }
##     yy <- NULL
##     lastshown <- 0
##     srcrefs <- attr(exprs, "srcref")
##     for (i in seq_len(Ne + echo)) {
##         tail <- i > Ne
##         if (!tail) {
##             if (verbose) 
##                 cat("\n>>>> eval(expression_nr.", i, ")\n\t\t =================\n")
##             ei <- exprs[i]
##         }
##         if (echo) {
##             nd <- 0
##             srcref <- if (tail) 
##                 attr(exprs, "wholeSrcref")
##             else if (i <= length(srcrefs)) 
##                 srcrefs[[i]]
##             if (!is.null(srcref)) {
##                 if (i == 1) 
##                   lastshown <- min(skip.echo, srcref[3L] - 1)
##                 if (lastshown < srcref[3L]) {
##                   srcfile <- attr(srcref, "srcfile")
##                   dep <- trySrcLines(srcfile, lastshown + 1, 
##                     srcref[3L])
##                   if (length(dep)) {
##                     leading <- if (tail) 
##                       length(dep)
##                     else srcref[1L] - lastshown
##                     lastshown <- srcref[3L]
##                     while (length(dep) && grepl("^[[:blank:]]*$", 
##                       dep[1L])) {
##                       dep <- dep[-1L]
##                       leading <- leading - 1L
##                     }
##                     dep <- paste0(rep.int(c(prompt.echo, continue.echo), 
##                       c(leading, length(dep) - leading)), dep, 
##                       collapse = "\n")
##                     nd <- nchar(dep, "c")
##                   }
##                   else srcref <- NULL
##                 }
##             }
##             if (is.null(srcref)) {
##                 if (!tail) {
##                   dep <- substr(paste(deparse(ei, control = "showAttributes"), 
##                     collapse = "\n"), 12L, 1000000L)
##                   dep <- paste0(prompt.echo, gsub("\n", paste0("\n", 
##                     continue.echo), dep))
##                   nd <- nchar(dep, "c") - 1L
##                 }
##             }
##             if (nd) {
##                 do.trunc <- nd > max.deparse.length
##                 dep <- substr(dep, 1L, if (do.trunc) 
##                   max.deparse.length
##                 else nd)
##                 cat("\n", dep, if (do.trunc) 
##                   paste(if (grepl(sd, dep) && grepl(oddsd, dep)) 
##                     " ...\" ..."
##                   else " ....", "[TRUNCATED] "), "\n", sep = "")
##             }
##         }
##         if (!tail) {
##             yy <- withVisible(eval(ei, envir))
##             i.symbol <- mode(ei[[1L]]) == "name"
##             if (!i.symbol) {
##                 curr.fun <- ei[[1L]][[1L]]
##                 if (verbose) {
##                   cat("curr.fun:")
##                   utils::str(curr.fun)
##                 }
##             }
##             if (verbose >= 2) {
##                 cat(".... mode(ei[[1L]])=", mode(ei[[1L]]), "; paste(curr.fun)=")
##                 utils::str(paste(curr.fun))
##             }
##             if (print.eval && yy$visible) {
##                 if (isS4(yy$value)) 
##                   methods::show(yy$value)
##                 else print(yy$value)
##             }
##             if (verbose) 
##                 cat(" .. after ", sQuote(deparse(ei, control = c("showAttributes", 
##                   "useSource"))), "\n", sep = "")
##         }
##     }
##     invisible(yy)
## }
## <bytecode: 0x7f8c89773590>
## <environment: namespace:base>

Penn State Football Receiving Statistics

Google Penn State Football Statistics: http://bfy.tw/88gl
Identify a data table to scrape (for example, “receiving statistics”)
Right click on the table you want, choose “Inspect Element”
Roll cursor over the HTML code (even if you don’t understand it) until you see the whole table that you want appear highlighted. Click on the row that highlights the whole table.
Right click the highlighted row >> Copy >> XPath
Edit the R code chunk below to paste the XPath with SINGLE quotes around it, and URL with quotes around it as shown.
Execute the code chunk!

library("rvest")
url <- "http://www.espn.com/college-football/team/stats/_/id/213/penn-state-nittany-lions"
XPATH <- '//*[@id="my-players-table"]/div[4]/div/table'

Table1 <- url %>%
  html() %>%
  html_nodes(xpath='XPATH') %>%
  html_table()

Table1[[1]]

Penn State Football Receiving Statistics

url <- "http://www.espn.com/college-football/team/stats/_/id/213/penn-state-nittany-lions"
XPATH <- '//*[@id="my-players-table"]/div[4]/div/table'

Table2 <- url %>%
  read_html(header = TRUE) %>%
  html_nodes(xpath=XPATH) %>%
  html_table()

# R stores the result as a "list" object, so the double square brackets select the first 
#    element of the list, and we store it at a data table called FootballStatsRaw

FootballStatsRaw <- Table2[[1]]

# Inspect the Data Table
FootballStatsRaw

X1	X2	X3	X4	X5	X6
Receiving Statistics	Receiving Statistics	Receiving Statistics	Receiving Statistics	Receiving Statistics	Receiving Statistics
NAME	REC	YDS	AVG	LONG	TD
Chris Godwin	25	364	14.6	52 (TD)	3
DeAndre Thompkins	18	328	18.2	70 (TD)	1
Mike Gesicki	27	323	12.0	53	2
DaeSean Hamilton	19	238	12.5	45	1
Saquon Barkley	11	143	13.0	40 (TD)	1
Irvin Charles	1	80	80.0	80 (TD)	1
Saeed Blacknall	3	59	19.7	35	0
Juwan Johnson	1	27	27.0	27	0
Brandon Polk	2	18	9.0	14	0
Mark Allen	2	5	2.5	4	0
Miles Sanders	1	3	3.0	3	0
Andre Robinson	1	2	2.0	2	0
Totals	111	1590	14.3	80	9

# Tidy up the data table & rename variables
FootballStatsClean <- 
  FootballStatsRaw %>%  
  rename(name = X1, receptions = X2, total_yds = X3, avg_yds = X4, longest = X5, touchdowns = X6) %>%
  filter(row_number() > 2, name != "Totals")   
  
# Inspect FootballStatsClean
FootballStatsClean

name	receptions	total_yds	avg_yds	longest	touchdowns
Chris Godwin	25	364	14.6	52 (TD)	3
DeAndre Thompkins	18	328	18.2	70 (TD)	1
Mike Gesicki	27	323	12.0	53	2
DaeSean Hamilton	19	238	12.5	45	1
Saquon Barkley	11	143	13.0	40 (TD)	1
Irvin Charles	1	80	80.0	80 (TD)	1
Saeed Blacknall	3	59	19.7	35	0
Juwan Johnson	1	27	27.0	27	0
Brandon Polk	2	18	9.0	14	0
Mark Allen	2	5	2.5	4	0
Miles Sanders	1	3	3.0	3	0
Andre Robinson	1	2	2.0	2	0

Homework

DC Ch 12 Exercises: 12.1, 12.2
DC Ch 15 Exercises: 15.1 (skip Armistice Day), 15.2 (copy/paste this for d: “£100” )
DC chapter 14 reading quiz on Canvas
Activity: Scraping Nuclear Reactors
Choose final project data & submit on Canvas for approval (not graded; due 11/13 at 11:59pm)

Activity: Scraping Nuclear Reactors

Grading

Assignment is worth a total of 10 points.

[2 points] Turn in HTML with embedded .Rmd file (e.g. “DataComputing simple” template)
[1 points] Successfully scrape raw data for Japan Reactors from Wikipedia
[2 points] Your Turn #1: Tidy Data & Data Cleaning (Japan)
[1 points] Your Turn #2: Plot Net Generation Capacity vs Construction Date
[2 points] Your Turn #3: Scrape & Clean China Data (then Merge with Japan)
[2 points] Your Turn #5: Reconstruct Info Graphic of Japan Reactors (or other country of interest)

Assignment Sections & Hints:

Scrape Japan Data from Wikipedia
- Hint: if you’re having trouble with “subscript out of bounds”, just get the xpath for the Japan table directly, rather than reading all tables at once
Your Turn #1: Tidy Data & Data Cleaning
Your Turn #2: Plot Net Generation Capacity vs Construction Date
[SKIP] Your Turn #3: Scrape & Clean China Data (then Merge with Japan)
- Hint: use mutate() to create the new variable “country”
- Hint: use either cbind() or rbind() to combine the data tables together (be careful to match column order)
[SKIP] Your Turn #4: Scrape USA Reactors
Your Turn #5: Reconstruct Info Graphic of Japan Reactors
- Hint: use paste() function to combine the name and reactor variables to create an informative unique case ID

teaching | stat 184 home | syllabus | piazza | canvas