Week 15 Class Notes

Announcements

101 Althouse Focus Groups
- today @ 1:30; tomorrow @ 12:15
- free pizza & drinks
Final Projects
- Due Friday 12/9 @ NOON
- Anyone want to share their progress/ideas for some feedback?
Project competitions (see message in Canvas inbox)

Base R (Jonathan)

Learning objectives & outcomes

Upon completing this part, we should be able to do the following:

Create vectors using c(), :, seq(), rep().
Select vector elements using the square brackets.
Program using for loops, while loops, if statements, and functions that you write yourself.
Use the dollar sign $ to select variables in data frames, and square brackets to subset rows and/or columns.

Creating Vectors

Create Vectors using c()

grade <- c(2, 5, 7, 5)
grade

## [1] 2 5 7 5

# calculate the mean of `grade`
mean(grade)

## [1] 4.75

If a calculation involves a missing value (NA), then the result will also be missing by default:

grade <- c(2, 5, NA, 5)
grade

## [1]  2  5 NA  5

mean(grade)

## [1] NA

# we need the argument `na.rm = TRUE` to drop missing values before calculating
mean(grade, na.rm = TRUE)

## [1] 4
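
Several other base R summary functions accept the same na.rm argument, and is.na() shows where the missing values sit. A minimal sketch reusing the grade vector from above (the expected results are noted in the comments):

```r
grade <- c(2, 5, NA, 5)

is.na(grade)                # FALSE FALSE TRUE FALSE: TRUE marks the missing position
sum(is.na(grade))           # 1, the number of missing values
sum(grade, na.rm = TRUE)    # 12, the total of the non-missing grades
max(grade, na.rm = TRUE)    # 5, the largest non-missing grade
```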

Create Vectors using :

2:6

## [1] 2 3 4 5 6

Create Vectors using seq()

seq(2, 3, by = 0.5)

## [1] 2.0 2.5 3.0

Create Vectors using rep()

rep(c("A", 1, 2), times = 3)

## [1] "A" "1" "2" "A" "1" "2" "A" "1" "2"
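
Notice that the numbers 1 and 2 printed in quotes above: a vector holds only one type, so mixing "A" with numbers converts everything to character. The sketch below checks that, and also shows two other arguments that are often handy, each for rep() and length.out for seq(). The variable names are only for illustration.

```r
mixed <- c("A", 1, 2)
class(mixed)                                # "character": the numbers were converted

by_times <- rep(c("A", "B"), times = 2)     # repeat the whole vector: "A" "B" "A" "B"
by_each  <- rep(c("A", "B"), each = 2)      # repeat each element:     "A" "A" "B" "B"

quarters <- seq(0, 1, length.out = 5)       # 5 evenly spaced values: 0.00 0.25 0.50 0.75 1.00
```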

Selecting Vector Elements with square brackets

# create a vector
x <- c("A", "B", "C", 1, 2, 3)
x

## [1] "A" "B" "C" "1" "2" "3"

Select only element 4

x[4]

## [1] "1"

Select elements 2 thru 4

x[2:4]

## [1] "B" "C" "1"

Select elements 1 and 5

x[c(1, 5)]

## [1] "A" "2"

Excluding Vector Elements with square brackets

# create a vector
x <- c("A", "B", "C", 1, 2, 3)
x

## [1] "A" "B" "C" "1" "2" "3"

Select all elements except element 4

x[-4]

## [1] "A" "B" "C" "2" "3"

Select all elements except elements 2 thru 4

x[-(2:4)]

## [1] "A" "2" "3"

Selecting Vector Elements that match a Condition

# create a vector of numbers
y <- c(1, -1,  6, -2,  6,  5,  9,  1, 10, 10)
y

##  [1]  1 -1  6 -2  6  5  9  1 10 10

Select elements equal to 10

y[y == 10]

## [1] 10 10

Select elements less than zero

y[y < 0]

## [1] -1 -2

Select elements in the set {1, 2, 5}

y[y %in% c(1, 2, 5)]

## [1] 1 5 1
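
Behind the scenes, a condition like y < 0 produces a logical vector with one TRUE or FALSE per element, and the square brackets keep the elements where it is TRUE. A short sketch of that idea, including combining conditions with & (and) and | (or); is_negative is just an illustrative name:

```r
y <- c(1, -1, 6, -2, 6, 5, 9, 1, 10, 10)

is_negative <- y < 0       # logical vector: TRUE in positions 2 and 4
which(is_negative)         # 2 4, the positions of the TRUE values

y[y > 0 & y < 6]           # both conditions must hold: 1 5 1
y[y < 0 | y == 10]         # either condition may hold: -1 -2 10 10
sum(y == 10)               # TRUE counts as 1, so this counts the tens: 2
```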

Programming with `for` Loops

Basic for Loop Syntax:

for (variable in sequence){  
  do something  
}

Here’s a for loop that will “do something” for each number from 1 to 4:

for (i in 1:4){
  j <- i + 10     # add 10 to the "current" value of `i` in each step of the loop
  print(j)        # print the result of `j` for each step of the loop
}

## [1] 11
## [1] 12
## [1] 13
## [1] 14

A for loop is often used to iterate over each value in a vector, where x[i] refers to element i of the vector x.

byFives <- seq(10, 25, by = 5)   # vector counts by 5
byFives

## [1] 10 15 20 25

for (i in 1:length(byFives)){   # `length(byFives)` is the number of elements in the `byFives` vector
  j <- byFives[i] / 5           # divide element `i` of the `byFives` vector by 5 in each step of the loop
  print(j)                      # print the result of `j` for each step of the loop
}

## [1] 2
## [1] 3
## [1] 4
## [1] 5
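
One caution about 1:length(x): if the vector happens to be empty, 1:0 counts backwards and the loop still runs twice. The base R function seq_along() avoids this. A small sketch, where empty and count are illustrative names:

```r
empty <- c()                       # a vector with no elements

1:length(empty)                    # 1 0, so a loop over this would run twice
seq_along(empty)                   # integer(0), so a loop over this never runs

count <- 0
for (i in seq_along(empty)){
  count <- count + 1               # never reached because there is nothing to loop over
}
count                              # still 0

byFives <- seq(10, 25, by = 5)
for (i in seq_along(byFives)){     # same loop as above, written with seq_along()
  print(byFives[i] / 5)
}
```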

Programming with `while` Loops

Basic while Loop Syntax:

while (true condition){
  do something
}

Here’s a while loop that prints the index i while it is less than 5:

i <- 1            # initialize our index variable

while (i < 5){    # set the condition of the `while` loop
  print(i)        # print the value of `i` in each step of the loop
  i <- i + 1      # increment `i` for the next trip through the loop
}

## [1] 1
## [1] 2
## [1] 3
## [1] 4
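
A while loop is most useful when you do not know ahead of time how many trips through the loop are needed. As a small sketch (the names value and steps are made up for this example), here is a loop that keeps doubling a number until it reaches at least 100:

```r
value <- 1                  # starting value
steps <- 0                  # how many times we have doubled

while (value < 100){        # keep going as long as `value` is below 100
  value <- value * 2        # double the value
  steps <- steps + 1        # count this trip through the loop
}

value                       # 128, the first doubling that is not below 100
steps                       # 7 doublings were needed
```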

Programming with functions (that you build yourself)

Basic function Syntax:

function_name <- function(var){
  do something
  return(new_variable)
}

Here’s a function that takes one number as an argument and squares it:

squared <- function(x){    # the function will be called "squared" and takes one argument
  calculation <- x*x       # this shows what the function will do with the `x` argument
  return(calculation)      # this hands the value of `calculation` back to wherever the function was called
}

# Nothing happens until you call the new `squared( )` function that you've created:
squared(3)

## [1] 9
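
Two things are worth knowing once you start writing functions: an argument can have a default value, and most arithmetic in R works on whole vectors at once. The sketch below uses a made-up function called scaled to show both.

```r
scaled <- function(x, factor = 2){   # `factor` defaults to 2 if the caller leaves it out
  calculation <- x * factor          # multiply every element of `x` by `factor`
  return(calculation)                # hand the result back to the caller
}

scaled(3)                  # uses the default: 6
scaled(3, factor = 10)     # overrides the default: 30
scaled(c(1, 2, 3))         # works element by element on a vector: 2 4 6
```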

Programming with conditional flow (if-else)

Basic if Statement Syntax:

if (condition){
  do something
} else {
  do something different
}

i <- 5           # some variable set to 5

if (i > 3){      # if condition
  print('Yes')   # do this if condition is satisfied
} else {
  print('No')    # do this if condition is NOT satisfied
}

## [1] "Yes"

These if statements work well with other programming tools:
- Inside a loop you might test a condition with each iteration through the loop and provide alternate instructions depending on the outcome.
- As part of a function you might provide different instructions depending on the argument supplied.
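
Both uses can be sketched briefly. The first example tests a condition on every trip through a loop; the second (a made-up function called describe_sign) returns a different result depending on its argument.

```r
# if-else inside a loop: a different message for even and odd numbers
for (i in 1:4){
  if (i %% 2 == 0){                  # `%%` gives the remainder after division
    print(paste(i, "is even"))
  } else {
    print(paste(i, "is odd"))
  }
}

# if-else inside a function: the result depends on the argument supplied
describe_sign <- function(x){
  if (x < 0){
    return("negative")
  } else {
    return("non-negative")
  }
}

describe_sign(-3)    # "negative"
describe_sign(5)     # "non-negative"
```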

Data Frame operations

A data frame is a special case of a list in which all elements (the columns) have the same length.

letters <- c("A", "B", "C", "D", "E", "F")    # create a vector (note: this masks R's built-in `letters` constant)
numbers <- 1:6                                # create another vector

df <- data.frame(numbers, letters)          # combine them as a data frame
df

##   numbers letters
## 1       1       A
## 2       2       B
## 3       3       C
## 4       4       D
## 5       5       E
## 6       6       F

You can use square brackets to select specific elements based on [row, column]…

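Select element in row 3 and column 2 (the Levels line in the output appears because letters was stored as a factor, the data.frame() default before R 4.0.0; R 4.0.0 and later keep it as character and print "C")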

df[3, 2]

## [1] C
## Levels: A B C D E F

Select all elements in row 4 (note comma placement)

df[4, ]

##   numbers letters
## 4       4       D

Select all elements in column 2 (note comma placement again)

df[ , 2]

## [1] A B C D E F
## Levels: A B C D E F
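
Square brackets also accept column names, conditions, and negative indices, just as they did for vectors. A short sketch, rebuilding the same data frame so the example runs on its own:

```r
df <- data.frame(numbers = 1:6,
                 letters = c("A", "B", "C", "D", "E", "F"))

df[ , "letters"]            # select a column by name instead of by position
df[df$numbers > 4, ]        # keep only the rows where `numbers` is greater than 4 (rows 5 and 6)
df[c(1, 3), "numbers"]      # rows 1 and 3 of the `numbers` column: 1 3
df[-1, ]                    # every row except the first
```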

The dollar sign `$` operator for data frames

df <- data.frame(numbers, letters)    # same data frame as before
df

##   numbers letters
## 1       1       A
## 2       2       B
## 3       3       C
## 4       4       D
## 5       5       E
## 6       6       F

The dollar sign $ can be used to select a variable in a data frame by name

df$letters

## [1] A B C D E F
## Levels: A B C D E F

The dollar sign $ can also be used to add a variable to a data frame by name

df$trueFalse <- c(T, F, T, T, F, F)
df

##   numbers letters trueFalse
## 1       1       A      TRUE
## 2       2       B     FALSE
## 3       3       C      TRUE
## 4       4       D      TRUE
## 5       5       E     FALSE
## 6       6       F     FALSE
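
New variables are often computed from existing ones rather than typed in by hand. A minimal sketch (the column names doubled and isLarge are made up for illustration), which also shows that assigning NULL removes a variable:

```r
df <- data.frame(numbers = 1:6,
                 letters = c("A", "B", "C", "D", "E", "F"))

df$doubled <- df$numbers * 2     # new column computed from an existing column
df$isLarge <- df$numbers > 3     # new TRUE/FALSE column built from a condition

df$doubled                       # 2 4 6 8 10 12

df$doubled <- NULL               # assigning NULL removes the column again
names(df)                        # "numbers" "letters" "isLarge"
```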

The dollar sign `$` operator for other objects

Consider a simple linear regression model (mtcars data set):

model <- lm(mtcars$mpg ~ mtcars$wt)   # this is a model predicting miles per gallon from weight of the car

The summary() command provides a useful summary of the model information:

summary(model)

## 
## Call:
## lm(formula = mtcars$mpg ~ mtcars$wt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## mtcars$wt    -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

The model object includes a lot more, though, as we can see with the str() command:

str(model)

## List of 12
##  $ coefficients : Named num [1:2] 37.29 -5.34
##   ..- attr(*, "names")= chr [1:2] "(Intercept)" "mtcars$wt"
##  $ residuals    : Named num [1:32] -2.28 -0.92 -2.09 1.3 -0.2 ...
##   ..- attr(*, "names")= chr [1:32] "1" "2" "3" "4" ...
##  $ effects      : Named num [1:32] -113.65 -29.116 -1.661 1.631 0.111 ...
##   ..- attr(*, "names")= chr [1:32] "(Intercept)" "mtcars$wt" "" "" ...
##  $ rank         : int 2
##  $ fitted.values: Named num [1:32] 23.3 21.9 24.9 20.1 18.9 ...
##   ..- attr(*, "names")= chr [1:32] "1" "2" "3" "4" ...
##  $ assign       : int [1:2] 0 1
##  $ qr           :List of 5
##   ..$ qr   : num [1:32, 1:2] -5.657 0.177 0.177 0.177 0.177 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:32] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:2] "(Intercept)" "mtcars$wt"
##   .. ..- attr(*, "assign")= int [1:2] 0 1
##   ..$ qraux: num [1:2] 1.18 1.05
##   ..$ pivot: int [1:2] 1 2
##   ..$ tol  : num 1e-07
##   ..$ rank : int 2
##   ..- attr(*, "class")= chr "qr"
##  $ df.residual  : int 30
##  $ xlevels      : Named list()
##  $ call         : language lm(formula = mtcars$mpg ~ mtcars$wt)
##  $ terms        :Classes 'terms', 'formula'  language mtcars$mpg ~ mtcars$wt
##   .. ..- attr(*, "variables")= language list(mtcars$mpg, mtcars$wt)
##   .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:2] "mtcars$mpg" "mtcars$wt"
##   .. .. .. ..$ : chr "mtcars$wt"
##   .. ..- attr(*, "term.labels")= chr "mtcars$wt"
##   .. ..- attr(*, "order")= int 1
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(mtcars$mpg, mtcars$wt)
##   .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
##   .. .. ..- attr(*, "names")= chr [1:2] "mtcars$mpg" "mtcars$wt"
##  $ model        :'data.frame':   32 obs. of  2 variables:
##   ..$ mtcars$mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##   ..$ mtcars$wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
##   ..- attr(*, "terms")=Classes 'terms', 'formula'  language mtcars$mpg ~ mtcars$wt
##   .. .. ..- attr(*, "variables")= language list(mtcars$mpg, mtcars$wt)
##   .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. ..$ : chr [1:2] "mtcars$mpg" "mtcars$wt"
##   .. .. .. .. ..$ : chr "mtcars$wt"
##   .. .. ..- attr(*, "term.labels")= chr "mtcars$wt"
##   .. .. ..- attr(*, "order")= int 1
##   .. .. ..- attr(*, "intercept")= int 1
##   .. .. ..- attr(*, "response")= int 1
##   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. .. ..- attr(*, "predvars")= language list(mtcars$mpg, mtcars$wt)
##   .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
##   .. .. .. ..- attr(*, "names")= chr [1:2] "mtcars$mpg" "mtcars$wt"
##  - attr(*, "class")= chr "lm"

We can extract any of that information from model using the dollar sign $ operator.

For example, here are the coefficient estimates:

model$coefficients

## (Intercept)   mtcars$wt 
##   37.285126   -5.344472
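
Since the extracted coefficients are just a named vector, square brackets work on them too, and the same $ trick reaches into other parts of the model or its summary. A short sketch using the same model; coef() is the base R accessor that returns the same coefficients:

```r
model <- lm(mtcars$mpg ~ mtcars$wt)

model$coefficients[2]        # just the slope for weight (about -5.34)
coef(model)                  # accessor function; same values as model$coefficients
model$df.residual            # residual degrees of freedom: 30
summary(model)$r.squared     # the summary is also a list; R-squared is about 0.7528
```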

We often plot residuals vs fitted values to evaluate a model fit. The model object stores both, so you can retrieve them with the $ operator:

library(ggplot2)    # load ggplot2 (stops with an error if the package is not installed)
ggplot() + 
  geom_point(aes(x = model$fitted.values, y = model$residuals))

For the sake of illustration, suppose you want to color the residuals by the number of cylinders as a proxy for engine size. (Note: it would usually make more sense to include cylinders as a variable in the model, but we're just showing a way to pull information from compatible data sets together on one plot.)

library(ggplot2)
ggplot() + 
  geom_point(aes(x = model$fitted.values, y = model$residuals, color = as.factor(mtcars$cyl)))
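
The data sets are "compatible" here because the model used all 32 rows of mtcars in their original order, so fitted value i, residual i, and row i of mtcars describe the same car. One way to make that explicit, sketched below with a made-up name plot_data, is to gather the pieces into a single data frame first:

```r
model <- lm(mtcars$mpg ~ mtcars$wt)

plot_data <- data.frame(fitted   = model$fitted.values,    # one fitted value per car
                        residual = model$residuals,        # one residual per car
                        cyl      = as.factor(mtcars$cyl))  # cylinders as a categorical variable

dim(plot_data)             # 32 rows and 3 columns
levels(plot_data$cyl)      # "4" "6" "8"
head(plot_data, 3)         # peek at the first three rows
```

A data frame like this can then be handed to ggplot(), for example ggplot(plot_data) + geom_point(aes(x = fitted, y = residual, color = cyl)).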

Homework

Activity: Practice with Base R
Work on Final Projects

Activity: Practice with Base R

Grading

The assignment is worth a total of 10 points.

[1 point] Turn in HTML with embedded .Rmd file (e.g. “DataComputing simple” template)
[1 point] Create a vector called “names” with the names of each person in your group using c( ).
[1 point] Create a vector called “nameRepeat” that repeats names 3 times using rep( ).
[1 point] Use square brackets to select the 6th element of “nameRepeat” and print the result.
[1 point] Write a function called cubed with one argument, x, that calculates x*x*x and returns the result.
[1 point] Using y <- c(1:3), show that your function works for cubed(y)
[2 points] Write a loop that prints each number from 1:10, except prints “LUCKY NUMBER SEVEN!!!” rather than the number “7” (hint: use if-else inside your loop)
[2 points] Using the iris data set in R, build a regression model reg <- lm(iris$Petal.Length ~ iris$Petal.Width) and then show a plot of residuals vs fitted values with points colored by iris$Species
