Base R - Zhaohu (Jonathan) Fan
November 30, 2016
Upon completing this part, we should be able to do the following:
- Create vectors with c(), :, seq(), and rep().
- Write for loops, while loops, if statements, and functions.
c( )
grade <- c(2, 5, 7, 5)
grade
## [1] 2 5 7 5
# calculate the mean of `grade`
mean(grade)
## [1] 4.75
grade <- c(2, 5, NA, 5)
grade
## [1] 2 5 NA 5
mean(grade)
## [1] NA
# we need the argument `na.rm = TRUE` to ignore the missing value
mean(grade, na.rm = TRUE)
## [1] 4
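Before dropping missing values it can help to know how many there are. A minimal sketch using is.na() with the same `grade` vector (the name `n_missing` is only illustrative):

```r
grade <- c(2, 5, NA, 5)

# is.na() returns TRUE for each missing element
is.na(grade)

# TRUE counts as 1 when summed, so this is the number of missing values
n_missing <- sum(is.na(grade))
n_missing   # 1
```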
:
2:6
## [1] 2 3 4 5 6
seq()
seq(2, 3, by = 0.5)
## [1] 2.0 2.5 3.0
rep()
rep(c("A", 1, 2), times = 3)
## [1] "A" "1" "2" "A" "1" "2" "A" "1" "2"
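Both seq() and rep() accept other arguments that are often handy. The sketch below uses `length.out` and `each`; the variable names are illustrative only:

```r
# ask for a fixed number of evenly spaced values instead of a step size
evenly_spaced <- seq(2, 3, length.out = 5)
evenly_spaced   # 2.00 2.25 2.50 2.75 3.00

# repeat each element in turn rather than the whole vector
each_repeated <- rep(c("A", "B"), each = 2)
each_repeated   # "A" "A" "B" "B"
```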
# create a vector
x <- c("A", "B", "C", 1, 2, 3)
x
## [1] "A" "B" "C" "1" "2" "3"
Select only element 4
x[4]
## [1] "1"
Select elements 2 thru 4
x[2:4]
## [1] "B" "C" "1"
Select elements 1 and 5
x[c(1, 5)]
## [1] "A" "2"
# create a vector
x <- c("A", "B", "C", 1, 2, 3)
x
## [1] "A" "B" "C" "1" "2" "3"
Select all elements except element 4
x[-4]
## [1] "A" "B" "C" "2" "3"
Select all elements except elements 2 thru 4
x[-(2:4)]
## [1] "A" "2" "3"
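Notice that the numbers in `x` print with quotes. A vector holds a single type, so mixing letters and numbers in c() turns everything into character strings. A small sketch of checking and undoing that (the name `digits` is illustrative):

```r
x <- c("A", "B", "C", 1, 2, 3)

# every element was coerced to character
class(x)                       # "character"

# convert the last three elements back to numbers before doing arithmetic
digits <- as.numeric(x[4:6])
digits                         # 1 2 3
sum(digits)                    # 6
```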
# create a vector of numbers
y <- c(1, -1, 6, -2, 6, 5, 9, 1, 10, 10)
y
## [1] 1 -1 6 -2 6 5 9 1 10 10
Select elements equal to 10
y[y == 10]
## [1] 10 10
Select elements less than zero
y[y < 0]
## [1] -1 -2
Select elements in the set {1, 2, 5}
y[y %in% c(1, 2, 5)]
## [1] 1 5 1
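Conditions can also be combined, and which() reports positions rather than values. A brief sketch with the same `y` vector (the result names are illustrative):

```r
y <- c(1, -1, 6, -2, 6, 5, 9, 1, 10, 10)

# which() gives the positions where the condition is TRUE
positions_of_10 <- which(y == 10)
positions_of_10    # 9 10

# `&` combines two conditions element by element
small_positive <- y[y > 0 & y < 6]
small_positive     # 1 5 1
```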
for Loops
Basic for Loop Syntax:
for (variable in sequence){
  do something
}
Here’s a for loop that will “do something” for each number from 1 to 4:
for (i in 1:4){
  j <- i + 10 # add 10 to the "current" value of `i` in each step of the loop
  print(j) # print the result of `j` for each step of the loop
}
## [1] 11
## [1] 12
## [1] 13
## [1] 14
This is often used to iterate over each value in a vector, where x[i] refers to element i of the vector x:
byFives <- seq(10, 25, by = 5) # vector counts by 5
byFives
## [1] 10 15 20 25
for (i in 1:length(byFives)){ # `length(byFives)` is the number of elements in the `byFives` vector
  j <- byFives[i] / 5 # divide element `i` of the `byFives` vector by 5 in each step of the loop
  print(j) # print the result of `j` for each step of the loop
}
## [1] 2
## [1] 3
## [1] 4
## [1] 5
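One caveat with 1:length(x) is that it misbehaves when the vector is empty, because 1:0 counts down. A sketch of the usual alternative, seq_along() (the `empty` vector is only there to show the difference):

```r
byFives <- seq(10, 25, by = 5)

# seq_along() produces 1, 2, ..., length(byFives)
for (i in seq_along(byFives)) {
  print(byFives[i] / 5)
}

empty <- numeric(0)
1:length(empty)      # 1 0, so a loop would run twice by mistake
seq_along(empty)     # integer(0), so a loop would not run at all
```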
while Loops
Basic while Loop Syntax:
while (true condition){
  do something
}
Here’s a while loop that prints the index i while it is less than 5:
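A minimal sketch of a loop matching this description, assuming the index `i` starts at 1 and grows by 1 in each step:

```r
i <- 1                 # starting value (an assumption)
while (i < 5) {        # keep going as long as the condition is TRUE
  print(i)             # print the current value of `i`
  i <- i + 1           # update `i`, otherwise the loop would never end
}
```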
## [1] 1
## [1] 2
## [1] 3
## [1] 4
Basic function Syntax:
function_name <- function(var){
  do something
  return(new_variable)
}
Here’s a function that takes one number as an argument and squares it:
squared <- function(x){ # the function will be called "squared" and takes one argument
  calculation <- x*x # this shows what the function will do with the `x` argument
  return(calculation) # this tells the function to return the result of `calculation` to the caller
}
# Nothing happens until you call the new `squared( )` function that you've created:
squared(3)
## [1] 9
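Functions can take more than one argument, and an argument can have a default value. The sketch below is illustrative only: `raise_to` and `power` are made-up names, not part of the material above.

```r
# `power` defaults to 2, so calling raise_to(x) squares x
raise_to <- function(x, power = 2) {
  x^power   # the last evaluated expression is returned even without return()
}

raise_to(3)              # 9
raise_to(2, power = 3)   # 8
```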
Basic if Statement Syntax:
if (condition){
  do something
} else {
  do something different
}
i <- 5 # some variable set to 5
if (i > 3){ # if condition
  print('Yes') # do this if condition is satisfied
} else {
  print('No') # do this if condition is NOT satisfied
}
## [1] "Yes"
These if statements work well with other programming tools:
- Inside a loop you might test a condition with each iteration through the loop and provide alternate instructions depending on the outcome.
- As part of a function you might provide different instructions depending on the argument supplied.
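Both uses can be sketched together. In this illustrative example, `sign_label` is a hypothetical helper that branches on its argument, and the loop calls it once per value:

```r
# hypothetical helper: returns a different label depending on the argument
sign_label <- function(value) {
  if (value < 0) {
    "negative"
  } else {
    "non-negative"
  }
}

sign_labels <- character(0)          # start with an empty character vector
for (value in c(1, -1, 6, -2)) {
  # the condition is tested once per iteration, inside the function
  sign_labels <- c(sign_labels, sign_label(value))
}
sign_labels   # "non-negative" "negative" "non-negative" "negative"
```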
A data frame is a special case of a list where all elements are the same length.
letters <- c("A", "B", "C", "D", "E", "F") # create a vector
numbers <- 1:6 # create another vector
df <- data.frame(numbers, letters) # combine them as a data frame
df
## numbers letters
## 1 1 A
## 2 2 B
## 3 3 C
## 4 4 D
## 5 5 E
## 6 6 F
You can use square brackets to select specific elements based on [row, column]…
Select element in row 3 and column 2
df[3, 2]
## [1] C
## Levels: A B C D E F
Select all elements in row 4 (note comma placement)
df[4, ]
## numbers letters
## 4 4 D
Select all elements in column 2 (note comma placement again)
df[ , 2]
## [1] A B C D E F
## Levels: A B C D E F
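The `Levels:` lines in the output above appear because the `letters` column was stored as a factor, which was the default for character vectors in data.frame() before R 4.0.0. From R 4.0.0 on, character vectors stay character unless you ask otherwise. A sketch that makes the choice explicit (the names `df_factor` and `df_character` are illustrative):

```r
letters <- c("A", "B", "C", "D", "E", "F")
numbers <- 1:6

# request factors explicitly, matching the output shown above
df_factor <- data.frame(numbers, letters, stringsAsFactors = TRUE)
class(df_factor$letters)      # "factor"

# keep plain character strings
df_character <- data.frame(numbers, letters, stringsAsFactors = FALSE)
class(df_character$letters)   # "character"
```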
$ operator for data frames
df <- data.frame(numbers, letters) # same data frame as before
df
## numbers letters
## 1 1 A
## 2 2 B
## 3 3 C
## 4 4 D
## 5 5 E
## 6 6 F
Dollar sign $ used to select a variable in a data frame by name
df$letters
## [1] A B C D E F
## Levels: A B C D E F
Dollar sign $ used to add a variable in a data frame by name
df$trueFalse <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
df
## numbers letters trueFalse
## 1 1 A TRUE
## 2 2 B FALSE
## 3 3 C TRUE
## 4 4 D TRUE
## 5 5 E FALSE
## 6 6 F FALSE
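The $ operator combines naturally with the square brackets shown earlier, for example to keep only the rows that meet a condition. A minimal sketch (the name `big_rows` is illustrative):

```r
letters <- c("A", "B", "C", "D", "E", "F")
numbers <- 1:6
df <- data.frame(numbers, letters)

# double square brackets with a quoted name select the same column as `$`
df[["numbers"]]

# a logical condition in the row position keeps only the matching rows
big_rows <- df[df$numbers > 3, ]
big_rows      # rows 4, 5 and 6
```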
$ operator for other objects
Consider a simple linear regression model (mtcars data set):
model <- lm(mtcars$mpg ~ mtcars$wt) # this is a model predicting miles per gallon from weight of the car
The summary( ) command will provide a useful summary of the model information:
summary(model)
##
## Call:
## lm(formula = mtcars$mpg ~ mtcars$wt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5432 -2.3647 -0.1252 1.4096 6.8727
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.2851 1.8776 19.858 < 2e-16 ***
## mtcars$wt -5.3445 0.5591 -9.559 1.29e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
## F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
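The same model can also be written with bare column names and a `data` argument, which is the more common style and makes predictions on new data easier. A sketch for comparison (`model_alt` is an illustrative name):

```r
model <- lm(mtcars$mpg ~ mtcars$wt)        # the form used above
model_alt <- lm(mpg ~ wt, data = mtcars)   # same fit, columns looked up in `mtcars`

# the estimates match; only the coefficient label changes to "wt"
coef(model_alt)

# with the `data` form, predict() can find `wt` in a new data frame
predict(model_alt, newdata = data.frame(wt = 3))
```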
The model object includes a lot more, though, as we can see with the str(model) command:
## List of 12
## $ coefficients : Named num [1:2] 37.29 -5.34
## ..- attr(*, "names")= chr [1:2] "(Intercept)" "mtcars$wt"
## $ residuals : Named num [1:32] -2.28 -0.92 -2.09 1.3 -0.2 ...
## ..- attr(*, "names")= chr [1:32] "1" "2" "3" "4" ...
## $ effects : Named num [1:32] -113.65 -29.116 -1.661 1.631 0.111 ...
## ..- attr(*, "names")= chr [1:32] "(Intercept)" "mtcars$wt" "" "" ...
## $ rank : int 2
## $ fitted.values: Named num [1:32] 23.3 21.9 24.9 20.1 18.9 ...
## ..- attr(*, "names")= chr [1:32] "1" "2" "3" "4" ...
## $ assign : int [1:2] 0 1
## $ qr :List of 5
## ..$ qr : num [1:32, 1:2] -5.657 0.177 0.177 0.177 0.177 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:32] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:2] "(Intercept)" "mtcars$wt"
## .. ..- attr(*, "assign")= int [1:2] 0 1
## ..$ qraux: num [1:2] 1.18 1.05
## ..$ pivot: int [1:2] 1 2
## ..$ tol : num 1e-07
## ..$ rank : int 2
## ..- attr(*, "class")= chr "qr"
## $ df.residual : int 30
## $ xlevels : Named list()
## $ call : language lm(formula = mtcars$mpg ~ mtcars$wt)
## $ terms :Classes 'terms', 'formula' language mtcars$mpg ~ mtcars$wt
## .. ..- attr(*, "variables")= language list(mtcars$mpg, mtcars$wt)
## .. ..- attr(*, "factors")= int [1:2, 1] 0 1
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:2] "mtcars$mpg" "mtcars$wt"
## .. .. .. ..$ : chr "mtcars$wt"
## .. ..- attr(*, "term.labels")= chr "mtcars$wt"
## .. ..- attr(*, "order")= int 1
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. ..- attr(*, "predvars")= language list(mtcars$mpg, mtcars$wt)
## .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
## .. .. ..- attr(*, "names")= chr [1:2] "mtcars$mpg" "mtcars$wt"
## $ model :'data.frame': 32 obs. of 2 variables:
## ..$ mtcars$mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## ..$ mtcars$wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
## ..- attr(*, "terms")=Classes 'terms', 'formula' language mtcars$mpg ~ mtcars$wt
## .. .. ..- attr(*, "variables")= language list(mtcars$mpg, mtcars$wt)
## .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
## .. .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. .. ..$ : chr [1:2] "mtcars$mpg" "mtcars$wt"
## .. .. .. .. ..$ : chr "mtcars$wt"
## .. .. ..- attr(*, "term.labels")= chr "mtcars$wt"
## .. .. ..- attr(*, "order")= int 1
## .. .. ..- attr(*, "intercept")= int 1
## .. .. ..- attr(*, "response")= int 1
## .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. .. ..- attr(*, "predvars")= language list(mtcars$mpg, mtcars$wt)
## .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
## .. .. .. ..- attr(*, "names")= chr [1:2] "mtcars$mpg" "mtcars$wt"
## - attr(*, "class")= chr "lm"
We can extract that information from model using the dollar sign $ operator.
For example, the coefficient estimates:
model$coefficients
## (Intercept) mtcars$wt
## 37.285126 -5.344472
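Base R also provides accessor functions for the most commonly used components. For this model they return the same values as the $ operator, as this short sketch checks:

```r
model <- lm(mtcars$mpg ~ mtcars$wt)

# each accessor returns the same values as the matching `$` component
all.equal(coef(model), model$coefficients)       # TRUE
all.equal(fitted(model), model$fitted.values)    # TRUE
all.equal(residuals(model), model$residuals)     # TRUE
```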
We often plot residuals vs fitted values to evaluate a model fit. The model object stores both, so you can retrieve them with the $ operator:
require(ggplot2)
## Loading required package: ggplot2
ggplot() +
geom_point(aes(x = model$fitted.values, y = model$residuals))
For the sake of illustration, suppose you want to color the residuals by the number of cylinders as a proxy for engine size. (Note that it would usually make more sense to include cylinders as a variable in the model, but we’re just showing a way to pull information from compatible data sets together on a plot.)
require(ggplot2)
ggplot() +
  geom_point(aes(x = model$fitted.values, y = model$residuals, color = as.factor(mtcars$cyl)))
The assignment is worth a total of 10 points.
- … c( ).
- … names 3 times using rep( ).
- Write a function cubed with one argument, x, that calculates x*x*x and returns the result.
- Using y <- c(1:3), show that your function works for cubed(y).
- … if-else inside your loop)
- Using the iris data set in R, build a regression model reg <- lm(iris$Petal.Length ~ iris$Petal.Width) and then show a plot of residuals vs fitted values with points colored by iris$Species.