Small Group Discussion:
- What was the muddiest point from the chapters this week (R command patterns; Files & Documents)?
- What are your opinions about the “function application” syntax and the “chaining syntax”?
- function application: object.name <- function.name(argument, named.arg = value)
- chaining syntax uses %>% (curiously known as a “pipe”) to link several steps together
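To make the comparison concrete, here is a minimal sketch that computes the same result in both styles. The vector x and its values are made up for illustration, and the %>% operator is assumed to come from the dplyr package (which re-exports it from magrittr).

```r
library(dplyr)  # provides %>% (re-exported from magrittr)

x <- c(1, 4, 9, 16)  # hypothetical input values

# Function application: the steps are read from the inside out
result.nested <- round(mean(sqrt(x)), 1)

# Chaining: the steps are read left to right, each one feeding the next
result.chained <- x %>% sqrt() %>% mean() %>% round(1)

# Both styles give the same answer
identical(result.nested, result.chained)
```

The two forms are interchangeable; the chained form simply lists the steps in the order they happen.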
R Command Patterns
Command chains
Princes <-
  BabyNames %>%
  filter(grepl("Prince", name)) %>%
  group_by(year) %>%
  summarise(total = sum(count))
Your commands will be written as chains.
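To see what the chain above does, it can be run on a tiny stand-in table. This is only a sketch: the five rows below are invented for illustration and are not the real BabyNames data, and the dplyr package is assumed to be installed.

```r
library(dplyr)

# Hypothetical stand-in for BabyNames; the real table is much larger
BabyNames <- data.frame(
  name  = c("Prince", "Princess", "Mary", "Prince", "John"),
  year  = c(1990, 1990, 1990, 1991, 1991),
  count = c(10, 20, 500, 15, 400)
)

Princes <-
  BabyNames %>%
  filter(grepl("Prince", name)) %>%  # keep rows whose name contains "Prince"
  group_by(year) %>%                 # form one group per year
  summarise(total = sum(count))      # add up the counts within each year

Princes
```

Note that grepl("Prince", name) also matches "Princess", so the 1990 total in this toy table combines both names.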
Parts of Speech in R
- Data tables
- A data table comprises one or more variables.
- Convention: data tables are given names that start with a CAPITAL LETTER, e.g., RegisteredVoters.
- A data table will always be the input at the start of a command chain.
- If assignment is used to save the result, the object created is usually a data table.
- Functions
- Functions are objects that transform an input into an output.
- Functions are always followed by parentheses, that is, an opening ( and, eventually, a closing ).
- Each link in a command chain starts with a function.
- More specifically, the function is a data verb that takes a data table as input and produces another data table as output.
- There are other kinds of functions, e.g. summary (or reduction) functions and transformation functions.
- Arguments
- Variables
- Variables are the components of data tables.
- When they are used, they always appear in function arguments, that is, between the function’s parentheses.
- A good convention is for variables to have names that start with a lower-case letter. The convention is not universally followed.
- Variables will never be followed by (.
- Constants
- Constants are single values, most commonly a number or a character string.
- Character strings will always be in quotation marks, "like this."
- Numerals are the written form of numbers, for instance -42, 1984, and 3.14159.
- Assignment
- saves the output of the command (chain) in a named object.
- This is done with the assignment operator, <-
- Formulas
- mostly left to future statistics classes
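The parts of speech listed above can be picked out in a short annotated chain. This is a sketch under stated assumptions: the RegisteredVoters table below is invented for illustration (its variables county, party, and voters are hypothetical), and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical data table (capitalized name) with three variables
RegisteredVoters <- data.frame(
  county = c("North", "North", "South"),
  party  = c("A", "B", "A"),
  voters = c(120, 80, 50)
)

# Assignment: the output of the chain is saved as CountyTotals
CountyTotals <-
  # Data table: the input at the start of the chain
  RegisteredVoters %>%
  # filter() is a data verb; voters is a variable; 80 is a constant
  filter(voters >= 80) %>%
  # group_by() is a data verb; county is a variable
  group_by(county) %>%
  # sum() is a reduction function; total is a named argument
  summarise(total = sum(voters))

CountyTotals
```

Everything followed by an opening parenthesis is a function, and every variable appears only inside a function's parentheses.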
Conventions of your book
The book has defined a convention that data tables should begin with a capital letter & variables should begin with a lowercase letter. It’s important to note that these conventions are for the benefit of users & consumers of your code. R will not enforce them for you!
Other popular conventions:
- use descriptive but concise object names (harder than it sounds, but totally worth it)
- camel-case or “.” syntax (e.g., variableName or variable.name)
- be generous with whitespace (R just ignores it, and it makes code MUCH easier for humans to read)
- Use the # character to include comments within code chunks (again be generous; R ignores comments)
- limit length of R commands to about 80 characters
There are several published style guides to help R programmers write beautiful code.
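As a small sketch of these conventions in practice, the example below uses made-up commute times; every object name here is hypothetical. The last line shows that R accepts cramped, uninformative code just as readily, which is why the conventions are up to you.

```r
# Hypothetical data: commute time in minutes for one work week
commuteMinutes <- c(32, 28, 45, 30, 27)            # camel-case name
weekday.names  <- c("Mon", "Tue", "Wed", "Thu", "Fri")  # "." style name

# Whitespace around operators and after commas is free
averageCommute <- mean(commuteMinutes)

# R does not enforce any of this: the cramped line below also runs
x<-mean(commuteMinutes)
```

Both averageCommute and x hold the same value, but only one of the names tells a reader what it means.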
Discussion Problem
Consider this command chain:
Princes <-
  BabyNames %>%
  filter(grepl("Prince", name)) %>%
  group_by(year) %>%
  summarise(total = sum(count))
Just from the syntax, you should be able to tell which of the five different kinds of object each of these things is: Princes, BabyNames, filter, grepl, "Prince", name, group_by, year, summarise, total, sum, count.
R Markdown
- Markdown / RMarkdown
- Opening an Rmd file for editing.
- Saving Rmd files
- Compiling Rmd to HTML (or PDF or MS Word)
- Handing in files for class
- Upload HTML files to Canvas using DCF template in RMarkdown unless otherwise instructed.
- Template includes a few boilerplate commands to embed your Rmd code in the HTML document.
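Compiling can also be done from the R console rather than with the RStudio button. The sketch below writes a tiny Rmd file to a temporary folder and renders it to HTML with rmarkdown::render(). The file name and contents are made up for illustration, the rmarkdown package and pandoc are assumed to be installed (the code skips rendering if they are not), and the DCF template's embedding of the Rmd source is not reproduced here.

```r
# Build the code-chunk fence without typing three backticks in a row
fence <- strrep("`", 3)

# Write a tiny, hypothetical Rmd file to a temporary folder
rmd.file <- file.path(tempdir(), "example.Rmd")
writeLines(c(
  "---",
  "title: \"Example\"",
  "output: html_document",
  "---",
  "",
  "### A level 3 heading",
  "",
  "Some narrative text with a [link](https://www.r-project.org).",
  "",
  paste0(fence, "{r}"),
  "1 + 1",
  fence
), rmd.file)

# Compile to HTML only if rmarkdown and pandoc are available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html.file <- rmarkdown::render(rmd.file,
                                 output_format = "html_document",
                                 quiet = TRUE)
}
```

When rendering succeeds, html.file holds the path of the HTML document that was produced.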
Creating an Rmd File
Use the “DCF Work” or “DataComputing simple” template file for Rmd:
- In RStudio: File >> New File >> R Markdown >> From Template >> DCF Work
- Eventually, you will upload your HTML file to Canvas (with embedded .Rmd file)
- stat184Template.html
The good people at RStudio have developed a number of “Cheat Sheets” to get people off and running with these tools. Here’s a link to several of them, including RMarkdown, RStudio, and other topics we’ll hit in this course.
In-Class Assignment:
Create a narrative description of at least 3 classes you are taking this term using RMarkdown. Include:
- Level 3 heading for each class
- Two sentences about why you’re taking the class
- links to the Canvas/Angel site
- links to a relevant Wikipedia (or other) article
- embed a relevant figure (perhaps from Wikipedia)
- embed .Rmd source file (i.e. use template)
Use the Rmd template above (i.e. adapt stat184Template.Rmd or start fresh with the “DCF Work” or “DataComputing simple” template). Feel free to work together and help each other, but each student should submit their own work as an HTML file with embedded .Rmd on Canvas.
Note: the narrative should be written with text connecting each portion; don’t just dump all the required elements into a document together.
Help each other, divide and conquer, share .Rmd code, and post tips/questions/answers to Piazza!
Homework:
- See course webpage
- Make sure you submit the activity & complete peer reviews
- just submit the HTML file with your embedded .Rmd (i.e. use a template)
- DC Ch 3 & 4 Exercises (on paper):
- 3.1, 3.2, 3.3, 3.4, 3.5, 3.6
- 4.1, 4.2, 4.3
- DC chapter 5 & 6 reading quiz on Canvas