Command Patterns in R
Week 3
Command chains
Your commands will be written as chains.
An example command chain
Princes <-
BabyNames %>%
filter(grepl("Prince",name)) %>%
group_by(year) %>%
summarise(total = sum(count))
- A good idea to put each link on its own line
- Note that %>% is at the end of each line.
- Except … Princes <- is assignment.
- Except … The last line has no %>%.
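The layout rules above can be tried on a small made-up table. This is only a sketch, assuming the dplyr package is installed; ToyNames is a hypothetical stand-in for BabyNames, and its values are invented for illustration.

```r
library(dplyr)

# ToyNames is a hypothetical stand-in for BabyNames; the counts are made up.
ToyNames <- data.frame(
  name  = c("Prince", "Princess", "Mary", "Prince"),
  year  = c(2000, 2000, 2000, 2001),
  count = c(10, 5, 100, 7)
)

ToyPrinces <-                            # assignment: no %>% here
  ToyNames %>%                           # the chain starts with a data table
  filter(grepl("Prince", name)) %>%      # each link ends with %>% ...
  group_by(year) %>%
  summarise(total = sum(count))          # ... except the last one

ToyPrinces
```

Note that grepl("Prince", name) also matches "Princess", because it looks for the pattern anywhere in the name.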
Syntax and semantics
There are two distinct aspects involved in reading or writing a command chain.
- Syntax: the grammar of the command
- Semantics: the meaning of the command
The focus today is on syntax.
Parts of Speech
From the dictionary:
part of speech (noun; plural noun: parts of speech): a category to which a word is assigned in accordance with its syntactic functions. In English the main parts of speech are noun, pronoun, adjective, determiner, verb, adverb, preposition, conjunction, and interjection.
Parts of Speech in R
- Data tables
- Functions
- Arguments
- Variables
- Constants
- Assignment
- Formulas (mostly left to future statistics classes)
Data tables
- A data table comprises one or more variables.
- Naming convention: data tables are given names that start with a CAPITAL LETTER, e.g., RegisteredVoters.
- A data table will always be the input at the start of a command chain.
- If assignment is used to save the result, the object created is usually a data table.
Functions
- Functions are objects that transform an input into an output.
- Functions are always followed by parentheses, that is, an opening ( and, eventually, a closing ).
- Each link in a command chain starts with a function.
- More specifically, the function is a data verb that takes a data table as input and produces another data table as output.
- There are other kinds of functions, e.g. summary (or reduction) functions and transformation functions.
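To make the three kinds of function concrete, here is a small sketch, assuming the dplyr package is installed. The Scores table and its values are hypothetical, and mutate() is used here only as one example of a data verb.

```r
library(dplyr)

# Scores is a hypothetical table invented for this illustration.
Scores <- data.frame(student = c("a", "b", "c"), points = c(4, 9, 16))

# Summary (reduction) function: many values in, one value out.
# (Scores$points pulls the points variable out of the table.)
sum(Scores$points)

# Transformation function: one output value for each input value.
sqrt(Scores$points)

# Data verbs: a data table in, a data table out.
Totals <- Scores %>% summarise(total = sum(points))
Roots  <- Scores %>% mutate(root = sqrt(points))
```

In a command chain, the summary and transformation functions usually appear inside the parentheses of a data verb, as sum() and sqrt() do here.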
Arguments
The things that go inside a function’s parentheses are called arguments.
- Arguments describe the details of what a function is to do.
- If there are multiple arguments, they are always separated by commas.
- Many functions take named arguments which look like a name followed by an = sign, e.g. summarise(total = sum(count))
You can also consider the data table passed along by %>% as an argument to a function that immediately follows.
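The idea that %>% supplies an argument can be checked directly: the pipe hands the table on its left to the function on its right as that function's first argument. A minimal sketch, assuming the dplyr package is installed; the Pets table is hypothetical.

```r
library(dplyr)

# Pets is a hypothetical table invented for this illustration.
Pets <- data.frame(species = c("cat", "dog", "cat"), weight = c(4, 20, 5))

# Chain form: %>% hands Pets to filter() as its first argument.
Piped <- Pets %>% filter(species == "cat")

# The same call with the data table written out as an explicit argument.
Direct <- filter(Pets, species == "cat")

# Both forms give the same data table.
identical(Piped, Direct)

# Named argument: mean_weight becomes the name of the variable in the output.
Pets %>% summarise(mean_weight = mean(weight))
```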
Variables
Variables are the components of data tables.
- When they are used, they always appear in function arguments, that is, between the function’s parentheses.
- A good convention is for variables to have names that start with a lower-case letter. The convention is not universally followed.
- Variables will never be followed by (.
Constants
Constants are single values, most commonly a number or a character string.
- Character strings will always be in quotation marks, "like this."
- Numerals are the written form of numbers, for instance, -42, 1984, and 3.14159.
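The difference between the two kinds of constant can be seen by asking R what it thinks each one is. A short sketch using only base R:

```r
# Numerals are numeric constants.
class(-42)
class(3.14159)

# Text inside quotation marks is a character-string constant.
class("like this.")

# Quotation marks change the kind of object: 1984 is a number, "1984" is text.
is.numeric(1984)
is.numeric("1984")
```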
Discussion Problem
Consider this command chain:
Princes <-
BabyNames %>%
filter(grepl("Prince",name)) %>%
group_by(year) %>%
summarise(total = sum(count))
Just from the syntax, you should be able to tell which of the five different kinds of object each of these things is: Princes, BabyNames, filter, grepl, "Prince", name, group_by, year, summarise, total, sum, count.
Explain your reasoning.
R Markdown
- Markdown / RMarkdown
- Opening an Rmd file for editing.
- Saving Rmd files
- Compiling Rmd to HTML (or PDF or MS Word)
- Handing in files for class
- Upload HTML files to Canvas using DCF template in RMarkdown unless otherwise instructed.
- Template includes a few boilerplate commands to embed your Rmd code in the HTML document.
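Compiling is normally done with the Knit button in RStudio, but the same step can be run from the console. The sketch below assumes the rmarkdown package and pandoc are installed; the file name and contents are invented for illustration, and it does not use the DCF template, so the resulting HTML has no embedded .Rmd source.

```r
# Three backticks, built with strrep() so the chunk markers are easy to read.
fence <- strrep("`", 3)

# Write a tiny R Markdown file to a temporary location (contents are illustrative).
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(c(
  "---",
  "title: \"Example\"",
  "output: html_document",
  "---",
  "",
  "### A level 3 heading",
  "",
  "Some narrative text.",
  "",
  paste0(fence, "{r}"),   # start of an R code chunk
  "1 + 1",
  fence                   # end of the chunk
), rmd_path)

# Compile to HTML only if rmarkdown and pandoc are available on this machine.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  html_path <- rmarkdown::render(rmd_path, output_format = "html_document", quiet = TRUE)
}
```

For work that is handed in, start from the course template instead so that the Rmd source is embedded in the HTML.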
Creating an Rmd File
Use the “DCF Work” or “DataComputing simple” template file for Rmd:
- In RStudio: File >> New File >> R Markdown >> From Template >> DCF Work
- Eventually, you will upload your HTML file to Canvas (with embedded .Rmd file)
- stat184Template.html
The good people at RStudio have developed a number of “Cheat Sheets” to get people off and running with these tools. Here’s a link to several of them, including RMarkdown, RStudio, and other topics we’ll hit in this course.
In-Class Assignment:
Create a narrative description of at least 3 classes you are taking this term using RMarkdown. Include:
- Level 3 heading for each class
- Two sentences about why you’re taking the class
- Links to the Canvas/Angel site
- Links to a relevant Wikipedia (or other) article
- An embedded relevant figure (perhaps from Wikipedia)
- The embedded .Rmd source file (i.e. use template)
Use the Rmd template above (i.e. adapt stat184Template.Rmd or start fresh with the “DCF Work” or “DataComputing simple” template). Feel free to work together and help each other, but each student should submit their own work as an HTML file with embedded .Rmd on Canvas.
Note: the narrative should be written with text connecting each portion; don’t just dump all the required elements into a document together.
Help each other, divide and conquer, share .Rmd code, post tips/questions/answers to Piazza!
Homework:
- See course webpage
- Make sure you submit the activity & complete peer reviews
- just submit the HTML file with your embedded .Rmd (i.e. use a template)
- DC Ch 3 & 4 Exercises (on paper):
- 3.1, 3.2, 3.3, 3.4, 3.5, 3.6
- 4.1, 4.2, 4.3
- DC chapter 5 & 6 reading quiz on Canvas