Small Group Discussion:


R Command Patterns

Command chains

Princes <- 
  BabyNames %>%
  filter(grepl("Prince",name)) %>%
  group_by(year) %>%
  summarise(total = sum(count))

Your commands will be written as chains.

  • Each link in the chain will be a data verb and its arguments.
    • The very first link is usually a data table.
  • Links are connected by the chaining symbol %>%

  • Often, but not always, you will save the output of the chain in a named object.
    • This is done with the assignment operator, <-
  • It is a good idea to put each link on its own line.
  • Note that %>% is at the end of each line.
    • Except … Princes <- is assignment
    • Except … The last line has no %>%.
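
The chain pattern above can be tried on a small stand-in table. This is only a sketch, assuming the dplyr package is installed; ToyNames and its values are made up for illustration and are not the real BabyNames data.

```r
library(dplyr)

# ToyNames is a hypothetical stand-in for BabyNames (made-up values).
ToyNames <- data.frame(
  name  = c("Prince", "Princess", "Anna", "Prince", "Anna"),
  year  = c(2000, 2000, 2000, 2001, 2001),
  count = c(10, 5, 100, 12, 90),
  stringsAsFactors = FALSE
)

Princes <-
  ToyNames %>%                       # first link: a data table
  filter(grepl("Prince", name)) %>%  # keep names containing "Prince"
  group_by(year) %>%                 # one group per year
  summarise(total = sum(count))      # last link: no %>% at the end

Princes
```

Typing the name of the saved object on a line by itself, as in the last line, displays the resulting data table.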

Parts of Speech in R

  1. Data tables
    • A data table comprises one or more variables.
    • Convention: data tables are given names that start with a CAPITAL LETTER, e.g., RegisteredVoters.
    • A data table will always be the input at the start of a command chain.
    • If assignment is used to save the result, the object created is usually a data table.
  2. Functions
    • Functions are objects that transform an input into an output.
    • Functions are always followed by parentheses, that is, an opening ( and, eventually, a closing ).
    • Each link in a command chain starts with a function.
      • More specifically, the function is a data verb that takes a data table as input and produces another data table as output.
      • There are other kinds of functions, e.g. summary (or reduction) functions and transformation functions.
  3. Arguments
    • The things that go inside a function’s parentheses are called arguments.
    • Arguments describe the details of what a function is to do.
    • If there are multiple arguments, they are always separated by commas.
    • You can also consider the data table passed along by %>% as an argument to the function that immediately follows.
    • Many functions take named arguments, which look like a name followed by an = sign, e.g.

      summarise(total = sum(count))
  4. Variables
    • Variables are the components of data tables.
    • When they are used, they always appear in function arguments, that is, between the function’s parentheses.
    • A good convention is for variables to have names that start with a lower-case letter. The convention is not universally followed.
    • Variables will never be followed by (.
  5. Constants
    • Constants are single values, most commonly a number or a character string.
    • Character strings will always be in quotation marks,
      "like this."
    • Numerals are the written form of numbers, for instance:
      -42
      1984
      3.14159
  6. Assignment
    • saves the output of the command (chain) in a named object.
    • This is done with the assignment operator, <-
  7. Formulas
    • mostly left to future statistics classes
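
As a rough illustration of the parts of speech listed above, here is a short chain with each piece labeled in a comment. It assumes the dplyr package is installed. The contents of RegisteredVoters and the name OlderVoters are invented for this sketch.

```r
library(dplyr)

# RegisteredVoters here is a tiny made-up table, not real data.
RegisteredVoters <- data.frame(
  county = c("A", "A", "B"),
  age    = c(25, 40, 67),
  stringsAsFactors = FALSE
)

OlderVoters <-                  # assignment: save the result under a name
  RegisteredVoters %>%          # data table (capitalized by convention)
  filter(age > 30) %>%          # filter: function; age: variable; 30: constant
  group_by(county) %>%          # group_by: function; county: variable
  summarise(oldest = max(age))  # oldest = ...: named argument; max: function

OlderVoters
```

Everything followed by an opening parenthesis is a function, and the bare names inside the parentheses are variables from the data table.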

Conventions of your book

The book has defined a convention that data tables should begin with a capital letter & variables should begin with a lowercase letter. It’s important to note that these conventions are for the benefit of users & consumers of your code. R will not enforce them for you!
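
A small sketch of that last point, assuming the dplyr package is installed: the names below deliberately break the book's convention, and R runs the chain anyway. The table voters and its values are hypothetical.

```r
library(dplyr)

# Lowercase table name and capitalized variable name: the reverse of the
# book's convention. R accepts both without any warning.
voters <- data.frame(Age = c(25, 40, 67))

Result <-
  voters %>%
  summarise(mean_age = mean(Age))

Result
```

R does treat names as case-sensitive, so Age and age would be two different names. The capitalization convention itself is purely for human readers.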

Discussion Problem

Consider this command chain:

Princes <- 
  BabyNames %>%
  filter(grepl("Prince",name)) %>%
  group_by(year) %>%
  summarise(total = sum(count))

Just from the syntax, you should be able to tell which of the five different kinds of object each of these things is: Princes, BabyNames, filter, grepl, "Prince", name, group_by, year, summarise, total, sum, count.


R Markdown

Creating an Rmd File

Use the “DCF Work” or “DataComputing simple” template file for Rmd:

  • In RStudio: File >> New File >> R Markdown >> From Template >> DCF Work
  • Eventually, you will upload your HTML file to Canvas (with embedded .Rmd file)
  • stat184Template.html

The good people at RStudio have developed a number of “Cheat Sheets” to get people off and running with these tools. Here’s a link to several of them, including RMarkdown, RStudio, and other topics we’ll hit in this course.

In-Class Assignment:

Create a narrative description of at least 3 classes you are taking this term using RMarkdown. Include:

  • Level 3 heading for each class
  • Two sentences about why you’re taking the class
  • links to the Canvas/Angel site
  • links to a relevant Wikipedia (or other) article
  • embed a relevant figure (perhaps from Wikipedia)
  • embed .Rmd source file (i.e. use template)

Use the Rmd template above (i.e. adapt stat184Template.Rmd or start fresh with the “DCF Work” or “DataComputing simple” template). Feel free to work together and help each other, but each student should submit their own work as an html file with embedded .Rmd on Canvas.

Note: the narrative should be written with text connecting each portion; don’t just dump all the required elements into a document together.

Help each other, divide and conquer, share .Rmd code, and post tips/questions/answers to Piazza!


Homework:

