Small Group Discussion:

Three Important Concepts

  1. As we’ve been discussing, data can be usefully organized into tables with “cases” and “variables.”
    1. In “tidy data,” every case is the same sort of thing, e.g., a person, a car, a year, or a country in a year.
    2. We sometimes modify data to change what the cases represent, so that the table better supports the point we want to make.
  2. Data graphics can be constructed easily when each case corresponds to a “glyph” (mark) on the graph and each variable corresponds to a graphical attribute of that glyph, such as x- or y-position, color, size, length, or shape. Such data are called “glyph-ready.” (The same is true for more technical presentations of data, e.g., models and predictions: once the data are set up with appropriate cases and variables, the presentation is straightforward.)

  3. When data are not yet in glyph-ready form, you can transform them into glyph-ready form.
    1. Such transformations are accomplished by performing one or more of a small set of basic operations on data tables.
    2. This is the work of data “verbs.”

Today’s Agenda

Learning about the raw data

Let’s use the following commands to learn about our data:

Here are three (unrelated) data sets:

  1. What is the setting for the data? That is, what are they about?
  2. How many cases are there?
  3. How many variables are there? What are their names?
  4. Pick out three of the variables and say whether
    • the variable is quantitative or categorical
    • if categorical, what are some levels of the variable
    • if quantitative, what are the units of measurement of the variable.
  5. Describe, in everyday terms, what kind of thing cases represent in each of the data tables.
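The specific commands and data sets for this exercise are not listed here. As an assumption, the sketch below uses standard base R functions that answer questions 2 to 4 for one table, with Minneapolis2013 from the mosaicData package standing in for the unnamed data sets; any data frame can be swapped in.

```r
# Assumption: Minneapolis2013 (mosaicData) stands in for the unnamed data sets
library(mosaicData)
data("Minneapolis2013")

nrow(Minneapolis2013)    # number of cases (rows)
ncol(Minneapolis2013)    # number of variables (columns)
names(Minneapolis2013)   # variable names
str(Minneapolis2013)     # storage type of each variable: chr/factor usually
                         # means categorical, int/num usually quantitative
head(Minneapolis2013)    # first few cases, to see what one row represents
# help(Minneapolis2013)  # documentation: setting, levels, units of each variable
```

The storage type from str() is only a hint: a numeric code can still be categorical, so the documentation is the final word on what each variable means.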

Why we wrangle

Consider the Minneapolis 2013 election data. Here’s a bar chart that might be used to show the election results:

This graph reflects the following data table (only part of which is shown):

## # A tibble: 6 x 2
##                First votes
##                <chr> <int>
## 1       BETSY HODGES 28935
## 2        MARK ANDREW 19584
## 3        DON SAMUELS  8335
## 4         CAM WINTON  7511
## 5 JACKIE CHERRYHOMES  3524
## 6           BOB FINE  2094
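Because each row of this summary is one candidate, the table is glyph-ready for a bar chart: each case becomes one bar, First maps to x-position, and votes maps to bar height. A minimal sketch, assuming ggplot2 is installed and re-entering by hand only the six rows printed above (the names top_six and p are hypothetical):

```r
library(ggplot2)

# The six rows printed above, re-entered by hand (the full table has more rows)
top_six <- data.frame(
  First = c("BETSY HODGES", "MARK ANDREW", "DON SAMUELS",
            "CAM WINTON", "JACKIE CHERRYHOMES", "BOB FINE"),
  votes = c(28935L, 19584L, 8335L, 7511L, 3524L, 2094L)
)

# One case -> one bar; reorder() sorts the bars from most to fewest votes
p <- ggplot(top_six, aes(x = reorder(First, -votes), y = votes)) +
  geom_col() +
  labs(x = "First-choice candidate", y = "Votes")
p
```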

Compare the Minneapolis2013 data table and the data table printed above.

  1. Do they have the same number of cases?
  2. Do the cases in the two tables represent the same sort of thing?
  3. Do the two tables have any variable(s) in common?
  4. Speculate on how the two tables are related to one another.
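One plausible relationship, assuming each row of Minneapolis2013 is one ballot: grouping ballots by first-choice candidate and counting them yields one row per candidate, which matches the shape of the printed table. This is a sketch using dplyr's group_by(), summarise(), and arrange(), not necessarily how the printed table was actually produced; votes_by_candidate is a hypothetical name.

```r
library(mosaicData)
library(dplyr)

# From one row per ballot to one row per first-choice candidate
votes_by_candidate <- Minneapolis2013 %>%
  group_by(First) %>%          # one group per first-choice candidate
  summarise(votes = n()) %>%   # count the ballots in each group
  arrange(desc(votes))         # largest vote total first

head(votes_by_candidate)
```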

Activity: Data verbs for summarizing and grouping (HELPrct)

Instructions:
  • Complete the three tasks below as a group.
  • Submit an HTML file with embedded .Rmd (i.e., use the class template) to “Activity: Data Verbs (HELPrct)” on Canvas.
  • Submit one file per group by Friday at 11:59 pm.
Set Up:
# The HELPrct data are available in the mosaicData package
library(mosaicData)

# dplyr provides the data verbs used in the tasks: summarise() and group_by()
library(dplyr)

# Load the HELPrct data set into our RStudio environment
data("HELPrct")

# Also, run View(HELPrct) in the console to open the data set in a new RStudio tab
Task 1:

summarise(): Find an expression involving summarise() and HELPrct that produces each of the following:

  • number of people (cases) in HELPrct study
  • total number of times people entered a detox program in the past 6 months (measured at baseline), summed over all the people in HELPrct (silly)
  • mean time (in days) to first use of any substance post-detox for all the people in HELPrct
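A minimal sketch of the summarise() pattern for Task 1. It assumes the variable names e2b (times entered detox in the past 6 months, at baseline) and daysanysub (days to first use of any substance post-detox); confirm them with ?HELPrct. These variables may contain missing values, so na.rm = TRUE is used; task1 is a hypothetical name.

```r
library(mosaicData)
library(dplyr)
data("HELPrct")

# Assumed variable names (check ?HELPrct):
#   e2b        = times entered a detox program in past 6 months, at baseline
#   daysanysub = time (days) to first use of any substance post-detox
task1 <- HELPrct %>%
  summarise(
    n_people    = n(),                               # number of cases
    total_detox = sum(e2b, na.rm = TRUE),            # ignore missing values
    mean_days   = mean(daysanysub, na.rm = TRUE)     # ignore missing values
  )
task1
```

summarise() collapses the whole table into a single row, one column per summary.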
Task 2:

group_by(): Repeat Task 1, but calculate the results group by group. For each of the following groupings, write a sentence or two about what you observe:

  • males versus females
  • homeless or not
  • substance
  • break down the homeless versus housed further, by sex
  • break down the homeless versus housed further, by substance
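Adding group_by() before summarise() makes the summary repeat once per group. A sketch for two of the groupings, assuming the variable daysanysub (check ?HELPrct) and dplyr 1.0 or later for the .groups argument; by_sex and by_homeless_sex are hypothetical names.

```r
library(mosaicData)
library(dplyr)
data("HELPrct")

# One row per level of sex
by_sex <- HELPrct %>%
  group_by(sex) %>%
  summarise(n_people = n(), mean_days = mean(daysanysub, na.rm = TRUE))

# Two grouping variables: one row per homeless/sex combination
by_homeless_sex <- HELPrct %>%
  group_by(homeless, sex) %>%
  summarise(n_people = n(),
            mean_days = mean(daysanysub, na.rm = TRUE),
            .groups = "drop")   # return an ungrouped result

by_sex
by_homeless_sex
```

The other groupings follow the same pattern: swap in substance, or group_by(homeless, substance).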
Task 3:

Include one or more interesting plots of the data, each involving at least three variables. Write a few sentences explaining the story each plot tells about these data. You can use one of the relationships you studied in Task 2, or you can explore a different group of variables in HELPrct that shows something interesting.
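One way to meet the three-variable requirement is to map each variable to a different graphical attribute, as in the glyph-ready idea. A sketch assuming ggplot2 is installed and using the HELPrct variables age and cesd (chosen here only for illustration) alongside sex and substance; p is a hypothetical name.

```r
library(mosaicData)
library(ggplot2)
data("HELPrct")

# age -> x-position, cesd -> y-position, sex -> color;
# facet_wrap() adds substance as a fourth variable (one panel per level)
p <- ggplot(HELPrct, aes(x = age, y = cesd, color = sex)) +
  geom_point(alpha = 0.6) +
  facet_wrap(~ substance)
p
```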

Homework


teaching | stat 184 home | syllabus | piazza | canvas