Introductions

Examples of contemporary data

Taxicabs and the Shared Economy

A team of mathematicians and engineers has calculated that if taxi riders were willing to share a cab, New York City could reduce its current fleet of 13,500 taxis by up to 40 percent. Link to news story and an interactive site with the data.

  • Traditional style: take a sample of taxi passengers and ask them where they got into the taxi and where they got out.
  • New style:
    • Every taxi transaction is registered, including the GPS coordinates of flag up and flag down.
    • A taxi-hailing app can register all calls for service, even those that aren’t successful.
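
The contrast between the two styles can be sketched in a few lines of R. This is only an illustration: the `trips` data frame, its column names, and its six rows are invented here and are not real New York City data. The sketch computes one figure from the full record of trips and then estimates the same figure from a small sample, as a survey would.

```r
# Hypothetical full record of taxi trips: every row is one registered trip.
# All values and column names are made up for illustration.
trips <- data.frame(
  pickup_zone  = c("Midtown", "Midtown", "Harlem", "SoHo", "Midtown", "SoHo"),
  dropoff_zone = c("SoHo", "Harlem", "Midtown", "Midtown", "SoHo", "Harlem"),
  stringsAsFactors = FALSE
)

# New style: use every registered trip.
full_share <- mean(trips$pickup_zone == "Midtown")

# Traditional style: survey a random sample of passengers.
set.seed(1)                                   # make the sample reproducible
surveyed <- trips[sample(nrow(trips), 3), ]   # ask 3 of the 6 passengers
sample_share <- mean(surveyed$pickup_zone == "Midtown")

full_share     # exact share of trips starting in Midtown: 0.5
sample_share   # an estimate that depends on who was sampled
```

With the full record there is nothing to estimate; with the sample, the answer changes from one draw to the next.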

Medicare Spending

Newspaper article here

Data available here.

The logic of this course

A very large part of the use and presentation of data draws on a small set of concepts and techniques. Individually, these are not difficult and can be taught as simple manoeuvres. In this way, they are like Lego bricks. The complexity of data use and presentation comes from combining these concepts and techniques in various ways to achieve specific purposes, just as an elaborate model can be built out of simple blocks.

Figure: The individual Lego bricks are simple. [1]
Figure: A city made by arranging Lego bricks (Trafalgar Square, Legoland). [2]
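
As a small illustration of the brick idea, the sketch below chains three simple steps into one result. The choice of base R functions and of the built-in `mtcars` data set is an assumption made for this example; the course itself may use different tools and data.

```r
# Brick 1: keep only some rows (cars with 4 or 6 cylinders).
small_engines <- subset(mtcars, cyl %in% c(4, 6))

# Brick 2: summarize within groups (mean miles per gallon by cylinder count).
mpg_by_cyl <- aggregate(mpg ~ cyl, data = small_engines, FUN = mean)

# Brick 3: arrange the result (highest mean mpg first).
ranked <- mpg_by_cyl[order(-mpg_by_cyl$mpg), ]

ranked
```

Each step is simple on its own. The useful result comes from the order in which the steps are combined.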

We’re going to start with some infrastructure for these techniques:

In coming weeks, we will study

Orientation to Class Resources

source("http://dtkaplan.github.io/DCF-Course-2014/Notes/Week-1/install_packages.R")

Examples from many fields

Some data sets we will access for examples.

BabyNames                Names of children as recorded by the US Social
                            Security Administration.
CountryCentroids         Geographic location of countries
CountryData              Many variables on countries from the 2014 CIA factbook.
CountryGroups            Membership in Country Groups
DirectRecoveryGroups     
MedicareCharges          
MedicareProviders        
MigrationFlows           Human Migration between Countries
Minneapolis2013          Ballots in the 2013 Mayoral election in Minneapolis
NCI60                    Gene expression in cancer.
NCI60cells               Cell Line descriptions in the NCI-60 dataset
WorldCities              Cities and their populations
ZipDemography            Demographic information for most US ZIP Codes (Postal Codes)
ZipGeography             Geographic information by US ZIP Codes (Postal Codes)
registeredVoters         A sample of the voter registration list for Wake County, 
                            North Carolina in Fall 2010.



  1. Source: “Lego Color Bricks” by Alan Chia. Licensed under CC BY-SA 2.0 via Wikimedia Commons.

  2. Source: “Trafalgar Legoland 2003” by Kaihsu Tai. Licensed under CC BY-SA 3.0 via Wikimedia Commons.