Introduction to Computing for the Social Sciences

MACS 30500
University of Chicago

September 26, 2016

Course site

https://uc-cfss.github.io

Major topics

Elementary programming techniques (e.g. loops, conditional statements, functions)
Writing reusable, interpretable code
Debugging
Obtaining, importing, and tidying data from a variety of sources
Performing statistical analysis
Visualizing information
Creating interactive reports
Generating reproducible research

print("Hello world!")

## [1] "Hello world!"

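# load packages: tidyverse provides mpg, %>%, and mutate(); broom provides tidy()
library(tidyverse)
library(broom)

# linear model of highway mileage on engine displacement
lm(hwy ~ displ, data = mpg) %>%
  tidy() %>%
  mutate(term = c("Intercept", "Engine displacement (in liters)")) %>%
  knitr::kable(digits = 2,
               col.names = c("Variable", "Estimate", "Standard Error",
                             "T-statistic", "P-Value"))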

Variable	Estimate	Standard Error	T-statistic	P-Value
Intercept	35.70	0.72	49.55	0
Engine displacement (in liters)	-3.53	0.19	-18.15	0

# visualization
ggplot(data = mpg, aes(displ, hwy)) + 
  geom_point(aes(color = class)) +
  geom_smooth(method = "lm", se = FALSE, color = "black", alpha = .25) +
  labs(x = "Engine displacement (in liters)",
       y = "Highway miles per gallon",
       color = "Car type") +
  theme_bw(base_size = 16)

"15 min rule: when stuck, you HAVE to try on your own for 15 min; after 15 min, you HAVE to ask for help." - Brain AMA
Rachel Thomas (@math_rachel), August 14, 2016

Other resources

Google
StackOverflow
Me
TA
Fellow students
Class discussion page

Plagiarism

Collaboration is good – to a point
Learning from others/the internet

Plagiarism

If you don’t understand what the program is doing and are not prepared to explain it in detail, you should not submit it.

Evaluations

Weekly programming assignments (70%)
Final project (30%)

Program

A series of instructions that specifies how to perform a computation

Input
Output
Math
Conditional execution
Repetition
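As a rough illustration of these five building blocks, here is a minimal sketch in R. The temperature values and the 15-degree cutoff are made up for the example and do not come from the course materials.

```r
# Input: a vector of temperatures in degrees Fahrenheit (made-up values)
temps_f <- c(32, 50, 68, 86)

# Math: convert each value to degrees Celsius
temps_c <- (temps_f - 32) * 5 / 9

# Repetition: loop over the converted values one at a time
for (temp in temps_c) {
  # Conditional execution: choose a label based on an arbitrary cutoff
  if (temp < 15) {
    label <- "cool"
  } else {
    label <- "warm"
  }
  # Output: print one line per value
  cat(temp, "degrees Celsius is", label, "\n")
}
```

Each comment names the building block that the line below it demonstrates; real programs combine the same pieces at a larger scale.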

Write a report analyzing the relationship between ice cream consumption and crime rates in Chicago

Jane: a GUI workflow

Sally: a programmatic workflow

Reproducibility

Are my results valid? Can they be replicated?
Reproducibility is the idea that data analyses, and scientific claims more generally, are published with their data and software code so that others may verify the findings and build upon them
It also allows researchers to precisely replicate their own analyses
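One small habit that supports reproducibility in a script is fixing the random seed and recording the software environment. The sketch below is only an illustration; the seed value 1234 is arbitrary and the variable names are hypothetical.

```r
# fix the random seed so the "random" draws are the same on every run
# (1234 is an arbitrary choice for this example)
set.seed(1234)
first_run <- rnorm(5)

# resetting the same seed reproduces exactly the same draws
set.seed(1234)
second_run <- rnorm(5)

# TRUE when both runs produced identical numbers
identical(first_run, second_run)

# record the R version and loaded packages alongside the results
sessionInfo()
```

Someone rerunning this script with the same seed and the same versions of R and its packages should get the same numbers, which is what makes the result checkable.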

Version control

Revisions in research
Tracking revisions
- analysis-1.r
- analysis-2.r
- analysis-3.r
- Cloud storage (e.g. Dropbox, Google Drive, Box)
Version control software
Repository

Documentation

Comments
- Comments are the what
- Code is the how
Computer code should also be self-documenting
Future-proofing
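To make "self-documenting" concrete, here is a small sketch comparing two versions of the same calculation. Both functions and the counts passed to them are hypothetical, invented for this example.

```r
# hard to follow: the names say nothing about what the values mean
f <- function(x, y) {
  x / y * 100
}

# self-documenting: the names carry the meaning
percent_of_total <- function(count, total) {
  count / total * 100
}

# hypothetical counts, for illustration only
percent_of_total(count = 25, total = 200)
```

The two functions compute the same thing, but only the second can be read without a comment explaining it.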

Badly documented code

library(twitteR)
source("keys.R")
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
data <- userTimeline("realdonaldtrump", n = 1000)
data2 <- twListToDF(data)
write.csv(data2, "data2.csv")

Good code

# get_tweets.R
# Program to get Donald Trump tweets using Twitter API

# access Twitter API functions
library(twitteR)

# setup API authentication
source("keys.R")    # store keys privately in separate file

setup_twitter_oauth(consumer_key,
                    consumer_secret,
                    access_token,
                    access_secret)

# get 1000 most recent tweets
username <- "realdonaldtrump"
tweets <- userTimeline(username, n = 1000)

# convert to data frame
tweets_df <- twListToDF(tweets)

# write to disk
write.csv(tweets_df, "tweets_trump.csv")
