Introduction to Python

MACS 30500
University of Chicago

October 24, 2016

Python

Why you should learn both languages

  • Things R does well
    • Statistical analysis
    • Data visualization
  • Things R does not do as well
    • Speed
  • Things Python does well
  • Things Python does not do as well
    • Visualizations
    • Add-on libraries
  • Computational social science at UChicago

An example combining Python and R for data analysis

Python steps

import json
import urllib.parse
import requests
import pandas as pd

# You have to sign up for an API key, which comes with usage limits.
# Check the API documentation for further details.
_url = 'https://api.projectoxford.ai/emotion/v1.0/recognizeInVideo'
_key = 'Your Key Here'  # paste your primary key here
_maxNumRetries = 10

Python steps

# Video URL: I hosted this on my domain
urlVideo = 'http://datacandy.co.uk/blog2.mp4'

# Query parameters: request emotion scores for each frame
paramsPost = { 'outputStyle' : 'perFrame'}

headersPost = dict()
headersPost['Ocp-Apim-Subscription-Key'] = _key
headersPost['Content-Type'] = 'application/json'

jsonPost = { 'url': urlVideo }

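responsePost = requests.post(_url, json=jsonPost, headers=headersPost,
                             params=paramsPost)

if responsePost.status_code == 202:  # 202 Accepted: the video was queued
    # URL where the processing status and results can be retrieved
    videoIDLocation = responsePost.headers['Operation-Location']
    print(videoIDLocation)
else:
    raise RuntimeError('Request failed: {} {}'.format(
        responsePost.status_code, responsePost.text))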

Python steps

## Wait a bit: the video is processed asynchronously, so the
## results are only available once processing has finished
headersGet = {'Ocp-Apim-Subscription-Key': _key}
getResponse = requests.get(videoIDLocation, headers=headersGet)

# 'processingResult' is itself a JSON-encoded string, so it is decoded twice
rawData = json.loads(getResponse.json()['processingResult'])
timeScale = rawData['timescale']
frameRate = rawData['framerate']
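The slide above leaves the waiting to the user. One way to automate it is a polling loop, sketched here under stated assumptions: that the operation JSON has a `status` field whose terminal values include `'Succeeded'` and `'Failed'` (check the API documentation), and that `_maxNumRetries` is meant to bound the number of attempts. `wait_for_result` and `fetch_status` are hypothetical names; `fetch_status` stands in for the `requests.get` call so the loop can be tried offline.

```python
import time
from typing import Callable, Dict


def wait_for_result(fetch_status: Callable[[], Dict],
                    max_retries: int = 10,
                    delay_seconds: float = 30.0) -> Dict:
    """Poll until the operation reaches a terminal status.

    Assumption: the status JSON has a 'status' key, and 'Succeeded' and
    'Failed' are terminal values; verify this against the API documentation.
    """
    for attempt in range(max_retries):
        result = fetch_status()
        status = result.get('status')
        if status == 'Succeeded':
            return result
        if status == 'Failed':
            raise RuntimeError('processing failed: {}'.format(result))
        # Still running: wait before asking again (no wait after the last try)
        if attempt < max_retries - 1:
            time.sleep(delay_seconds)
    raise TimeoutError('no result after {} attempts'.format(max_retries))
```

With the slide's variables, a call might look like `wait_for_result(lambda: requests.get(videoIDLocation, headers=headersGet).json(), max_retries=_maxNumRetries)`, after which `result['processingResult']` would be decoded as before.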

Python steps

# Number the frames in order; each event is a list of the detected faces
emotionPerFramePerFace = {}
currFrameNum = 0

for currFragment in rawData['fragments']:
    for currEvent in currFragment['events']:
        emotionPerFramePerFace[currFrameNum] = currEvent
        currFrameNum += 1

Python steps

# Data collection: copy each face's emotion scores into top-level fields
person1, person2 = [], []
for frame_no, faces in emotionPerFramePerFace.items():
    for i, face in enumerate(faces):
        for emotion, score in face['scores'].items():
            face[emotion] = score

        face['frame'] = frame_no
        # The first face detected in a frame goes to person1, any others to person2
        if i == 0:
            person1.append(face)
        else:
            person2.append(face)

df1 = pd.DataFrame(person1)
df2 = pd.DataFrame(person2)
# The nested 'scores' dicts are no longer needed
del df1['scores']
del df2['scores']

# Save each data frame as a CSV file (the data/ folder must already exist)
df1.to_csv("data/trump.csv", index=False)
df2.to_csv("data/clinton.csv", index=False)

Analyze the data in R

library(tidyverse)

# Trump's face
trump <- read_csv("data/trump.csv")
trump_g <- trump %>%
  gather(key, value, c(anger, contempt, disgust, fear, happiness, neutral, sadness, surprise)) %>%
  filter(key != "neutral") %>%
  filter(id == 0) %>%
  mutate(candidate = "Trump")

# Clinton's face
clinton <- read_csv("data/clinton.csv")
clinton_g <- clinton %>%
  gather(key, value, c(anger, contempt, disgust, fear, happiness, neutral, sadness, surprise)) %>%
  filter(key != "neutral") %>%
  filter(id == 1) %>%
  mutate(candidate = "Clinton")

# Merge them
all <- bind_rows(clinton_g, trump_g)
all
## # A tibble: 128,597 × 9
##    frame   height    id    width        x        y   key       value
##    <int>    <dbl> <int>    <dbl>    <dbl>    <dbl> <chr>       <dbl>
## 1      0 0.246296     1 0.138542 0.717708 0.296296 anger 3.27982e-05
## 2      1 0.248148     1 0.139583 0.714583 0.283333 anger 4.68744e-05
## 3      2 0.248148     1 0.139583 0.712500 0.275926 anger 1.10304e-04
## 4      3 0.248148     1 0.139583 0.711458 0.274074 anger 6.68550e-05
## 5      4 0.251852     1 0.141667 0.708333 0.264815 anger 9.19732e-05
## 6      5 0.251852     1 0.141667 0.705208 0.259259 anger 1.49960e-04
## 7      6 0.253704     1 0.142708 0.703125 0.255556 anger 6.03095e-05
## 8      7 0.253704     1 0.142708 0.700000 0.250000 anger 3.52558e-05
## 9      8 0.253704     1 0.142708 0.697917 0.244444 anger 3.22375e-05
## 10     9 0.251852     1 0.141667 0.695833 0.240741 anger 7.48344e-05
## # ... with 128,587 more rows, and 1 more variables: candidate <chr>

Analyze the data in R

# Smooth line chart
ggplot(all, aes(frame, value, group = key, col = key)) +
  geom_smooth(method = "loess", n = 100000, se = FALSE, span = 0.1) +
  facet_wrap(~ candidate, ncol = 1) +
  theme_minimal(base_size = 20)

Algorithm accuracy

Boolean

print(1 == 1)
print(4 == 5)
## True
## False
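Comparisons return True or False, and these values can be combined with `and`, `or`, and `not`, which the conditional examples later in this deck rely on. A short sketch:

```python
is_less = 3 < 5              # True
both = 3 < 5 and 4 == 5      # False: the second comparison is False
either = 3 < 5 or 4 == 5     # True: at least one comparison is True
negated = not 4 == 5         # True: `not` applies to the whole comparison
print(is_less, both, either, negated)
# prints: True False True True
```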

Numbers

print(1 + 1)
print(4 * 2.5)
print(6 / 3)
## 2
## 10.0
## 2.0
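In Python 3, `/` always returns a float, so `6 / 3` evaluates to `2.0`. Floor division `//` and the remainder operator `%` cover the integer cases; `%` is the operator the fizzbuzz exercise below relies on.

```python
quotient = 7 / 2     # 3.5: true division always returns a float
floored = 7 // 2     # 3: floor division rounds down
remainder = 7 % 2    # 1: what is left over after 7 // 2
power = 2 ** 3       # 8: exponentiation uses **, not ^
print(quotient, floored, remainder, power)
# prints: 3.5 3 1 8
```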

Strings

message = 'This is a string of text'
print(message)
print(type(message))
## This is a string of text
## <class 'str'>
'4' - '2'
## Traceback (most recent call last):
##   File "<string>", line 1, in <module>
## TypeError: unsupported operand type(s) for -: 'str' and 'str'
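Strings support `+` (concatenation) and `*` (repetition), but their contents have to be converted with `int()` or `float()` before doing arithmetic:

```python
joined = '4' + '2'                 # '42': + concatenates strings
difference = int('4') - int('2')   # 2: convert to int first, then subtract
repeated = 'ab' * 3                # 'ababab': * repeats a string
print(joined, difference, repeated)
# prints: 42 2 ababab
```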

Lists

[10, 20, 30, 40]
['runaway bunny', 'goodnight moon', 'my world']
[2, 'bunny', 'zebra', False]
[2, [2, 4, 6], 'gorilla']

To select specific elements from a list, use indexing with square brackets; positions start at 0:

x = [1,2,3]
y = [2,x]
z = [x,y]

print(z)
print(z[1])
print(z[1][0])
## [[1, 2, 3], [2, [1, 2, 3]]]
## [2, [1, 2, 3]]
## 2
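Beyond the positive indices above, negative indices count from the end of a list, and a slice `start:stop` includes `start` but stops before `stop`:

```python
x = [1, 2, 3]
y = [2, x]
z = [x, y]

first = z[0]        # [1, 2, 3]: index 0 is the first element
last = z[-1]        # [2, [1, 2, 3]]: -1 is the last element
inner = z[1][1][2]  # 3: chain indices to reach into nested lists
first_two = x[0:2]  # [1, 2]: the slice stops before index 2
print(first, last, inner, first_two)
```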

Dictionaries

x = {'Karen': 12, 'Julian': 10}
y = {'hello': 'world',
     'goodbye': 'papaya'}
z = {'a': [1, 2, 3], 'b': [3, 2, 1]}

print(x['Julian'])
print(z['b'][1:2])
## 10
## [2]
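Dictionaries are indexed by key rather than by position. Looking up a missing key raises a `KeyError`, so `.get()` with a default is a common alternative, and assigning to a new key adds it. The extra names below ('Maria', 'Omar') are made-up example data:

```python
x = {'Karen': 12, 'Julian': 10}

x['Maria'] = 14                 # add a new key-value pair
age = x.get('Omar', 0)          # 0: .get returns the default for a missing key
has_karen = 'Karen' in x        # True: `in` checks keys, not values

# Iterate over key-value pairs (dicts keep insertion order in Python 3.7+)
for name, value in x.items():
    print(name, value)
```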

Functions vs. methods

  • A function is a piece of code that is called by name
    • Has a definable input and output
    • Input is passed explicitly
  • A method is a piece of code that is called by a name associated with an object
    • The object on which it is called is passed to it implicitly
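The difference shows up clearly with sorting: the built-in function `sorted()` receives the list explicitly and returns a new list, while the list method `.sort()` works on the object it is called on, changes it in place, and returns `None`. The `Counter` class is a hypothetical example showing that a method receives its object implicitly as `self`.

```python
x = [3, 1, 2]

# Function: the list is passed explicitly and a new list is returned
y = sorted(x)
print(x, y)          # [3, 1, 2] [1, 2, 3]: x is unchanged

# Method: called on x, which is passed implicitly and modified in place
result = x.sort()
print(x, result)     # [1, 2, 3] None


class Counter:
    """Hypothetical class: increment() reads and updates its own object."""

    def __init__(self):
        self.count = 0

    def increment(self):
        # `self` is the object the method was called on
        self.count += 1
        return self.count


c = Counter()
c.increment()
print(c.count)       # 1
```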

Example of a function

print("Happy Birthday to you!")
## Happy Birthday to you!

Example of a method

x = [1, 2, 3]
print(x)

x.reverse()
print(x)
## [1, 2, 3]
## [3, 2, 1]

Conditional execution

import random
p = random.random()
responsible = random.choice([0,1])

if p < .05:
  print('we have a publishable finding')
elif responsible == 0:
  print("it's so close, let's round down")
else:
  print('back to square one')
## it's so close, let's round down
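Python tests the branches from top to bottom and runs only the first one whose condition is true, so the output above depends on the random draws. Wrapping the same logic in a function that takes the values as arguments (a hypothetical `verdict` helper) makes every branch easy to check:

```python
def verdict(p, responsible):
    # Conditions are tested in order; the first True branch wins
    if p < .05:
        return 'we have a publishable finding'
    elif responsible == 0:
        return "it's so close, let's round down"
    else:
        return 'back to square one'


print(verdict(0.01, 0))   # first branch: responsible is never checked
print(verdict(0.20, 0))   # second branch
print(verdict(0.20, 1))   # else branch
```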

Iteration

x = range(10)
print(list(x))

for i in x:
  j = i**2
  print(j)
## [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
## 0
## 1
## 4
## 9
## 16
## 25
## 36
## 49
## 64
## 81
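When the goal is to collect the results rather than print them, the loop can be written as a list comprehension, which builds the same list in one expression:

```python
# Loop version: append each square to a list
squares_loop = []
for i in range(10):
    squares_loop.append(i**2)

# Comprehension version: same result in one expression
squares = [i**2 for i in range(10)]
print(squares)
# prints: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```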

Write a fizzbuzz function in R

fizzbuzz <- function(x){
  if(x %% 3 == 0 && x %% 5 == 0){
    return("fizzbuzz")
  } else if(x %% 3 == 0){
    return("fizz")
  } else if(x %% 5 == 0){
    return("buzz")
  } else{
    return(x)
  }
}
fizzbuzz(3)
## [1] "fizz"
fizzbuzz(5)
## [1] "buzz"
fizzbuzz(15)
## [1] "fizzbuzz"
fizzbuzz(4)
## [1] 4

Write a fizzbuzz function in Python

def fizzbuzz(x):
  if x % 3 == 0 and x % 5 == 0:
    print("fizzbuzz")
  elif x % 3 == 0:
    print("fizz")
  elif x % 5 == 0:
    print("buzz")
  else:
    print(x)

fizzbuzz(3)
fizzbuzz(5)
fizzbuzz(15)
fizzbuzz(4)
## fizz
## buzz
## fizzbuzz
## 4
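Unlike the R version, the Python function above prints its result instead of returning it, so the output cannot be stored or reused. A sketch of a version that returns values, applied to the numbers 1 through 15:

```python
def fizzbuzz(x):
    # Check "divisible by both 3 and 5" first, as in the R version
    if x % 3 == 0 and x % 5 == 0:
        return "fizzbuzz"
    elif x % 3 == 0:
        return "fizz"
    elif x % 5 == 0:
        return "buzz"
    else:
        return x


results = [fizzbuzz(n) for n in range(1, 16)]
print(results)
# prints: [1, 2, 'fizz', 4, 'buzz', 'fizz', 7, 8, 'fizz', 'buzz', 11, 'fizz', 13, 14, 'fizzbuzz']
```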

Check to see if someone can see this movie

Write a conditional statement in Python that uses the variable age to determine whether a person can see a movie based on its MPAA rating.

  • G and PG - any age
  • PG-13 - must be at least 13
  • R - must be at least 18

Movie age check in R

age <- 15

if(age > 17){
  print("can see a rated R movie")
} else if(age > 12){
  print("can see a rated PG-13 movie")
} else {
  print("can only see rated G and PG movies")
}
## [1] "can see a rated PG-13 movie"

Movie age check in Python

age = 15

if age > 17: 
  print("can see a rated R movie")
elif age > 12:
  print("can see a rated PG-13 movie")
else: 
  print("can only see rated G and PG movies")
## can see a rated PG-13 movie

Movie age check in Python as a function

def age_movie(age):
  if age > 17: 
    print("can see a rated R movie")
  elif age > 12:
    print("can see a rated PG-13 movie")
  else: 
    print("can only see rated G and PG movies")

age_movie(15)
## can see a rated PG-13 movie

Iterate over a list of ages in Python

def age_movie(age):
  if age > 17: 
    print("can see a rated R movie")
  elif age > 12:
    print("can see a rated PG-13 movie")
  else: 
    print("can only see rated G and PG movies")

ages = [15, 11, 21]

for person in ages:
  age_movie(person)
## can see a rated PG-13 movie
## can only see rated G and PG movies
## can see a rated R movie
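The rating rules can also be stored as data instead of as a chain of if statements, so adding or changing a rating becomes a one-line edit. A sketch, assuming minimum ages of 13 for PG-13 and 18 for R (matching the `age > 12` and `age > 17` checks above); `MIN_AGE`, `can_see`, and `viewable_ratings` are hypothetical names:

```python
# Minimum age per rating, following the exercise's rules
# (an assumption for this exercise, not official MPAA policy)
MIN_AGE = {'G': 0, 'PG': 0, 'PG-13': 13, 'R': 18}


def can_see(age, rating):
    """Return True if a person of this age may see a movie with this rating."""
    return age >= MIN_AGE[rating]


def viewable_ratings(age):
    """List every rating this age allows, in the order of MIN_AGE."""
    return [rating for rating, minimum in MIN_AGE.items() if age >= minimum]


for person in [15, 11, 21, 17]:
    print(person, viewable_ratings(person))
```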