Getting data from the web: API access

MACS 30500
University of Chicago

October 31, 2016

How to get data online

  • Click-and-download
  • Install-and-play
  • API-query
  • Scraping

Click-and-Download

  • Use read.csv or readr::read_csv to read the data straight into R
  • Store, then read
    • Download using downloader for R or curl from the shell
    • Then read into R using read_csv
    • Even if the file disappears from the internet, you have a local copy cached
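
The store-then-read pattern can be sketched in a few lines of base R. This is a minimal illustration, not a prescribed workflow: the helper name read_cached_csv and the URL in the usage comment are hypothetical.

```r
# Download a file once, then reuse the local copy on every later run.
# `url` and `path` are supplied by the caller; nothing here is specific
# to any particular data source.
read_cached_csv <- function(url, path) {
  if (!file.exists(path)) {
    # mode = "wb" keeps the bytes intact on Windows
    download.file(url, destfile = path, mode = "wb")
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Usage (hypothetical URL, so left commented out):
# gap <- read_cached_csv("https://example.com/data.csv", "data/data.csv")
```

Because the download only happens when the local file is missing, the script keeps working even if the remote file later disappears.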

APIs

  • Application Programming Interfaces
  • Many websites use APIs to allow programmatic access
    • Client sends a request to the server
    • Server responds by sending data (or an error message)
  • Clients
    • Phone app
    • R/Python
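
For many web APIs the client's request is just a URL with query parameters attached. The sketch below shows how such a URL could be assembled in base R. The endpoint https://api.example.com/v1/sightings and the helper build_request_url are made up for illustration; a real API documents its own endpoints and parameter names.

```r
# Assemble a request URL from a base endpoint and a named list of parameters.
build_request_url <- function(base_url, params) {
  # Percent-encode each value so spaces and symbols survive the trip
  encoded <- vapply(
    params,
    function(value) utils::URLencode(as.character(value), reserved = TRUE),
    character(1)
  )
  query <- paste(names(params), encoded, sep = "=", collapse = "&")
  paste0(base_url, "?", query)
}

# Hypothetical endpoint; the coordinates are the Chicago ones used later
request_url <- build_request_url(
  "https://api.example.com/v1/sightings",
  list(lat = 41.8781, lng = -87.6298)
)
request_url
```

Packages such as rebird build requests like this for you and turn the server's response into a data frame.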

Install and play packages

  • Pre-wrapped packages for APIs
  • Reproducible
  • Updating
  • Ease
  • Scaling

Sightings of birds: rebird

  • eBird - online database of bird sightings
  • rebird - R package to interface with eBird

Search birds by geography: Lincoln Park Zoo

library(dplyr)   # provides %>% and tbl_df()
library(rebird)
ebirdhotspot(locID = "L1573785") %>%
  tbl_df()
## # A tibble: 8 × 11
##         lng                   locName howMany               sciName
##       <dbl>                     <chr>   <int>                 <chr>
## 1 -87.63272 Lincoln Park Zoo, Chicago       1     Certhia americana
## 2 -87.63272 Lincoln Park Zoo, Chicago       2    Larus delawarensis
## 3 -87.63272 Lincoln Park Zoo, Chicago       1 Corvus brachyrhynchos
## 4 -87.63272 Lincoln Park Zoo, Chicago       1  Poecile atricapillus
## 5 -87.63272 Lincoln Park Zoo, Chicago       3    Turdus migratorius
## 6 -87.63272 Lincoln Park Zoo, Chicago       5      Sturnus vulgaris
## 7 -87.63272 Lincoln Park Zoo, Chicago       1        Junco hyemalis
## 8 -87.63272 Lincoln Park Zoo, Chicago      50     Passer domesticus
## # ... with 7 more variables: obsValid <lgl>, locationPrivate <lgl>,
## #   obsDt <chr>, obsReviewed <lgl>, comName <chr>, lat <dbl>, locID <chr>

Search birds by geographic area

chibirds <- ebirdgeo(lat = 41.8781, lng = -87.6298)
chibirds %>%
  tbl_df()
## # A tibble: 158 × 11
##         lng           locName howMany                sciName obsValid
##       <dbl>             <chr>   <int>                  <chr>    <lgl>
## 1  -87.8845 La Grange Hideout       1  Corvus brachyrhynchos     TRUE
## 2  -87.8845 La Grange Hideout       1 Zonotrichia albicollis     TRUE
## 3  -87.8845 La Grange Hideout       1    Agelaius phoeniceus     TRUE
## 4  -87.8845 La Grange Hideout       1   Melanerpes carolinus     TRUE
## 5  -87.8845 La Grange Hideout       1       Sitta canadensis     TRUE
## 6  -87.8845 La Grange Hideout       1  Cardinalis cardinalis     TRUE
## 7  -87.8845 La Grange Hideout      20      Passer domesticus     TRUE
## 8  -87.8845 La Grange Hideout       3   Haemorhous mexicanus     TRUE
## 9  -87.8845 La Grange Hideout       1     Picoides pubescens     TRUE
## 10 -87.8845 La Grange Hideout       2         Junco hyemalis     TRUE
## # ... with 148 more rows, and 6 more variables: locationPrivate <lgl>,
## #   obsDt <chr>, obsReviewed <lgl>, comName <chr>, lat <dbl>, locID <chr>

Search birds by geographic region

frenchbirds <- ebirdregion("FR")
frenchbirds %>%
  tbl_df()
## # A tibble: 233 × 11
##          lng          locName howMany                    sciName obsValid
##        <dbl>            <chr>   <int>                      <chr>    <lgl>
## 1  -1.276082 Les Ponts d'Ouve      10 Chroicocephalus ridibundus     TRUE
## 2  -1.276082 Les Ponts d'Ouve       3    Troglodytes troglodytes     TRUE
## 3  -1.276082 Les Ponts d'Ouve       3          Anthus spinoletta     TRUE
## 4  -1.276082 Les Ponts d'Ouve       1          Saxicola rubicola     TRUE
## 5  -1.276082 Les Ponts d'Ouve      30              Anas clypeata     TRUE
## 6  -1.276082 Les Ponts d'Ouve      50          Vanellus vanellus     TRUE
## 7  -1.276082 Les Ponts d'Ouve       3                Cygnus olor     TRUE
## 8  -1.276082 Les Ponts d'Ouve       1           Anthus pratensis     TRUE
## 9  -1.276082 Les Ponts d'Ouve     100         Anas platyrhynchos     TRUE
## 10 -1.276082 Les Ponts d'Ouve       2        Aegithalos caudatus     TRUE
## # ... with 223 more rows, and 6 more variables: locationPrivate <lgl>,
## #   obsDt <chr>, obsReviewed <lgl>, comName <chr>, lat <dbl>, locID <chr>

Search birds by species

warbler <- ebirdgeo(species = 'Setophaga coronata', lat = 41.8781, lng = -87.6298)
warbler %>%
  tbl_df()
## # A tibble: 53 × 11
##          lng                                        locName howMany
##        <dbl>                                          <chr>   <int>
## 1  -87.56039           South Shore Cultural Center, Chicago       1
## 2  -87.54932                         Rainbow Beach, Chicago       1
## 3  -87.63442          Montrose Point, Lincoln Park, Chicago       1
## 4  -87.51357      Lakefront Park and Sanctuary Migrant Trap       8
## 5  -87.53652                                   Park No. 566       3
## 6  -87.90470             Bemis Woods Forest Preserve--North       1
## 7  -87.76368     Skokie Lagoons Forest Preserve--Willow Rd.       1
## 8  -87.71768                Irving Park 60618 Miscellaneous       1
## 9  -87.90280 Wolf Road Prairie Nature Preserve, Westchester       1
## 10 -87.61893                            Grant Park, Chicago       3
## # ... with 43 more rows, and 8 more variables: sciName <chr>,
## #   obsValid <lgl>, locationPrivate <lgl>, obsDt <chr>, obsReviewed <lgl>,
## #   comName <chr>, lat <dbl>, locID <chr>

Use known location

ebirdgeo(species = 'Setophaga coronata') %>%
  tbl_df()
## # A tibble: 49 × 11
##          lng                                        locName howMany
##        <dbl>                                          <chr>   <int>
## 1  -87.80050                Turtlehead Lake Forest Preserve       1
## 2  -87.56039           South Shore Cultural Center, Chicago       1
## 3  -87.54932                         Rainbow Beach, Chicago       1
## 4  -87.63442          Montrose Point, Lincoln Park, Chicago       1
## 5  -87.51357      Lakefront Park and Sanctuary Migrant Trap       8
## 6  -87.53652                                   Park No. 566       3
## 7  -87.90470             Bemis Woods Forest Preserve--North       1
## 8  -87.71768                Irving Park 60618 Miscellaneous       1
## 9  -87.90280 Wolf Road Prairie Nature Preserve, Westchester       1
## 10 -87.61893                            Grant Park, Chicago       3
## # ... with 39 more rows, and 8 more variables: sciName <chr>,
## #   obsValid <lgl>, locationPrivate <lgl>, obsDt <chr>, obsReviewed <lgl>,
## #   comName <chr>, lat <dbl>, locID <chr>

Searching geographic info: geonames

# install.packages("geonames")
library(geonames)

API authentication

  • Registering for access
  • API key/token
  • Rate-limiting
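
One simple way to stay under a rate limit is to pause between requests. The sketch below assumes a fixed delay is enough. The helper throttled_lapply and the stand-in function fake_request are invented for this example; real APIs publish their own limits.

```r
# Call `f` on each element of `items`, pausing `delay` seconds between calls.
throttled_lapply <- function(items, f, delay = 1) {
  results <- vector("list", length(items))
  for (i in seq_along(items)) {
    if (i > 1) Sys.sleep(delay)      # no need to wait before the first call
    results[i] <- list(f(items[[i]]))
  }
  results
}

# Stand-in for a real API call, so the sketch runs without network access
fake_request <- function(region) paste("queried", region)

responses <- throttled_lapply(c("FR", "US", "CA"), fake_request, delay = 0.1)
```

In practice f would be a function that sends one request, for example a call to one of the package functions shown in these slides.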

Accessing the geonames API

  1. Go to the geonames site and register an account
  2. Enable the free web service for your account (the link is on your geonames account page)
  3. Tell R your geonames username

Bad API key storage

library(geonames)
options(geonamesUsername = "my_user_name")

# All my code here

Good API key storage

options(geonamesUsername = "my_user_name")
  • Put this line in .Rprofile, not in your script
  • Store .Rprofile in the same folder as the project's .Rproj file

Important details for .Rprofile

  • Make sure your .Rprofile ends with a blank line
  • Make sure .Rprofile is included in your .gitignore file
  • Restart RStudio after modifying .Rprofile
  • Check your spelling
  • You can do a similar process for an arbitrary package or key. For example:

    # in .Rprofile
    options(this_is_my_key = "XXXX")
    
    # later, in the R script:
    key <- getOption("this_is_my_key")
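
A misspelled option name makes getOption() return NULL without any warning, so it can help to fail loudly instead. The following is a small sketch of that idea. The helper get_api_key and the option name demo_service_key are hypothetical.

```r
# Fetch a key stored via options(); stop with a clear message if it is missing.
get_api_key <- function(option_name) {
  key <- getOption(option_name)
  if (is.null(key) || !nzchar(key)) {
    stop("Option '", option_name,
         "' is not set; add it to .Rprofile and restart R.",
         call. = FALSE)
  }
  key
}

# Normally this line lives in .Rprofile; it is set here only for the demo
options(demo_service_key = "not-a-real-key")
key <- get_api_key("demo_service_key")
```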

Using geonames

countryInfo <- GNcountryInfo()
countryInfo %>%
    tbl_df()
## # A tibble: 250 × 17
##    continent          capital            languages geonameId
##        <chr>            <chr>                <chr>     <chr>
## 1         EU Andorra la Vella                   ca   3041565
## 2         AS        Abu Dhabi    ar-AE,fa,en,hi,ur    290557
## 3         AS            Kabul    fa-AF,ps,uz-AF,tk   1149361
## 4         NA     Saint John’s                en-AG   3576396
## 5         NA       The Valley                en-AI   3573511
## 6         EU           Tirana                sq,el    783754
## 7         AS          Yerevan                   hy    174982
## 8         AF           Luanda                pt-AO   3351879
## 9         AN                                         6697173
## 10        SA     Buenos Aires es-AR,en,it,de,fr,gn   3865483
## # ... with 240 more rows, and 13 more variables: south <chr>,
## #   isoAlpha3 <chr>, north <chr>, fipsCode <chr>, population <chr>,
## #   east <chr>, isoNumeric <chr>, areaInSqKm <chr>, countryCode <chr>,
## #   west <chr>, countryName <chr>, continentName <chr>, currencyCode <chr>
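
Every column in this result is character, including population and areaInSqKm, so numeric work needs a conversion first. The sketch below shows one way to do that in base R. It uses a toy stand-in data frame with invented values so that it runs without a geonames account.

```r
# Toy stand-in with the same column types as the GNcountryInfo() result.
# The values are invented for illustration; they are not geonames data.
country_info <- data.frame(
  countryName = c("Country A", "Country B"),
  population  = c("1000", "250000"),
  areaInSqKm  = c("10", "500"),
  stringsAsFactors = FALSE
)

# Convert the columns that hold numbers stored as text
numeric_cols <- c("population", "areaInSqKm")
country_info[numeric_cols] <- lapply(country_info[numeric_cols], as.numeric)

# Numeric columns can now be used in calculations
country_info$density <- country_info$population / country_info$areaInSqKm
```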

Searching PLOS articles: rplos

library(rplos)
searchplos(q = "alcohol", fl = "id,title", limit = 10)
## $meta
##   numFound start maxScore
## 1    22718     0       NA
## 
## $data
##                              id
## 1  10.1371/journal.pmed.0040151
## 2  10.1371/journal.pone.0027752
## 3  10.1371/journal.pmed.0050108
## 4  10.1371/journal.pone.0071284
## 5  10.1371/journal.pone.0137790
## 6  10.1371/journal.pone.0153027
## 7  10.1371/journal.pone.0022994
## 8  10.1371/journal.pmed.0050104
## 9  10.1371/journal.pone.0099906
## 10 10.1371/journal.pone.0067386
##                                                                                                                                                                     title
## 1                                                                                                        Comparative Analysis of Alcohol Control Policies in 30 Countries
## 2                        Feasibility of an Alcohol Intervention Programme for TB Patients with Alcohol Use Disorder (AUD) - A Qualitative Study from Chennai, South India
## 3                                                                                                       Retail Sales of Alcohol and the Risk of Being a Victim of Assault
## 4  Ghrelin Receptor (GHS-R1A) Antagonism Suppresses Both Alcohol Consumption and the Alcohol Deprivation Effect in Rats following Long-Term Voluntary Alcohol Consumption
## 5                                                  Alcohol Use and Gamma-Glutamyltransferase Using a Mendelian Randomization Design in the Guangzhou Biobank Cohort Study
## 6                                                      Health Warnings on Alcoholic Beverages: Perceptions of the Health Risks and Intentions towards Alcohol Consumption
## 7                                                                                                               Evaluation of Un-Medicated, Self-Paced Alcohol Withdrawal
## 8                                                                                                                               Alcohol Sales and Risk of Serious Assault
## 9                                            Alcohol Tax Policy and Related Mortality. An Age-Period-Cohort Analysis of a Rapidly Developed Chinese Population, 1981–2010
## 10                          An Experimental Trial Exploring the Impact of Continuous Transdermal Alcohol Monitoring upon Alcohol Consumption in a Cohort of Male Students

Relative frequency plot

out <- plosword(list("alcohol", "heroin", "marijuana"),
    vis = "TRUE")
out$table
##   No_Articles      Term
## 1       22718   alcohol
## 2         744    heroin
## 3         513 marijuana
out$plot

Plots over time

plot_throughtime(terms = c("alcohol", "heroin", "marijuana"), limit = 200)

Scraping Twitter

  1. REST API
  2. Streaming API

Packages for Twitter

OAuth authentication

  1. Create a Twitter application for yourself
  2. Store your API key and API secret using the .Rprofile method:

    options(twitter_api_key = "Your API key")
    options(twitter_api_token = "Your API secret")
  3. Run from the console:

    library(twitteR)
    setup_twitter_oauth(consumer_key = getOption("twitter_api_key"),
                        consumer_secret = getOption("twitter_api_token"))
  4. RStudio should then print the message “Authentication complete.”

Searching tweets

tweets <- searchTwitter('#rstats', n = 5)
tweets
## [[1]]
## [1] "TimWilliate: After using tidyjson for one morning, I wonder why it has not been rolled into tidyr or the tidyverse #rstats https://t.co/AWlL5WUzhr"
## 
## [[2]]
## [1] "AlexIrrthum: A nice series of blog posts on reproducibility in #rstats by @jzelner https://t.co/9CBNLDR0gt (GNU Make, Knitr, Docker, Gitlab CI...)"
## 
## [[3]]
## [1] "0xeinar: RT @seankross: Today's #rstats tip: use head() or View() after a pipe to preview an intermediate step in a data pipeline. https://t.co/Qix7…"
## 
## [[4]]
## [1] "Salvirt: RT @AnalyticsVidhya: Learn to create common as well as advanced Visualizations R. https://t.co/WvsSpOYcuN #dataviz #rstats https://t.co/9dG…"
## 
## [[5]]
## [1] "onuemeka: RT @juliasilge: Wow, this post by @ellis2013nz on @FiveThirtyEight's polling data is interesting/illuminating. #rstats https://t.co/P0lCv7V…"

Searching users

clinton <- getUser("hillaryclinton")
clinton$getDescription()
## [1] "Wife, mom, grandma, women+kids advocate, FLOTUS, Senator, SecState, hair icon, pantsuit aficionado, 2016 presidential candidate. Tweets from Hillary signed –H"
clinton$getFriends(n = 5)
## $`3153892631`
## [1] "RobbyMook"
## 
## $`18730233`
## [1] "daniellekantor"
## 
## $`4732338444`
## [1] "RyanForRecovery"
## 
## $`357606935`
## [1] "elizabethforma"
## 
## $`325830217`
## [1] "VP"

Tidying tweets

str(tweets)
## List of 5
##  $ :Reference class 'status' [package "twitteR"] with 17 fields
##   ..$ text         : chr "After using tidyjson for one morning, I wonder why it has not been rolled into tidyr or the tidyverse #rstats https://t.co/AWlL"| __truncated__
##   ..$ favorited    : logi FALSE
##   ..$ favoriteCount: num 0
##   ..$ replyToSN    : chr(0) 
##   ..$ created      : POSIXct[1:1], format: "2016-10-30 19:36:06"
##   ..$ truncated    : logi FALSE
##   ..$ replyToSID   : chr(0) 
##   ..$ id           : chr "792812339045117954"
##   ..$ replyToUID   : chr(0) 
##   ..$ statusSource : chr "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>"
##   ..$ screenName   : chr "TimWilliate"
##   ..$ retweetCount : num 0
##   ..$ isRetweet    : logi FALSE
##   ..$ retweeted    : logi FALSE
##   ..$ longitude    : chr(0) 
##   ..$ latitude     : chr(0) 
##   ..$ urls         :'data.frame':    1 obs. of  5 variables:
##   .. ..$ url         : chr "https://t.co/AWlL5WUzhr"
##   .. ..$ expanded_url: chr "https://github.com/sailthru/tidyjson/blob/master/DESCRIPTION"
##   .. ..$ display_url : chr "github.com/sailthru/tidyj…"
##   .. ..$ start_index : num 110
##   .. ..$ stop_index  : num 133
##   ..and 53 methods, of which 39 are  possibly relevant:
##   ..  getCreated, getFavoriteCount, getFavorited, getId, getIsRetweet,
##   ..  getLatitude, getLongitude, getReplyToSID, getReplyToSN,
##   ..  getReplyToUID, getRetweetCount, getRetweeted, getRetweeters,
##   ..  getRetweets, getScreenName, getStatusSource, getText, getTruncated,
##   ..  getUrls, initialize, setCreated, setFavoriteCount, setFavorited,
##   ..  setId, setIsRetweet, setLatitude, setLongitude, setReplyToSID,
##   ..  setReplyToSN, setReplyToUID, setRetweetCount, setRetweeted,
##   ..  setScreenName, setStatusSource, setText, setTruncated, setUrls,
##   ..  toDataFrame, toDataFrame#twitterObj
##  $ :Reference class 'status' [package "twitteR"] with 17 fields
##   ..$ text         : chr "A nice series of blog posts on reproducibility in #rstats by @jzelner https://t.co/9CBNLDR0gt (GNU Make, Knitr, Docker, Gitlab "| __truncated__
##   ..$ favorited    : logi FALSE
##   ..$ favoriteCount: num 0
##   ..$ replyToSN    : chr(0) 
##   ..$ created      : POSIXct[1:1], format: "2016-10-30 19:36:01"
##   ..$ truncated    : logi FALSE
##   ..$ replyToSID   : chr(0) 
##   ..$ id           : chr "792812320753913856"
##   ..$ replyToUID   : chr(0) 
##   ..$ statusSource : chr "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>"
##   ..$ screenName   : chr "AlexIrrthum"
##   ..$ retweetCount : num 0
##   ..$ isRetweet    : logi FALSE
##   ..$ retweeted    : logi FALSE
##   ..$ longitude    : chr(0) 
##   ..$ latitude     : chr(0) 
##   ..$ urls         :'data.frame':    1 obs. of  5 variables:
##   .. ..$ url         : chr "https://t.co/9CBNLDR0gt"
##   .. ..$ expanded_url: chr "http://biorxiv.org/content/early/2015/11/16/031971"
##   .. ..$ display_url : chr "biorxiv.org/content/early/…"
##   .. ..$ start_index : num 70
##   .. ..$ stop_index  : num 93
##   ..and 53 methods, of which 39 are  possibly relevant:
##   ..  getCreated, getFavoriteCount, getFavorited, getId, getIsRetweet,
##   ..  getLatitude, getLongitude, getReplyToSID, getReplyToSN,
##   ..  getReplyToUID, getRetweetCount, getRetweeted, getRetweeters,
##   ..  getRetweets, getScreenName, getStatusSource, getText, getTruncated,
##   ..  getUrls, initialize, setCreated, setFavoriteCount, setFavorited,
##   ..  setId, setIsRetweet, setLatitude, setLongitude, setReplyToSID,
##   ..  setReplyToSN, setReplyToUID, setRetweetCount, setRetweeted,
##   ..  setScreenName, setStatusSource, setText, setTruncated, setUrls,
##   ..  toDataFrame, toDataFrame#twitterObj
##  $ :Reference class 'status' [package "twitteR"] with 17 fields
##   ..$ text         : chr "RT @seankross: Today's #rstats tip: use head() or View() after a pipe to preview an intermediate step in a data pipeline. https"| __truncated__
##   ..$ favorited    : logi FALSE
##   ..$ favoriteCount: num 0
##   ..$ replyToSN    : chr(0) 
##   ..$ created      : POSIXct[1:1], format: "2016-10-30 19:34:43"
##   ..$ truncated    : logi FALSE
##   ..$ replyToSID   : chr(0) 
##   ..$ id           : chr "792811992763490304"
##   ..$ replyToUID   : chr(0) 
##   ..$ statusSource : chr "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>"
##   ..$ screenName   : chr "0xeinar"
##   ..$ retweetCount : num 3
##   ..$ isRetweet    : logi TRUE
##   ..$ retweeted    : logi FALSE
##   ..$ longitude    : chr(0) 
##   ..$ latitude     : chr(0) 
##   ..$ urls         :'data.frame':    0 obs. of  4 variables:
##   .. ..$ url         : chr(0) 
##   .. ..$ expanded_url: chr(0) 
##   .. ..$ dispaly_url : chr(0) 
##   .. ..$ indices     : num(0) 
##   ..and 53 methods, of which 39 are  possibly relevant:
##   ..  getCreated, getFavoriteCount, getFavorited, getId, getIsRetweet,
##   ..  getLatitude, getLongitude, getReplyToSID, getReplyToSN,
##   ..  getReplyToUID, getRetweetCount, getRetweeted, getRetweeters,
##   ..  getRetweets, getScreenName, getStatusSource, getText, getTruncated,
##   ..  getUrls, initialize, setCreated, setFavoriteCount, setFavorited,
##   ..  setId, setIsRetweet, setLatitude, setLongitude, setReplyToSID,
##   ..  setReplyToSN, setReplyToUID, setRetweetCount, setRetweeted,
##   ..  setScreenName, setStatusSource, setText, setTruncated, setUrls,
##   ..  toDataFrame, toDataFrame#twitterObj
##  $ :Reference class 'status' [package "twitteR"] with 17 fields
##   ..$ text         : chr "RT @AnalyticsVidhya: Learn to create common as well as advanced Visualizations R. https://t.co/WvsSpOYcuN #dataviz #rstats http"| __truncated__
##   ..$ favorited    : logi FALSE
##   ..$ favoriteCount: num 0
##   ..$ replyToSN    : chr(0) 
##   ..$ created      : POSIXct[1:1], format: "2016-10-30 19:16:09"
##   ..$ truncated    : logi FALSE
##   ..$ replyToSID   : chr(0) 
##   ..$ id           : chr "792807318463385600"
##   ..$ replyToUID   : chr(0) 
##   ..$ statusSource : chr "<a href=\"http://www.twitter.com\" rel=\"nofollow\">Twitter for Windows Phone</a>"
##   ..$ screenName   : chr "Salvirt"
##   ..$ retweetCount : num 8
##   ..$ isRetweet    : logi TRUE
##   ..$ retweeted    : logi FALSE
##   ..$ longitude    : chr(0) 
##   ..$ latitude     : chr(0) 
##   ..$ urls         :'data.frame':    1 obs. of  5 variables:
##   .. ..$ url         : chr "https://t.co/WvsSpOYcuN"
##   .. ..$ expanded_url: chr "http://buff.ly/2dVCMWD"
##   .. ..$ display_url : chr "buff.ly/2dVCMWD"
##   .. ..$ start_index : num 82
##   .. ..$ stop_index  : num 105
##   ..and 53 methods, of which 39 are  possibly relevant:
##   ..  getCreated, getFavoriteCount, getFavorited, getId, getIsRetweet,
##   ..  getLatitude, getLongitude, getReplyToSID, getReplyToSN,
##   ..  getReplyToUID, getRetweetCount, getRetweeted, getRetweeters,
##   ..  getRetweets, getScreenName, getStatusSource, getText, getTruncated,
##   ..  getUrls, initialize, setCreated, setFavoriteCount, setFavorited,
##   ..  setId, setIsRetweet, setLatitude, setLongitude, setReplyToSID,
##   ..  setReplyToSN, setReplyToUID, setRetweetCount, setRetweeted,
##   ..  setScreenName, setStatusSource, setText, setTruncated, setUrls,
##   ..  toDataFrame, toDataFrame#twitterObj
##  $ :Reference class 'status' [package "twitteR"] with 17 fields
##   ..$ text         : chr "RT @juliasilge: Wow, this post by @ellis2013nz on @FiveThirtyEight's polling data is interesting/illuminating. #rstats https://"| __truncated__
##   ..$ favorited    : logi FALSE
##   ..$ favoriteCount: num 0
##   ..$ replyToSN    : chr(0) 
##   ..$ created      : POSIXct[1:1], format: "2016-10-30 19:15:12"
##   ..$ truncated    : logi FALSE
##   ..$ replyToSID   : chr(0) 
##   ..$ id           : chr "792807080931520518"
##   ..$ replyToUID   : chr(0) 
##   ..$ statusSource : chr "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>"
##   ..$ screenName   : chr "onuemeka"
##   ..$ retweetCount : num 34
##   ..$ isRetweet    : logi TRUE
##   ..$ retweeted    : logi FALSE
##   ..$ longitude    : chr(0) 
##   ..$ latitude     : chr(0) 
##   ..$ urls         :'data.frame':    0 obs. of  4 variables:
##   .. ..$ url         : chr(0) 
##   .. ..$ expanded_url: chr(0) 
##   .. ..$ dispaly_url : chr(0) 
##   .. ..$ indices     : num(0) 
##   ..and 53 methods, of which 39 are  possibly relevant:
##   ..  getCreated, getFavoriteCount, getFavorited, getId, getIsRetweet,
##   ..  getLatitude, getLongitude, getReplyToSID, getReplyToSN,
##   ..  getReplyToUID, getRetweetCount, getRetweeted, getRetweeters,
##   ..  getRetweets, getScreenName, getStatusSource, getText, getTruncated,
##   ..  getUrls, initialize, setCreated, setFavoriteCount, setFavorited,
##   ..  setId, setIsRetweet, setLatitude, setLongitude, setReplyToSID,
##   ..  setReplyToSN, setReplyToUID, setRetweetCount, setRetweeted,
##   ..  setScreenName, setStatusSource, setText, setTruncated, setUrls,
##   ..  toDataFrame, toDataFrame#twitterObj

Tidying tweets

df <- twListToDF(tweets) %>%
  tbl_df()
df
## # A tibble: 5 × 16
##                                                                          text
## *                                                                       <chr>
## 1 After using tidyjson for one morning, I wonder why it has not been rolled i
## 2 A nice series of blog posts on reproducibility in #rstats by @jzelner https
## 3 RT @seankross: Today's #rstats tip: use head() or View() after a pipe to pr
## 4 RT @AnalyticsVidhya: Learn to create common as well as advanced Visualizati
## 5 RT @juliasilge: Wow, this post by @ellis2013nz on @FiveThirtyEight's pollin
## # ... with 15 more variables: favorited <lgl>, favoriteCount <dbl>,
## #   replyToSN <lgl>, created <dttm>, truncated <lgl>, replyToSID <lgl>,
## #   id <chr>, replyToUID <lgl>, statusSource <chr>, screenName <chr>,
## #   retweetCount <dbl>, isRetweet <lgl>, retweeted <lgl>, longitude <lgl>,
## #   latitude <lgl>
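
Once the tweets are in a data frame, ordinary data frame tools apply. As a rough illustration, the sketch below retypes three columns from the five tweets printed above (screenName, retweetCount, isRetweet) so that it runs without Twitter credentials. It uses base R instead of dplyr only to stay dependency-free.

```r
# Three columns copied by hand from the str(tweets) output above
tweet_df <- data.frame(
  screenName   = c("TimWilliate", "AlexIrrthum", "0xeinar", "Salvirt", "onuemeka"),
  retweetCount = c(0, 0, 3, 8, 34),
  isRetweet    = c(FALSE, FALSE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Total retweet count, split by whether the tweet is itself a retweet
retweet_summary <- aggregate(retweetCount ~ isRetweet, data = tweet_df, FUN = sum)

# Screen names of the accounts that posted original tweets
originals <- tweet_df[!tweet_df$isRetweet, "screenName"]
```

With the real df from twListToDF() the same calls work unchanged, since the column names match.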

Practice using twitteR

  1. Create a new R project on your computer
  2. Set up your API key with a Twitter app
  3. Authenticate using the twitteR package in R
  4. Find the 50 most recent tweets by Donald Trump and store them in a data frame.
    • userTimeline() can be used to retrieve tweets from individual users
    • searchTwitter() finds tweets from any public account that reference the username