Prerequisites for Twitter Data Crawling – Twitter API
What is the Twitter API
API stands for Application Programming Interface. In general, the concept of an API refers to “standardized methods for allowing one software component to access the resources of another component” (Bucher, 2013).
The Twitter API is the programmatic access to Twitter data that Twitter provides for companies, developers, researchers, and users.
According to Twitter:
“At a high level, APIs are the way computer programs ‘talk’ to each other so that they can request and deliver information. This is done by allowing a software application to call what’s known as an endpoint: an address that corresponds with a specific type of information we provide (endpoints are generally unique like phone numbers).”
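To make the idea of an endpoint concrete, the sketch below composes the address of a keyword search request in base R. It assumes the standard search endpoint of API version 1.1 (`https://api.twitter.com/1.1/search/tweets.json`). The helper name `build_search_url` and the example query are made up for illustration, and a real request would also need the authentication described later in this post.

```r
# Hypothetical helper: compose the address of a search request.
# Assumes the v1.1 standard search endpoint; check the current docs before relying on it.
build_search_url <- function(query, count = 10) {
  endpoint <- "https://api.twitter.com/1.1/search/tweets.json"
  # Percent-encode reserved characters such as "#" so the query survives in a URL
  encoded_query <- utils::URLencode(query, reserved = TRUE)
  paste0(endpoint, "?q=", encoded_query, "&count=", count)
}

search_url <- build_search_url("#rstats")
print(search_url)
```

The `#` in the hashtag is encoded as `%23`, so the endpoint address plus the query parameters form one valid URL that a program can request.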
While the API supports numerous, diverse functions for interacting with Twitter, the functions most relevant for extracting Twitter datasets include (Littman, 2017):
- Retrieving tweets from a user timeline – The list of tweets posted by an account;
- Searching tweets – Tweets that contain specific hashtags or keywords;
- Filtering real-time tweets – Tweets as they pass through the Twitter platform upon posting.
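As a rough illustration, the three functions above could map onto the rtweet package (introduced later in this post) as follows. This is only a sketch: the wrapper names `fetch_timeline`, `fetch_search`, and `fetch_stream` are hypothetical, the account and keywords are placeholders, and the actual calls need the rtweet package, valid credentials, and network access.

```r
# A sketch only: the rtweet calls below need the rtweet package and valid API credentials.
# The wrapper names are hypothetical; only basic arguments are passed through.

# 1. Tweets posted by one account (user timeline)
fetch_timeline <- function(screen_name, n = 100) {
  rtweet::get_timeline(user = screen_name, n = n)
}

# 2. Recent tweets containing a hashtag or keyword
fetch_search <- function(query, n = 100) {
  rtweet::search_tweets(q = query, n = n, include_rts = FALSE)
}

# 3. Tweets matching a keyword as they are posted, collected for `seconds` seconds
fetch_stream <- function(keyword, seconds = 30) {
  rtweet::stream_tweets(q = keyword, timeout = seconds)
}

# Example calls (commented out because they need credentials and network access):
# timeline_tweets <- fetch_timeline("rstudio", n = 50)
# search_results  <- fetch_search("#rstats", n = 200)
# live_tweets     <- fetch_stream("rstats", seconds = 60)
```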
Access Twitter Data
Twitter allows data access via three different APIs:
- Standard API: 7-day endpoint – What we are going to use today!
- Premium API: 30-day endpoint & Full-archive endpoint
- Enterprise API: 30-day endpoint & Full-archive endpoint
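The practical consequence of the 7-day endpoint is that older tweets cannot be reached through a standard search. The base R sketch below checks whether a date falls inside that window. The function name `within_standard_window` is hypothetical, and treating the window as exactly seven calendar days back from today is an assumption; the API's exact cut-off may differ.

```r
# Hypothetical helper: is a tweet date still reachable through the 7-day endpoint?
# Assumes the window covers the seven calendar days before `today`, inclusive.
within_standard_window <- function(tweet_date, today = Sys.Date(), window_days = 7) {
  tweet_date <- as.Date(tweet_date)
  tweet_date >= today - window_days & tweet_date <= today
}

# Fixed reference date so the example gives the same result whenever it is run
reference_day <- as.Date("2020-05-20")
within_standard_window("2020-05-15", today = reference_day)  # inside the window
within_standard_window("2020-05-01", today = reference_day)  # too old for the standard API
```

Data collection for an event therefore has to start within a week of the event, or rely on the 30-day or full-archive endpoints.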
Obtain Twitter API Access
Obviously, you need a Twitter account. You are also required to create a developer account and generate keys and tokens.
1. Apply for and receive approval for a Twitter developer account
Tip: You might need to provide as much detail as possible (at least 100 characters) to shorten the approval process.
2. Create a Twitter developer app
3. Generate your app’s API keys and user’s access tokens
Access your app and go to the Keys and Tokens tab. You will need to provide the app name, the consumer API keys, and the access token and access token secret when collecting Twitter data.
Tip: For security reasons, Twitter only displays your access token and secret when you first generate them. You can revoke or regenerate them at any time, which will invalidate your existing tokens.
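Because the keys and tokens are secrets, one common approach is to keep them out of your scripts and read them from environment variables instead. The sketch below does this in base R. The helper name `read_twitter_credentials` and the four variable names are assumptions chosen for illustration, and the commented call to `rtweet::create_token()` assumes a version of rtweet that still provides that function.

```r
# Hypothetical helper: read the four credentials from environment variables.
# The variable names are assumptions; set them in ~/.Renviron or with Sys.setenv().
read_twitter_credentials <- function() {
  vars <- c(
    consumer_key    = "TWITTER_CONSUMER_KEY",
    consumer_secret = "TWITTER_CONSUMER_SECRET",
    access_token    = "TWITTER_ACCESS_TOKEN",
    access_secret   = "TWITTER_ACCESS_SECRET"
  )
  values <- Sys.getenv(vars, unset = "")
  missing <- vars[values == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  # Return a named list keyed by credential role rather than by variable name
  stats::setNames(as.list(unname(values)), names(vars))
}

# Example use (commented out because it needs real credentials and rtweet):
# creds <- read_twitter_credentials()
# token <- rtweet::create_token(
#   app             = "my_app_name",   # placeholder: the name of your developer app
#   consumer_key    = creds$consumer_key,
#   consumer_secret = creds$consumer_secret,
#   access_token    = creds$access_token,
#   access_secret   = creds$access_secret
# )
```

If you regenerate your tokens, only the environment variables need to change; the collection scripts stay the same.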
Using R to collect Twitter data
Useful R Packages
- SocialMediaLab: SocialMediaLab is an R package that provides a suite of tools for collecting and constructing networks from social media data.
- vosonSML: vosonSML is the SocialMediaLab package, with significant improvements and enhancements.
- rtweet: R client for accessing Twitter’s REST and stream APIs.
- twitteR: twitteR is an R package which provides access to the Twitter API.
- devtools: The aim of devtools is to make package development easier by providing R functions that simplify common tasks.
- httr: The aim of httr is to provide a wrapper for the curl package, customised to the demands of modern web APIs.
- magrittr: The magrittr package has two aims: to decrease development time and to improve readability and maintainability of code. Or even shortr: to make your code smokin’ (puff puff)!
- igraph: Routines for simple graphs and network analysis. igraph can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality indices and much more.
- ggplot2: A system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”. You provide the data, tell ‘ggplot2’ how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
- gender: Infers state-recorded gender categories from first names and dates of birth using historical datasets. By using these datasets instead of lists of male and female names, this package is able to more accurately infer the gender of a name, and it is able to report the probability that a name was male or female.
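Before collecting any data, it can help to check which of these packages are already available. The sketch below is one way to do that; the helper name `missing_packages` is hypothetical, and SocialMediaLab is left out of the vector on the assumption that vosonSML is used in its place.

```r
# Packages named in the list above (SocialMediaLab omitted, assuming vosonSML is used instead)
packages <- c("vosonSML", "rtweet", "twitteR", "devtools", "httr",
              "magrittr", "igraph", "ggplot2", "gender")

# Hypothetical helper: return the packages that cannot be loaded on this machine
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(packages)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```

Once everything is installed, each package is attached in the usual way, for example `library(rtweet)`.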
References and Things to Read
- VOSON Lab http://vosonlab.net/SocialMediaLab
- Bucher, Taina. “Objects of intense feeling: The case of the Twitter API.” Computational Culture 3 (2013).
- Kim, Annice E., et al. “Methodological considerations in analyzing Twitter data.” Journal of the National Cancer Institute Monographs 2013.47 (2013): 140-146.
- Twitter Bots Tutorials https://digitalinspiration.com/docs/twitter-bots
- Twitter Bots Tutorials (A Full Guide for Beginners) https://learn.g2.com/how-to-make-a-twitter-bot
- Littman, Justin. “Where to get Twitter data for academic research.” (2017). https://gwu-libraries.github.io/sfm-ui/posts/2017-09-14-twitter-data
- Trupthi, M., Suresh Pabboju, and G. Narasimha. “Sentiment analysis on twitter using streaming API.” 2017 IEEE 7th International Advance Computing Conference (IACC). IEEE, 2017.
- Pfeffer, Jürgen, Katja Mayer, and Fred Morstatter. “Tampering with Twitter’s sample API.” EPJ Data Science 7.1 (2018): 50.
- Hino, Airo, and Robert A. Fahey. “Representing the Twittersphere: Archiving a representative sample of Twitter data under resource constraints.” International Journal of Information Management 48 (2019): 175-184.
- Aguilar-Gallegos, Norman, et al. “Dataset on dynamics of Coronavirus on Twitter.” Data in Brief (2020): 105684.