Download Twitter Data with 10 Lines of Code

Robbie Geoghegan
2 min read · Dec 31, 2019


Using the Python package “GetOldTweets3” to access Twitter Data — no developer license needed.


I should start out by saying that the most robust approach for downloading Twitter data is to go to the source: sign up for a developer license with Twitter and access their API directly using Tweepy. However, there is a much faster way to get your hands on Twitter data.

This guide is instead intended for those wanting to do one of the following:

  • Conduct some quick and simple analysis with Twitter data (this code can be executed in less than 10 minutes)
  • Access Tweets older than 1 week (the Twitter API only serves Tweets from the past week)
  • Download a large volume of Tweets (the Twitter API caps the number of Tweets you can download at around 3,000)

The guide below shows how to download Twitter data using the Python package “GetOldTweets3” (documentation can be found here). This package lets you set many useful filters for more targeted Tweet downloads, including filtering by keywords, Twitter usernames, locations and date ranges. To get started, install the package:

pip install GetOldTweets3

FILTERING BY KEYWORD AND LOCATION

First, define the keyword and location to filter Tweets by, along with the date range and the maximum number of tweets.

Set filters for what Twitter data to download
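A minimal sketch of these filters as plain Python variables — the keyword, location and dates below are illustrative placeholders, not values from the article:

```python
# Filters for the Tweet download. All values here are example
# placeholders — substitute your own keyword, location and dates.
keyword = "bushfires"              # search term to match in Tweet text
location = "Sydney, Australia"     # location to search near
start_date = "2019-12-01"          # earliest Tweet date (YYYY-MM-DD)
end_date = "2019-12-31"            # latest Tweet date (YYYY-MM-DD)
max_tweets = 10                    # cap on the number of Tweets returned
```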

Next, use the package’s functions to download the Twitter data and store the results in a DataFrame.

Get Old Tweets
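Roughly, the download step wires those filters into the package’s TweetCriteria and TweetManager classes and collects the results in a DataFrame. This is a sketch rather than the article’s exact code — the function and variable names are my own — and it needs GetOldTweets3 installed plus a network connection to actually run:

```python
import pandas as pd

def download_tweets(keyword, location, since, until, max_tweets):
    """Download Tweets matching the filters and return them in a DataFrame."""
    # Imported inside the function so this module loads even where
    # GetOldTweets3 is not installed.
    import GetOldTweets3 as got

    criteria = (
        got.manager.TweetCriteria()
        .setQuerySearch(keyword)   # keyword filter
        .setNear(location)         # location filter
        .setSince(since)           # date range: start
        .setUntil(until)           # date range: end
        .setMaxTweets(max_tweets)  # cap the download size
    )
    tweets = got.manager.TweetManager.getTweets(criteria)
    return pd.DataFrame({"tweet_object": tweets})

# Example (makes live requests to Twitter, so commented out):
# df = download_tweets("bushfires", "Sydney, Australia",
#                      "2019-12-01", "2019-12-31", 10)
```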

Specific Twitter information needs to be extracted from the filtered data we’ve stored in the DataFrame above. We can extract several data points for each Tweet, including:

  • Tweet text
  • Username
  • Date of Tweet
  • Hashtags
  • Links to each Tweet
  • Retweets
  • Favorites
  • Mentions

Let’s define a function to extract text, dates, hashtags and links to Tweets.

Function to Extract Twitter Information
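A sketch of such a function, assuming the attribute names GetOldTweets3 exposes on each tweet object (text, date, hashtags, permalink):

```python
import pandas as pd

def get_twitter_info(tweets):
    """Build a DataFrame of text, date, hashtags and link for each Tweet.

    Each object returned by TweetManager.getTweets exposes attributes
    such as .text, .date, .hashtags and .permalink (plus .username,
    .retweets, .favorites and .mentions if you want more columns).
    """
    return pd.DataFrame({
        "text": [t.text for t in tweets],
        "date": [t.date for t in tweets],
        "hashtags": [t.hashtags for t in tweets],
        "link": [t.permalink for t in tweets],
    })
```

Calling get_twitter_info on the downloaded tweets then yields one row per Tweet, with a column for each extracted field.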

Finally, execute the get_twitter_info function to return a DataFrame containing the 10 tweets we searched for, with a column for each of the data points extracted above.

FILTERING BY USERNAME

The package also enables you to filter by specific usernames in a similar way to keywords and locations. Simply define the username and run the code below.

Filter by Username
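Sketched with the package’s setUsername filter — the username below is a placeholder, and running this requires GetOldTweets3 and a live connection:

```python
def download_user_tweets(username, max_tweets=10):
    """Download a given user's Tweets."""
    import GetOldTweets3 as got  # lazy import, as above

    criteria = (
        got.manager.TweetCriteria()
        .setUsername(username)     # filter by Twitter handle
        .setMaxTweets(max_tweets)
    )
    return got.manager.TweetManager.getTweets(criteria)

# Example (live request, commented out):
# tweets = download_user_tweets("nasa", max_tweets=10)
```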

FILTERING BY MULTIPLE LOCATIONS + EXPORT TO CSV

To gather tweets from multiple locations we can build a simple loop that leverages what we’ve defined above. First, define a list of the locations of interest; this is useful if you want to analyze multiple cities or if you are searching for variations of a specific location (e.g. New York, NY, Big Apple).

Bringing it all Together
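Putting the pieces together, the loop over locations might look like the sketch below (again, the names are my own rather than the article’s exact code). It pools one DataFrame per location and writes the combined result to CSV:

```python
import pandas as pd

def download_by_locations(keyword, locations, since, until, max_tweets, out_csv):
    """Loop over several locations, pool the results, and export to CSV."""
    import GetOldTweets3 as got  # lazy import, as above

    frames = []
    for loc in locations:
        criteria = (
            got.manager.TweetCriteria()
            .setQuerySearch(keyword)
            .setNear(loc)              # one location per pass through the loop
            .setSince(since)
            .setUntil(until)
            .setMaxTweets(max_tweets)
        )
        tweets = got.manager.TweetManager.getTweets(criteria)
        frames.append(pd.DataFrame({
            "location": [loc] * len(tweets),  # record which search found it
            "text": [t.text for t in tweets],
            "date": [t.date for t in tweets],
        }))

    result = pd.concat(frames, ignore_index=True)
    result.to_csv(out_csv, index=False)
    return result

# Example (live requests, commented out):
# download_by_locations("bushfires", ["New York", "NY", "Big Apple"],
#                       "2019-12-01", "2019-12-31", 10, "tweets.csv")
```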

CONCLUSION

While this package isn’t perfect in its coverage of tweets, it enables quick and easy access to Twitter data using Python. Furthermore, it has some advantages over the Twitter API, as outlined in the introduction. The full Python code is available on my GitHub. I hope you enjoy!

Support me by buying my children’s book: mybook.to/atozofweb3

Written by Robbie Geoghegan

Data Scientist and Author of “ABCs of Artificial Intelligence” and “A to Z of Web3” available: https://mybook.to/abcsofai & mybook.to/atozofweb3
