June 24, 2023
The “Twitter Sentiment Analysis” project analyzes the sentiment of tweets about McDonald’s and KFC. It uses Python and several libraries to collect tweets, preprocess the text, and compute sentiment metrics. It is aimed at readers who are new to Python and sentiment analysis, and provides both the fundamentals and hands-on code.
The project code is available on GitHub.
Language: Python
Libraries:
To collect the tweets, the project uses the Tweepy library, which provides access to the Twitter API. Users can specify keywords or hashtags to gather relevant tweets. In this project, 5000 tweets were collected for each of two keywords: “mcdonalds” and “kfc”. Only English-language tweets were collected. The data spans October 1, 2022, to October 3, 2022.
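The collection step can be sketched roughly as follows. This is a minimal sketch and not the project's actual code: it assumes Tweepy 4.x and the Twitter API v2 recent-search endpoint, and the names build_query and collect_tweets are hypothetical. The client argument is expected to behave like a tweepy.Client, so the paging loop itself needs no import. Note that recent search only reaches back about seven days, so a window such as October 1 to 3, 2022 would have had to be requested at that time.

```python
from typing import Any, List, Optional


def build_query(keyword: str) -> str:
    """Build a search query restricted to English tweets (lang:en operator)."""
    return f"{keyword} lang:en"


def collect_tweets(
    client: Any,
    keyword: str,
    limit: int = 5000,
    start_time: Optional[Any] = None,
    end_time: Optional[Any] = None,
) -> List[Any]:
    """Page through search results until `limit` tweets are gathered.

    `client` is assumed to expose search_recent_tweets() the way
    tweepy.Client does: it returns an object with `.data` (a list or None)
    and `.meta` (a dict that may contain "next_token").
    """
    tweets: List[Any] = []
    next_token = None
    while len(tweets) < limit:
        params = {
            "query": build_query(keyword),
            # The endpoint accepts between 10 and 100 results per page.
            "max_results": min(100, max(10, limit - len(tweets))),
            # author_id is requested so author data can be looked up later.
            "tweet_fields": ["author_id", "created_at", "lang"],
        }
        # Only send optional parameters that were actually provided.
        if start_time is not None:
            params["start_time"] = start_time
        if end_time is not None:
            params["end_time"] = end_time
        if next_token is not None:
            params["next_token"] = next_token

        response = client.search_recent_tweets(**params)
        tweets.extend(response.data or [])
        next_token = (response.meta or {}).get("next_token")
        if next_token is None:  # no further pages
            break
    return tweets[:limit]
```

With Tweepy installed, a client could be created with tweepy.Client(bearer_token=...) using your own credentials and passed in as collect_tweets(client, "mcdonalds") and collect_tweets(client, "kfc").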
The project also collected information about the authors of these tweets. There are 4406 unique author IDs for McDonald’s and 4396 for KFC. Of these, the project retrieved valid author information for 4390 McDonald’s authors and 4373 KFC authors. The remaining IDs (16 and 23, respectively) could not be found, possibly because the accounts had been deactivated.
Note: The Twitter API, which Tweepy wraps, limits author data lookups to 300 requests every 15 minutes (as of October 3, 2022). To stay within this limit, the project calls a sleep function while querying author data.
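One way to respect such a limit is to send author IDs in batches and pause once a window's worth of requests has been made. The sketch below is an illustration under assumptions, not the project's implementation: fetch_authors and fetch_batch are hypothetical names, the batch size of 100 is an assumed value, and only the 300-requests-per-15-minutes figure comes from the note above.

```python
import time
from typing import Any, Callable, Iterable, List, Sequence


def fetch_authors(
    author_ids: Iterable[Any],
    fetch_batch: Callable[[Sequence[Any]], List[Any]],
    batch_size: int = 100,           # assumed number of IDs per request
    max_requests: int = 300,         # requests allowed per window
    window_seconds: float = 15 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Look up authors in batches, sleeping whenever the window is used up.

    `fetch_batch` receives a list of IDs and returns the author records that
    were found; IDs of deactivated accounts are simply absent from the result.
    """
    unique_ids = list(dict.fromkeys(author_ids))  # de-duplicate, keep order
    authors: List[Any] = []
    requests_made = 0
    for start in range(0, len(unique_ids), batch_size):
        # After every `max_requests` requests, wait for the next window.
        if requests_made and requests_made % max_requests == 0:
            sleep(window_seconds)
        authors.extend(fetch_batch(unique_ids[start:start + batch_size]))
        requests_made += 1
    return authors
```

With Tweepy, fetch_batch could be a small wrapper around client.get_users(ids=batch) that returns response.data or an empty list. Comparing len(authors) with the number of unique IDs then shows how many accounts could not be found.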
The collected tweets undergo preprocessing steps to clean and prepare them for sentiment analysis. The NLTK library is used for tasks such as tokenization, removing stopwords, and stemming. URLs, mentions, and special characters are removed to extract meaningful text for sentiment analysis.
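The preprocessing steps above can be sketched as follows. This is a simplified illustration, not the project's code: clean_tweet and preprocess are hypothetical names, the regular expressions are one possible choice, and a tiny demo stopword set with whitespace splitting stands in for the NLTK components so that the sketch runs without extra downloads.

```python
import re
from typing import Callable, Collection, List

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # applied after lowercasing

# Tiny stand-in list; NLTK's English stopword list is much longer.
DEMO_STOPWORDS = frozenset(
    {"the", "a", "an", "is", "are", "at", "to", "and", "of", "my", "i"}
)


def clean_tweet(text: str) -> str:
    """Lowercase the text and strip URLs, @mentions, and special characters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Removes punctuation, digits, and the '#' sign (the hashtag word stays).
    text = NON_LETTER_RE.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def preprocess(
    text: str,
    stop_words: Collection[str] = DEMO_STOPWORDS,
    stem: Callable[[str], str] = lambda token: token,  # identity by default
) -> List[str]:
    """Clean, tokenize on whitespace, drop stopwords, then stem each token."""
    tokens = clean_tweet(text).split()
    return [stem(token) for token in tokens if token not in stop_words]
```

To use NLTK for the language-specific parts, pass set(nltk.corpus.stopwords.words("english")) as stop_words (after running nltk.download("stopwords")) and nltk.stem.PorterStemmer().stem as stem. The whitespace split could likewise be replaced with an NLTK tokenizer.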