Twitter Classification

This project is slightly different from others you have encountered thus far on Codecademy. Instead of a step-by-step tutorial, it contains a series of open-ended requirements that describe the project you’ll be building. There are many possible ways to correctly fulfill all of these requirements, and you should expect to use the internet, Codecademy, and other resources when you encounter a problem that you cannot easily solve.

Project Goals

There are two parts to this project that can be done in either order.

In the first part, you will make a system that predicts whether or not a tweet will go viral by using a K-Nearest Neighbor classifier. What features of a tweet do you think are the most important in determining its virality? Does the length of the tweet matter? What about the number of hashtags? Maybe information about the account that sent the tweet is most important. You’ll answer these questions while using DataFrames and Matplotlib visualizations to present your results!
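To make the idea behind the first part concrete, here is a minimal from-scratch sketch of K-Nearest Neighbors in plain Python. It is not the project's solution: the feature choice (tweet length, hashtag count, follower count), the toy rows, and the helper names `min_max_scale` and `knn_predict` are all assumptions made up for illustration. In the project itself you would more likely build the features from a DataFrame and use a library classifier.

```python
import math
from collections import Counter

# Hypothetical toy data, invented for illustration only.
# Each row is (tweet_length, hashtag_count, follower_count).
train_features = [
    (140, 3, 50000),
    (120, 2, 42000),
    (130, 4, 61000),
    (30, 0, 150),
    (45, 1, 300),
    (25, 0, 90),
]
# 1 = viral, 0 = not viral (labels are also made up).
train_labels = [1, 1, 1, 0, 0, 0]


def min_max_scale(rows):
    """Scale every column to [0, 1] so that a large-valued feature
    such as follower count does not dominate the distance."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]

    def scale(row):
        return tuple(
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        )

    # Return the scaled rows plus the function, so new tweets can be
    # scaled with the same minimums and maximums as the training data.
    return [scale(row) for row in rows], scale


def knn_predict(scaled_rows, labels, scaled_query, k=3):
    """Return the majority label among the k nearest training rows."""
    distances = sorted(
        (math.dist(row, scaled_query), label)
        for row, label in zip(scaled_rows, labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


scaled_train, scale = min_max_scale(train_features)
long_tweet_prediction = knn_predict(scaled_train, train_labels, scale((135, 3, 55000)))
short_tweet_prediction = knn_predict(scaled_train, train_labels, scale((28, 0, 100)))
```

Scaling matters here because follower counts are thousands of times larger than hashtag counts; without it, the distance would be decided almost entirely by followers. Trying different values of `k` and different feature sets is one way to explore which features matter most.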

In the second part of this project, you’ll test the power of Naive Bayes classifiers by creating a system that predicts whether a tweet was sent from New York City, London, or Paris. You will investigate how language is used differently in these three cities. Can the classifier automatically detect the difference between French and English? Can it learn local phrases or slang? Can you create tweets that trick the system?
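As a rough illustration of the second part, the sketch below implements a small multinomial Naive Bayes classifier over word counts in plain Python. The example tweets, the city labels attached to them, and the function names `train_naive_bayes` and `predict_city` are hypothetical and exist only to show the mechanics; the real project data and the approach in the solution notebook may differ.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy tweets, invented for illustration only.
training_tweets = [
    ("the subway is delayed again", "New York"),
    ("bagels in brooklyn this morning", "New York"),
    ("the tube is delayed again mate", "London"),
    ("lovely cuppa tea this morning", "London"),
    ("le métro est en retard", "Paris"),
    ("un café ce matin", "Paris"),
]


def train_naive_bayes(examples):
    """Count how often each word appears in tweets from each city."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def predict_city(text, model):
    """Return the city with the highest log-probability for the text."""
    class_counts, word_counts, vocabulary = model
    total_tweets = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # Start from the log prior: how common this city is in training.
        score = math.log(count / total_tweets)
        # Add-one (Laplace) smoothing keeps unseen words from zeroing a class.
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(training_tweets)
french_guess = predict_city("le café est en retard", model)
tube_guess = predict_city("the tube is delayed", model)
subway_guess = predict_city("subway delayed this morning", model)
```

Even on this tiny example, French words separate Paris from the two English-speaking cities much more sharply than "tube" and "subway" separate London from New York, which hints at why telling the English-speaking cities apart is the harder problem. Writing a tweet that mixes vocabulary from several cities is a simple way to try to trick the classifier.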

Setup Instructions

You have two options for completing this assignment: you can work here, within Codecademy’s code editor, or on your own computer if you’re more comfortable using a Jupyter notebook.

If you choose to do this project on your computer instead of on Codecademy, you can download what you’ll need by clicking the “Download” button below. If you need help setting up your computer, be sure to check out our setup guides.

Open twitter_classification_project.ipynb and follow the steps in the Jupyter Notebook. If you get stuck, you can look at twitter_classification_solution.ipynb for the answer.

(Note: The project within Codecademy’s code editor has a limited amount of data available for you to analyze. If you want more tweets to look at, feel free to download the project files and work off-platform!)