Yelp Rating Predictor

This project is slightly different from others you have encountered so far on Codecademy. Instead of a step-by-step tutorial, this project contains a series of open-ended requirements that describe the project you'll be building. There are many possible ways to correctly fulfill all of these requirements, and you should expect to use the internet, Codecademy, and other resources when you encounter a problem that you cannot easily solve.

Project Goals

The restaurant industry is tougher than ever, with restaurant reviews blazing across the internet from day one of a restaurant's opening. But as lovers of food, you and your friend decide to break into the industry and open your own restaurant, Danielle's Delicious Delicacies.

Since a restaurant's success is highly correlated with its reputation, you want to make sure Danielle's Delicious Delicacies has the best reviews on the most-queried restaurant review site: Yelp! While you know your food will be delicious, you suspect that other factors also influence a Yelp rating and will ultimately determine your business's success.

With a dataset of different restaurant features and their Yelp ratings, you decide to use a Multiple Linear Regression model to investigate what factors most affect a restaurant’s Yelp rating and predict the Yelp rating for your restaurant!
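To make the idea concrete, here is a minimal sketch of what fitting a multiple linear regression involves, using only NumPy's least-squares solver. The function names and the toy feature matrix are hypothetical, and the data are synthetic, generated from a known linear rule rather than taken from the Yelp files.

```python
import numpy as np

def fit_multiple_linear_regression(X, y):
    """Fit y ~ b0 + X @ b by ordinary least squares; return (b0, b)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first fitted parameter is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params[0], params[1:]

def predict(intercept, coefs, X):
    """Apply the fitted linear model to new feature rows."""
    return intercept + np.asarray(X, dtype=float) @ coefs

# Hypothetical toy data: two made-up feature columns, not real Yelp values.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
# Targets built from an exact linear rule, so the fit should recover it.
y = 1.0 + 0.5 * X[:, 0] + 0.25 * X[:, 1]

b0, b = fit_multiple_linear_regression(X, y)
print(b0, b)
```

scikit-learn's `LinearRegression` (with `fit`, `predict`, `coef_`, and `intercept_`) performs the same ordinary-least-squares fit. Each coefficient estimates how much the predicted rating changes per unit change in that feature while the others are held fixed, so comparing coefficients to judge which factors matter most only makes sense when the features are on comparable scales.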

In this project, we'll be working with a real dataset provided by Yelp. We have provided six files, each listed below with a brief description:

  • yelp_business.json: establishment data regarding location and attributes for all businesses in the dataset
  • yelp_review.json: Yelp review metadata by business
  • yelp_user.json: user profile metadata by business
  • yelp_checkin.json: online check-in metadata by business
  • yelp_tip.json: tip metadata by business
  • yelp_photo.json: photo metadata by business
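One way to bring these files together is sketched below with pandas. It assumes each file is stored as JSON Lines (one record per line) and that every table shares a `business_id` column; both are assumptions to check against the explanatory feature document. The helper names and the example column `photo_count` are hypothetical.

```python
import pandas as pd

# File names from the project brief; the JSON Lines layout and the shared
# "business_id" key are assumptions, not confirmed by the brief.
YELP_FILES = [
    "yelp_business.json",
    "yelp_review.json",
    "yelp_user.json",
    "yelp_checkin.json",
    "yelp_tip.json",
    "yelp_photo.json",
]

def load_yelp_tables(directory="."):
    """Read each JSON Lines file into its own DataFrame (hypothetical helper)."""
    return [pd.read_json(f"{directory}/{name}", lines=True) for name in YELP_FILES]

def merge_on_business(frames, key="business_id"):
    """Left-join every table onto the first (business) table by the shared key."""
    merged = frames[0]
    for frame in frames[1:]:
        merged = pd.merge(merged, frame, how="left", on=key)
    return merged

# Tiny in-memory example with made-up values instead of the real files.
business = pd.DataFrame({"business_id": ["a", "b"], "stars": [4.0, 3.5]})
photos = pd.DataFrame({"business_id": ["a"], "photo_count": [2.0]})
df = merge_on_business([business, photos])
print(df)
```

The left join keeps every business from yelp_business.json even when another table has no matching row; those cells become NaN. If a table holds more than one row per business, the join repeats business rows, so aggregate such a table by `business_id` first.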

For a more detailed explanation of the features in each .json file, see the accompanying explanatory feature document.

Setup Instructions

You have two options for completing this assignment: here, in Codecademy's output terminal, or on your own computer, if you're more comfortable using a Jupyter notebook. (Note: If you complete this project on Codecademy's platform, the plots may take up to one minute to load.)

If you choose to do this project on your computer instead of on Codecademy, download the project folder by clicking the "Download" button below. If you need help setting up your computer, be sure to check out our setup guides.

Then open yelp_regression_project.ipynb and follow the steps in the Jupyter Notebook. If you get stuck, you can look at yelp_regression_solution.ipynb for the answer.

Thank you to Yelp for this partnership, and especially to:

  • Sebastian Couvidat, Sr. Data Scientist
  • Jenny Lin, Data Scientist
  • Jessica Hart, Corporate Counsel

Happy Coding!