Linear regression is a fundamental machine learning algorithm used for predictive modeling and data analysis. Learn more about it as Curriculum Developers Alex and Sophie guide you through our new course: Linear Regression in Python.
In the latest addition to our Codecademy Live series, our Curriculum Developers will teach you the fundamentals of linear regression and its various applications. Linear regression is a technique for modeling quantitative outcomes using any number of predictors. For example, you could use it to understand the relationship between a person's income and other attributes such as education level and years of experience.
Learning about linear regression will prepare you to explore other kinds of machine learning models and enable you to read and understand research papers in just about any field of study!
Along with the walkthrough, Sophie and Alex will also be hosting 30-minute office hour sessions every Thursday between May 20th and June 3rd (and possibly longer — we'll keep you posted). Open to all learners, the office hours are a great opportunity to connect with the developers behind some of your favorite courses and ask any questions not answered during the stream.
How to watch
The live sessions will be streamed every Tuesday at 11am EDT from May 18th to July 13th (except July 6th) on YouTube, Twitch, Twitter, and Facebook. We'll be focused on the live chat on YouTube, so join us there if you want to be part of the conversation in real time.
Each session will last for about an hour, but don't worry if you can't make it — they'll all be recorded so you can watch them later at your convenience. For more details, check out the Codecademy Events page.
What we'll cover
Our Curriculum Developers will guide you through our free Linear Regression in Python course.
First, we'll introduce you to simple linear regression and show you how to implement it in Python with both quantitative and categorical predictors. Then, we'll move on to multiple linear regression and walk through some of the math behind the model. After that, we'll cover tools for building more flexible models, such as interactions, polynomial terms, and data pre-processing. Finally, we'll discuss ways of comparing a few different models in order to choose the "best" one. The last session will walk through the entire workflow with some real (and messier!) data.
Below, you'll find detailed descriptions of each session.
Session #1: Introduction to simple linear regression
May 18th, 2021 at 11am EDT
In this session, we'll introduce the concept of simple linear regression and learn how to implement it in Python. Linear regression is a machine learning technique that's used to predict and analyze quantitative outcomes, such as salary, time spent on a website, or adult height. In the process, we'll review algebra and graphing skills and learn about concepts that are applicable to many different machine learning algorithms.
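As a rough illustration of what fitting a simple linear regression involves, here is a minimal sketch in plain Python that computes the least-squares intercept and slope from their closed-form formulas. The data and the names (fit_simple_linear_regression, years_experience, salary_thousands) are made up for this example and are not taken from the course, which may well use a library instead.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: years of experience vs. salary in thousands of dollars.
years_experience = [1, 2, 3, 4, 5]
salary_thousands = [44, 51, 55, 59, 66]

intercept, slope = fit_simple_linear_regression(years_experience, salary_thousands)

# Use the fitted line to predict the salary for 6 years of experience.
predicted_salary = intercept + slope * 6
print(f"salary = {intercept:.1f} + {slope:.1f} * years")
print(f"predicted salary at 6 years: {predicted_salary:.1f}")
```

The slope is read as the expected change in the outcome for a one-unit increase in the predictor, and the intercept as the expected outcome when the predictor is zero.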
Session #2: Categorical predictors
May 25th, 2021 at 11am EDT
In this session, we'll demonstrate how to include a categorical predictor with two or more categories in a simple linear model. For example, we'll learn how to predict the price of a New York City apartment based on the borough where it is located. This will help us build our intuition for how to create and interpret more complex linear models.
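One common way to include a categorical predictor is to encode it as 0/1 indicator ("dummy") columns, leaving one category out as the reference. The sketch below assumes NumPy and uses invented rent figures; it is an illustration of the idea and not necessarily the approach taken in the session.

```python
import numpy as np

# Hypothetical apartments: borough and monthly rent in dollars.
boroughs = ["Brooklyn", "Manhattan", "Queens", "Manhattan", "Brooklyn", "Queens"]
rent = np.array([3000.0, 5000.0, 2400.0, 5400.0, 3200.0, 2600.0])

# The first level alphabetically (Brooklyn) serves as the reference category.
levels = sorted(set(boroughs))
reference, other_levels = levels[0], levels[1:]

# Design matrix: a column of ones plus one 0/1 indicator per non-reference level.
columns = [np.ones(len(boroughs))]
for level in other_levels:
    columns.append(np.array([1.0 if b == level else 0.0 for b in boroughs]))
X = np.column_stack(columns)

# Ordinary least squares fit.
coefficients, *_ = np.linalg.lstsq(X, rent, rcond=None)
intercept, manhattan_effect, queens_effect = coefficients

print(f"mean rent in {reference}: {intercept:.0f}")
print(f"Manhattan vs. {reference}: {manhattan_effect:+.0f}")
print(f"Queens vs. {reference}: {queens_effect:+.0f}")
```

With this encoding, the intercept is the mean outcome in the reference category, and each remaining coefficient is the difference between that category's mean and the reference mean.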
Session #3: Introduction to multiple linear regression
June 1st, 2021 at 11am EDT
In this session, we'll build our first linear model with more than one predictor. As an example, we'll use simulated data from a math class to predict student quiz performance based on hours of studying, number of completed assignments, and whether or not the student ate breakfast.
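To give a feel for a model with several predictors, the sketch below simulates quiz scores and fits a multiple linear regression with NumPy's least-squares solver. The sample size, the "true" coefficients, and the variable names are all assumptions chosen for this example; they are not the data used in the session.

```python
import numpy as np

rng = np.random.default_rng(0)
n_students = 500

# Simulated predictors (all values are invented for illustration).
hours_studied = rng.uniform(0, 10, size=n_students)
assignments_done = rng.integers(0, 11, size=n_students)  # 0 to 10 inclusive
ate_breakfast = rng.integers(0, 2, size=n_students)      # 0 = no, 1 = yes

# Assumed "true" model: intercept, hours, assignments, breakfast.
true_coefficients = np.array([50.0, 3.0, 2.0, 4.0])
noise = rng.normal(0, 1, size=n_students)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n_students), hours_studied, assignments_done, ate_breakfast])
quiz_score = X @ true_coefficients + noise

# Fit by ordinary least squares and compare with the values used to simulate.
estimated, *_ = np.linalg.lstsq(X, quiz_score, rcond=None)
for name, value in zip(["intercept", "hours", "assignments", "breakfast"], estimated):
    print(f"{name}: {value:.2f}")
```

Each slope is interpreted as the expected change in quiz score for a one-unit change in that predictor while the other predictors are held fixed.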
Session #4: The matrix representation of the linear regression problem
June 8th, 2021 at 11am EDT
In this session, we'll dig into some of the math behind multiple linear regression. While there are a number of Python modules that allow us to fit a model without understanding this math, it is difficult to troubleshoot a model, interpret its output, and learn new technologies without it. This session will give you enough understanding to accomplish these things while still bypassing detailed algebra and calculus!
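In matrix form, the least-squares coefficients solve the normal equations (X^T X) beta = X^T y, where X is the design matrix with a leading column of ones. The following is a small sketch of that calculation with NumPy on made-up numbers; the session may present the math differently.

```python
import numpy as np

# Small made-up data set: 5 observations, 2 predictors.
predictors = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 5.0],
])
y = np.array([8.0, 7.0, 16.0, 15.0, 21.0])

# Design matrix: prepend a column of ones for the intercept.
X = np.column_stack([np.ones(len(y)), predictors])

# Solve the normal equations (X^T X) beta = X^T y for beta.
beta = np.linalg.solve(X.T @ X, X.T @ y)

fitted = X @ beta
residuals = y - fitted

# Cross-check against NumPy's general least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print("normal equations:", np.round(beta, 4))
print("lstsq:           ", np.round(beta_lstsq, 4))
```

Solving the linear system directly is generally preferred to forming the inverse of X^T X explicitly. A useful check is that the residuals are orthogonal to every column of X, so X.T @ residuals is zero up to rounding error.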
Session #5: Interactions and polynomial terms
June 15th, 2021 at 11am EDT
In this session, we'll learn about some useful tools for modeling non-linear relationships using linear regression. This will allow us to build more flexible models, representing real-life relationships that cannot be summarized with a straight line.
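Interactions and polynomial terms are extra columns in the design matrix: the model stays linear in its coefficients even though the fitted surface is curved. Here is a minimal sketch with NumPy, using a noise-free simulated outcome whose coefficients are assumptions picked for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.uniform(-2, 2, size=50)
x2 = rng.uniform(-2, 2, size=50)

# Noise-free outcome with an interaction (x1 * x2) and a squared term (x1 ** 2).
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x1 * x2 + 3.0 * x1 ** 2

# Add the interaction and the polynomial term as ordinary columns.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2])

# The same least-squares machinery recovers all five coefficients.
coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coefficients, 3))
```

Because of the interaction term, the effect of x1 on the outcome depends on the value of x2, so the x1 coefficient alone no longer describes a single slope.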
Session #6: Data transformations
June 22nd, 2021 at 11am EDT
In this session, we'll learn about data pre-processing that is useful prior to fitting a linear regression model. These kinds of transformations are potentially helpful when the assumptions of linear regression are otherwise violated. Pre-processing can also make the regression output more interpretable and therefore easier to communicate to a non-technical audience.
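Two transformations that often come up are taking the log of an outcome that grows exponentially and centering a predictor so the intercept has a more natural meaning. The sketch below assumes NumPy and an invented, noise-free data set; the session may cover other transformations.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Made-up outcome that grows exponentially: y = 2 * exp(0.5 * x).
y = 2.0 * np.exp(0.5 * x)

# log(y) = log(2) + 0.5 * x is linear in x, so a straight-line fit works.
# np.polyfit returns coefficients from the highest power down: [slope, intercept].
slope, intercept = np.polyfit(x, np.log(y), deg=1)

# Centering the predictor leaves the slope unchanged, and the intercept
# becomes the fitted value at the average x.
x_centered = x - x.mean()
slope_centered, intercept_centered = np.polyfit(x_centered, np.log(y), deg=1)

print(f"log scale:  slope={slope:.3f}, intercept={intercept:.3f}")
print(f"centered x: slope={slope_centered:.3f}, intercept={intercept_centered:.3f}")
```

After centering, the intercept equals the mean of the (log) outcome, which is usually easier to explain than a prediction at x = 0.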
Session #7: Comparing and choosing a linear regression model
June 29th, 2021 at 11am EDT
In this session, we'll discuss ways of comparing possible regression models and choosing the "best" one for prediction or analysis. In different business and research settings, the word "best" might mean different things. For example, if you need to be able to explain your model to a broad audience, you may want to prioritize interpretability over small gains in accuracy. By the end of this session, you'll understand some of the most common methods for choosing a model.
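One widely used comparison metric is adjusted R-squared, which rewards fit but penalizes extra coefficients. The sketch below compares a straight-line model with a quadratic one on simulated data. The helper name fit_and_score and the data are assumptions for this example, and the session may use other criteria as well.

```python
import numpy as np


def fit_and_score(X, y):
    """Fit ordinary least squares and return (R^2, adjusted R^2)."""
    n, p = X.shape  # p counts every column, including the intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual_ss = np.sum((y - X @ beta) ** 2)
    total_ss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - residual_ss / total_ss
    # Penalize model size: n - p residual degrees of freedom.
    adjusted_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    return r2, adjusted_r2


# Simulated data with a genuinely curved relationship plus noise.
rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 40)
y = 1.0 + 2.0 * x + 1.5 * x ** 2 + rng.normal(0, 1, size=x.size)

X_linear = np.column_stack([np.ones_like(x), x])
X_quadratic = np.column_stack([np.ones_like(x), x, x ** 2])

r2_linear, adj_linear = fit_and_score(X_linear, y)
r2_quadratic, adj_quadratic = fit_and_score(X_quadratic, y)

print(f"linear:    R2={r2_linear:.3f}, adjusted R2={adj_linear:.3f}")
print(f"quadratic: R2={r2_quadratic:.3f}, adjusted R2={adj_quadratic:.3f}")
```

Plain R-squared never decreases when a term is added to a nested model, which is why a size-penalized metric or held-out data is typically used to compare models of different complexity.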
Session #8: Linear regression workflow
July 13th, 2021 at 11am EDT
In this session, we'll walk through the full process of pre-processing some data, fitting a few different linear models, choosing the "best" one, interpreting the results, and using our model to make predictions for new data. We hope this session inspires you to build your own linear regression model to solve a problem that interests you!
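The steps listed above can be strung together in a few lines. The following is a compact sketch with NumPy on simulated data, not the real data set from the session: it standardizes a predictor, fits two candidate models on a training split, picks the one with the lower error on held-out data, and predicts for new observations. All names and numbers are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data standing in for a real data set.
n = 200
x = rng.uniform(0, 10, size=n)
y = 5.0 + 1.5 * x + 0.8 * x ** 2 + rng.normal(0, 2, size=n)

# Hold out 50 observations for model comparison.
indices = rng.permutation(n)
train, test = indices[:150], indices[150:]

# Pre-processing: standardize x using training statistics only.
x_mean, x_std = x[train].mean(), x[train].std()


def standardize(values):
    return (values - x_mean) / x_std


# Candidate models, each defined by how it builds its design matrix.
candidates = {
    "linear": lambda z: np.column_stack([np.ones_like(z), z]),
    "quadratic": lambda z: np.column_stack([np.ones_like(z), z, z ** 2]),
}

fits, test_mse = {}, {}
for name, design in candidates.items():
    beta, *_ = np.linalg.lstsq(design(standardize(x[train])), y[train], rcond=None)
    predictions = design(standardize(x[test])) @ beta
    fits[name] = beta
    test_mse[name] = np.mean((y[test] - predictions) ** 2)

# Choose the model with the lowest held-out mean squared error.
best_name = min(test_mse, key=test_mse.get)

# Predict for new observations with the chosen model.
new_x = np.array([2.5, 7.5])
new_predictions = candidates[best_name](standardize(new_x)) @ fits[best_name]

print("test MSE:", {name: round(float(mse), 2) for name, mse in test_mse.items()})
print("chosen model:", best_name)
print("predictions for new data:", np.round(new_predictions, 2))
```

Here "best" means lowest error on data the model has not seen. A different goal, such as interpretability, could reasonably lead to a different choice.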