Applications of Regression

Feb 01, 2019

Machine learning is the field of computer science that gives computer systems the ability to learn from data — and it’s one of the hottest topics in the industry right now.

So now that we understand exactly how linear regression works, we will take a look at why it is so useful in so many different fields. Well, for one, it is rather simple. The math involved in finding the best-fit regression line is not that complicated, and it has been thoroughly studied over the years. In spite of being so simple, linear regression is in fact rather powerful. By modeling relationships between variables as straight lines or planes, regression produces a general solution which is not as prone to overfitting as many other techniques.

Another feature of regression is that it is very versatile and can be used for various kinds of data, from predicting stock prices to estimating sea levels. The fact that regression is simple and well-studied means that there are multiple implementation techniques available in various languages.

And in fact, regression happens to be one of the simplest of the machine learning algorithms. We will now zoom in on the applications of linear regression. One of the common use cases for regression is to explain the variance in the underlying data. For example, the price of a stock may be determined by multiple factors. These include the health of the economy overall, and maybe even the price of oil or steel or many other commodities.

But out of all these factors, there will be some which will explain the variance in the price of a stock much better than the others. So for example, if the particular stock you are tracking happens to be less sensitive to the health of the overall economy, and more sensitive to the price of oil, regression will help you determine this relationship.

[Video description begins] A question appears on the screen. It reads: How much variation in output is caused by a certain feature? [Video description ends]
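
To make the idea of explaining variance concrete, here is a minimal sketch in Python using NumPy. Everything in it is an assumption for illustration: the data is synthetic, the names economy, oil and stock are hypothetical, and the stock is deliberately constructed to depend strongly on oil and only weakly on the economy. It fits a straight line on each factor separately and reports R squared, the fraction of variance in the stock price that each factor explains on its own.

```python
import numpy as np


def r_squared(x, y):
    """Fraction of the variance in y explained by a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    return 1.0 - residuals.var() / y.var()


rng = np.random.default_rng(0)
n = 500

# Synthetic, made-up factors (not real market data).
economy = rng.normal(size=n)
oil = rng.normal(size=n)

# Assumed relationship: strong dependence on oil, weak dependence on the economy.
stock = 50.0 + 0.5 * economy + 4.0 * oil + rng.normal(scale=1.0, size=n)

r2_economy = r_squared(economy, stock)
r2_oil = r_squared(oil, stock)

print(f"Variance explained by the economy alone: {r2_economy:.2f}")
print(f"Variance explained by oil alone:         {r2_oil:.2f}")
```

With data built this way, the oil factor should come out with the much higher R squared, which is the kind of relationship regression helps you uncover.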

And of course, we have seen that regression can be used in order to make predictions when the value you need to predict happens to be a continuous variable. So, if you're using your regression model in order to estimate the price of a stock, you could, for example, change the value of one of the input variables. So if you'd like to determine the value of a stock if there is a 20% drop in the price of oil, you could make use of regression in order to make that estimation.

[Video description begins] A question appears on the screen. It reads: If I change one feature, how much does that affect the output? [Video description ends]
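
A what-if estimate of this kind can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, the variable names (oil_price, economy_index, stock_price) and the baseline values of 80 and 100 are hypothetical, and the model is an ordinary least-squares fit with two inputs solved using NumPy's lstsq.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Synthetic, made-up observations (not real market data).
oil_price = rng.uniform(40.0, 100.0, size=n)
economy_index = rng.uniform(90.0, 110.0, size=n)
stock_price = (
    10.0 + 0.8 * oil_price + 0.2 * economy_index + rng.normal(scale=2.0, size=n)
)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), oil_price, economy_index])
coef, *_ = np.linalg.lstsq(X, stock_price, rcond=None)


def predict(oil, economy):
    """Predicted stock price from the fitted intercept and two slopes."""
    return coef[0] + coef[1] * oil + coef[2] * economy


# Hypothetical baseline, then the same scenario with oil 20% cheaper.
baseline = predict(80.0, 100.0)
scenario = predict(80.0 * 0.8, 100.0)
change = scenario - baseline

print(f"Baseline estimate:      {baseline:.2f}")
print(f"With oil down 20%:      {scenario:.2f}")
print(f"Estimated price change: {change:.2f}")
```

Because the model is linear, the estimated change is just the oil coefficient multiplied by the change in the oil price, with the other input held fixed.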

When you are using regression in order to predict an outcome y given an input x, there are a few caveats. For instance, there needs to be a causal relationship between x and y, and their values should not merely be correlated. For example, a cause can be the change in the quantity of rainfall in a particular region, and the effect will be a change in the yield of crop. It has been empirically shown that rainfall does affect the yield of crops, and it is not just that these two factors are correlated.

Also, this is a case where x causes y and not the other way around. That is, it is not a change in the crop yield which affects the quantity of rainfall. So suppose the relationship between rainfall and the crop yield can be represented by this straight line. That is, the crop yield, which is measured in metric tons per hectare, can be calculated by a straight-line equation, alpha + beta times x, where x represents the quantity of rainfall in inches.

[Video description begins] A graph displays. The X-axis denotes the amount of Rain (which is the cause in this example), while the Y-axis denotes Crop yield (which is the effect in this example). There are many dots, plotted on the graph, which indicates an increase in crop yield with an increase in rain. A slanted straight line is drawn through the dots, showing the increase. The equation that appears next to the graph is, Regression line y is equal to alpha plus beta x. [Video description ends]

When presented with such a model, there are a few terms you need to be familiar with. For one, the term alpha in the equation is the y-intercept of the straight line. This represents the quantity of crop which will be produced even if there is no rainfall at all, and this is a very useful term in regression. To see why, consider that there are a number of farmers in the same geographical region who grow the same crop.

[Video description begins] The line extends itself in the reverse direction as a dotted line to meet the Y-axis or Crop yield. The distance between this point of the Y-axis and the zero on the X-axis is denoted as alpha. [Video description ends]

If a regression line such as this one is generated for each of the farmers over a number of years, then the distinguishing factor between each of the farmers will often be the alpha number. This is because all of these farmers will get the same quantity of rain, but each of their individual techniques when growing the crop will be captured by the alpha value.

And you could say that the farmer with the higher alpha happens to be a better farmer. And then there is the beta in the equation, which represents the slope of the line. This determines the sensitivity of the output, which is the crop yield, to the input, which is the quantity of rainfall. So when the input, which is the quantity of rainfall, increases by 1 unit, the output, which is the crop yield, increases by beta units.

[Video description begins] This is explained graphically with the use of the line and two dots on the line. The first dot on the line acts as a reference point. A horizontal dotted line is drawn from this point to show an increase of 1 unit of x. At the point where this increment ends graphically, a vertical dotted line is drawn to meet the slanted line. The length of this vertical line is denoted as beta. [Video description ends]
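
As a rough illustration of where alpha and beta come from, the sketch below fits them with the standard least-squares formulas. The five data points are made up for this example and the variable names are hypothetical. Beta is the covariance of rainfall and yield divided by the variance of rainfall, and alpha is chosen so the line passes through the point of means.

```python
import numpy as np

# Made-up observations for one farmer (illustrative only).
rain_inches = np.array([8.0, 10.0, 12.0, 14.0, 16.0])
yield_tons = np.array([25.5, 28.5, 33.0, 37.5, 40.5])

x_mean = rain_inches.mean()
y_mean = yield_tons.mean()

# Slope: how much the yield changes per extra inch of rain.
beta = np.sum((rain_inches - x_mean) * (yield_tons - y_mean)) / np.sum(
    (rain_inches - x_mean) ** 2
)

# Intercept: the yield the line predicts at zero rainfall.
alpha = y_mean - beta * x_mean

print(f"alpha (intercept) = {alpha:.2f}")
print(f"beta  (slope)     = {beta:.2f}")

# Moving one unit along x raises the prediction by exactly beta.
step = (alpha + beta * 11.0) - (alpha + beta * 10.0)
```

Running the same fit on each farmer's own data would give one alpha per farmer, which is the number being compared in the discussion above.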

And of course, once we have this equation for the regression line, which is y is equal to alpha plus beta times x, we can use this in order to make predictions.

[Video description begins] The graph depicts the predictive relation between the rain and the crop yield. 2 perpendicular lines are drawn from the middle of the Y-axis and the X-axis to meet the regression line. Above the graph is the instruction: Given a new value of x, use the line to predict the corresponding value of y. [Video description ends]

So if the weather forecast predicts that for this region, there will be 13 inches of rain in the season, then a farmer can estimate that their crop yield will be 35 metric tons per hectare, and then plan accordingly.
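
The arithmetic behind such a prediction is just a matter of plugging the forecast into the line. The transcript does not state the fitted alpha and beta, so the values below (alpha = 9.0 and beta = 2.0) are hypothetical, chosen only because they reproduce the 35 figure for 13 inches of rain.

```python
def predict_yield(rain_inches, alpha=9.0, beta=2.0):
    """Predict crop yield from rainfall using y = alpha + beta * x.

    The default alpha and beta are hypothetical illustration values,
    not coefficients given in the source.
    """
    return alpha + beta * rain_inches


forecast_rain = 13.0  # inches of rain forecast for the season
predicted = predict_yield(forecast_rain)

print(f"Predicted yield for {forecast_rain} inches of rain: {predicted}")
```

With a different farmer's alpha or beta, the same forecast would give a different estimate, which is why each farmer would fit the line to their own history.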