Linear Regression: Useful, Ubiquitous, Limited
"acoustic guitar portrait" by tonyduckles is licensed under CC BY-NC-SA 2.0
Linear regression is the acoustic guitar of the statistical world. You can play just about any song with an acoustic guitar. Some songs, such as ‘More Than Words’, are best played using the acoustic alone. But for most songs, the acoustic by itself can’t express the depth of feeling that an ensemble of drums, keyboards and bass guitars can.
So it is with linear regression. This approach generally does an adequate job of modeling most regression problems, that is, problems of predicting numbers that can take on a wide range of values. Occasionally, linear regression yields the best possible model. But most real-world problems contain complexities that linear regression can't express.
Linear regression, like most other types of models, uses factors to make predictions. Factors are input numbers believed to drive the quantity being predicted. A statistician who wants to predict rainfall may, for instance, choose cloudiness and humidity as factors.
Factor sensitivities are numbers that describe how strongly each factor moves the prediction. Rain generally falls harder the cloudier it gets, so we would expect the cloudiness sensitivity to be a positive number.
Both the strengths and limitations of linear regression derive from the simplicity with which it links factors to predictions. A linear regression model forecasts the outcome as a sum of factor-sensitivity products. Each factor acts in isolation from the others: the first factor doesn't care about the second factor's value. As long as the first factor's value remains the same, its influence on the prediction remains the same also.
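To make this mechanism concrete, here is a minimal sketch in Python of the rainfall model described above, with a constant intercept term included, as linear models carry one in practice. Every number here is invented purely for illustration; a real model would estimate the intercept and sensitivities from data.

```python
# A toy sketch of the linear regression mechanism, using the rainfall
# example above. All numbers are invented for illustration.

def predict_rainfall_mm(cloudiness, humidity_pct):
    """Prediction = intercept + the sum of factor * sensitivity pairs."""
    intercept = -30.0              # baseline offset (invented)
    cloudiness_sensitivity = 10.0  # mm of rain per unit of cloud cover (invented)
    humidity_sensitivity = 0.5     # mm per percentage point, i.e. 1mm per 2% rise
    return (intercept
            + cloudiness_sensitivity * cloudiness   # factor 1 acts alone
            + humidity_sensitivity * humidity_pct)  # factor 2 acts alone

print(predict_rainfall_mm(cloudiness=0.8, humidity_pct=70))  # 13.0
```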
This non-communication between factors can have undesirable results. Say our model forecasts 1mm more rain whenever humidity rises by 2%. In this case, it won't matter an iota whether the sky is filled with dark clouds, or is clear blue - if humidity rises by 2%, the model will forecast 1mm more rain. This is contrary to reality, where humidity is a better predictor of rain when the sky is cloudy than when it is clear.
Linear regression models also give no consideration to each factor's starting value. Outdoor humidity generally ranges from 50% to 90%. Whether humidity rises from 50% to 52%, or from 90% to 92%, the model forecasts the same 1mm of additional rain for each 2% increase. But a humidity level of 52% is still on the dry side, so in reality we wouldn't expect any change in rainfall until humidity reaches a much higher level.
This simplistic link between factors and predictions introduces quirks that give modelers headaches. Say our model predicts no rain when the sky is clear and humidity is 60%. If the sky stays clear and humidity drops to 58%, would the model predict negative rainfall? Indeed it would. But not even the most cavalier weatherman would forecast 1mm of rain rising from the ground to disappear into the sky.
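All three quirks fall straight out of the additive structure. A few lines of Python make them visible; the toy model is repeated here so the snippet runs on its own.

```python
# Repeated from the earlier sketch so this snippet is self-contained.
def predict_rainfall_mm(cloudiness, humidity_pct):
    return -30.0 + 10.0 * cloudiness + 0.5 * humidity_pct

# Quirk 1: no interaction. A 2% humidity rise adds 1mm of rain whether
# the sky is fully overcast (1.0) or completely clear (0.0).
print(predict_rainfall_mm(1.0, 72) - predict_rainfall_mm(1.0, 70))  # 1.0
print(predict_rainfall_mm(0.0, 72) - predict_rainfall_mm(0.0, 70))  # 1.0

# Quirk 2: starting values don't matter. 50% -> 52% has exactly the
# same effect as 90% -> 92%.
print(predict_rainfall_mm(0.5, 52) - predict_rainfall_mm(0.5, 50))  # 1.0
print(predict_rainfall_mm(0.5, 92) - predict_rainfall_mm(0.5, 90))  # 1.0

# Quirk 3: nothing stops the output from going negative. A clear sky
# at 58% humidity yields -1mm of "rain".
print(predict_rainfall_mm(0.0, 58))  # -1.0
```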
A model works best when its mechanics reflect reality as closely as possible. Linear regression models therefore work best when the factors affecting the predictive target act in isolation, and when there are no special values around which the model must behave differently. The rainfall phenomenon we've described grates heavily against these assumptions, and serves as an example of when not to use linear regression. But many real-world phenomena are served well by linear regression.
Take, for example, the prediction of a country’s aggregate beer consumption on a particular day. Two factors we may use are the day’s temperature and whether that day falls on a weekend, since people imbibe more eagerly on hot weekends. Temperature and day of the week are unrelated to each other, so it’s sensible to treat them in isolation. A country’s aggregate beer consumption would never realistically approach 0, so we don’t need to worry about the behavior of the model near that special number. Linear regression should therefore model this phenomenon well.
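Here is a sketch of what fitting that model might look like, using scikit-learn on synthetic data. The consumption figures, factor names and coefficients below are all invented for illustration.

```python
# Fitting the beer consumption model with scikit-learn.
# All data below is synthetic, generated purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_days = 365
temperature = rng.uniform(0, 35, n_days)                 # daily high, Celsius
is_weekend = (np.arange(n_days) % 7 >= 5).astype(float)  # 1.0 on weekends

# Invented ground truth: a baseline, plus more beer on hot days and weekends.
consumption = 100 + 3.0 * temperature + 40 * is_weekend + rng.normal(0, 5, n_days)

X = np.column_stack([temperature, is_weekend])
model = LinearRegression().fit(X, consumption)

print(model.intercept_)  # close to 100
print(model.coef_)       # close to [3.0, 40.0]
```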
Experience suggests that linear regression works reasonably well even when there’s misalignment between assumptions and reality, as long as the misalignment is slight. Say we now want to predict the salary of a Google employee, and we take years of total experience, and years working at Google itself, as separate factors of a linear regression model. The two factors are clearly related - employees who worked at Google for 5 years can’t have amassed less than 5 years of total experience. Yet in practice, the two factors tend to behave independently enough to make such a model usable.
When data is sparse, we may even prefer linear regression to more complex models whose assumptions fit reality better. Complex models are a lot like complex laws. Changes to tax laws designed to benefit the poor often end up benefiting the rich, as added complexity creates more loopholes. Statistical models, likewise, have a habit of performing worse as they grow more complex, because the added complexity introduces more special cases.
Theory supports the preference for simpler models. Occam's Razor is a philosophical principle stating that the simplest explanation of a phenomenon is usually the right one, provided the explanation fits the data. Information theory backs this principle with formal mathematics, most notably through the minimum description length principle. So when a linear regression model and a more complex model perform similarly well on statistical metrics, philosophical and mathematical theory weigh in favor of linear regression.
Interpretability is another big benefit of linear regression's simplicity. If our Google salary model has a sensitivity of 10,000 on the factor for total years of experience, that suggests an employee can expect to earn $10,000 more per year for each additional year of experience. Making sense of even marginally more complex models, such as logistic regression, is harder, and directly interpreting significantly more sophisticated models, such as neural networks, is all but impossible.
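As a sketch of that interpretability, the snippet below fits the hypothetical salary model on synthetic data with a $10,000-per-year sensitivity baked in, then reads the sensitivities straight off the fitted model.

```python
# Reading factor sensitivities off a fitted model. The salary data is
# synthetic: the $10,000-per-year effect is baked in to mirror the example.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
years_total = rng.uniform(0, 20, 500)
years_google = rng.uniform(0, 1, 500) * years_total  # can't exceed total
salary = (90_000 + 10_000 * years_total + 4_000 * years_google
          + rng.normal(0, 8_000, 500))

X = np.column_stack([years_total, years_google])
model = LinearRegression().fit(X, salary)

# Each coefficient reads directly as "extra dollars per extra year".
for name, coef in zip(["total_experience", "years_at_google"], model.coef_):
    print(f"{name}: ${coef:,.0f} per year")  # roughly $10,000 and $4,000
```

Note that the two factors are deliberately correlated here, echoing the Google example above: even so, the fit tends to recover sensible sensitivities.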
It’s therefore no surprise that linear regression remains the model of choice in finance. Financial data is very noisy; without proper tuning, many different types of models would appear to perform similarly well. Occam’s Razor would push researchers to use linear regression in such situations. Financial professionals also prefer to use models they can interpret.
But when properly tuned, more sophisticated models often perform significantly better than linear regression, because the mechanics of those models align better with reality. Most financial factors, such as profitability and momentum, work in tandem to influence a stock's price. The extent to which these factors influence stock prices also changes at different levels: investors make relatively little distinction between a stock that rose 200% and one that rose 210%, but a much greater distinction between a stock that declined 5% and one that rose 5%. When performance differences are significant, Occam's Razor doesn't weigh heavily enough in favor of linear regression to make us prefer it over more sophisticated models.
One could ignore the benefits of more sophisticated models and stick with linear regression. Doing so would still deliver decent results much of the time. But just as no acoustic version of ‘Billie Jean’ could do justice to Michael Jackson's original, linear regression can't deliver a model with extraordinary performance.