Why Classification Often Beats Regression For Security Selection
The movie “Into the Wild” depicts the story of Chris McCandless, who, after graduating from university, renounces all his possessions and sets out to live in the wilderness. Not long after he achieves his dream and settles in an abandoned bus in Alaska, he mistakenly eats the seeds of a wild sweet pea, which, unlike the similar-looking wild potato, is toxic. The blunder robs McCandless of his ability to digest food, and the movie ends with him dying of starvation.
In life, we often encounter situations where two solutions to the same problem look similar on the surface but lead to drastically different results. The same kind of situation arises in quantitative finance, and in this article I want to focus on one instance of it: the choice between building security selection algorithms that target classification and building ones that target regression.
A security selection algorithm’s purpose is to score each security by its investment potential, and there are two main ways of producing such scores with machine learning. The regression method tries to predict the exact level of future returns. For example, it will distinguish between stocks that return 5% and those that return 10%, and it will produce forecasts that get as close to the exact return as possible.
The classification method, on the other hand, groups returns into categories. If the category is “all stocks that have positive returns”, then stocks that return 5% and stocks that return 10% are grouped into the same category. The classification algorithm then forecasts the probability that each security belongs to this category.
At first glance, it may appear that regression algorithms should yield superior results compared to classification. We lose information when we collapse numbers of different magnitudes into a single category, and we care about that lost information: a stock with a forecasted return of 10% is more valuable than a stock whose forecast is 5%. Moreover, a regression forecast can be turned into a classification category, but not vice versa. For example, we can place a stock that is forecasted to return 7% into the “all stocks that have positive returns” category, but we can’t infer a stock’s potential return just from knowing that it has a 60% chance of generating a positive return.
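To make the difference concrete, here is a minimal Python sketch of how the two kinds of training target could be built from the same forward returns. The security names, the returns and the zero threshold are illustrative assumptions, not data from the algorithms discussed in this article.

```python
# Hypothetical forward returns for five securities over the same period.
forward_returns = {"S1": 0.05, "S2": 0.10, "S3": -0.02, "S4": 0.00, "S5": -0.15}

def regression_target(ret: float) -> float:
    """Regression: keep the raw return as the training target."""
    return ret

def classification_target(ret: float, threshold: float = 0.0) -> int:
    """Classification: 1 if the return falls in the 'positive returns' category, else 0."""
    return 1 if ret > threshold else 0

reg = {name: regression_target(r) for name, r in forward_returns.items()}
clf = {name: classification_target(r) for name, r in forward_returns.items()}

print(reg)  # raw returns: the gap between 5% and 10% is preserved
print(clf)  # {'S1': 1, 'S2': 1, 'S3': 0, 'S4': 0, 'S5': 0}
```

S1 and S2 receive the same classification label even though one returned twice as much as the other, which is precisely the information that classification gives up.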
I was therefore surprised to find that classification algorithms tend to outperform regression algorithms. For example, the following graph shows partial backtest results for two algorithms that we’re working on:
You can see that the two algorithms are highly correlated. Both use the same inputs to score securities, and both use very similar machine learning models. The only difference is the scoring target: the Classification algorithm first transforms the raw target values into categories, whereas the Regression algorithm leaves the target values raw.
Although the two algorithms are highly correlated, you can see that the classification algorithm outperformed the regression algorithm. The key, as it turns out, is uncertainty. Let me explain using a hypothetical.
Let’s say there are two securities, A and B. A has an expected return of 5%, and the uncertainty of this return, expressed as the standard deviation around the mean, is 10%. B, on the other hand, has a higher expected return of 10%, but also a higher standard deviation of 40%.
It’s important to note that the standard deviation refers to the uncertainty of the prediction, rather than the volatility of the stock price. One can have high confidence in predicting the return of a volatile stock, and low confidence in predicting the return of a stable one. For example, if you knew that a small biotech company was about to release favourable trial results for a drug, you’d have high confidence in the stock’s future return even if its price has been volatile. In practice, however, predictive uncertainty and volatility generally go hand in hand.
To see why predictive uncertainty matters, suppose we train a regression algorithm that scores each security by its expected return. If properly trained, this algorithm would consistently score security B higher than A. If we instead train a classification algorithm that scores each security by its chance of delivering a positive return, it would score A higher than B: assuming Normally distributed predictions, A has a 69% chance of generating a positive return, whereas B has only a 60% chance.
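Under the Normal assumption, the chance of a positive return is the standard Normal CDF evaluated at μ/σ, so both rankings can be checked with Python's standard-library `statistics.NormalDist`. The function and variable names below are illustrative.

```python
from statistics import NormalDist

# The two hypothetical securities: (expected return, predictive standard deviation).
securities = {"A": (0.05, 0.10), "B": (0.10, 0.40)}

def prob_positive(mu: float, sigma: float) -> float:
    """P(return > 0) when the prediction is Normal(mu, sigma)."""
    return 1.0 - NormalDist(mu, sigma).cdf(0.0)

# Regression-style score: the expected return itself.
regression_scores = {name: mu for name, (mu, _) in securities.items()}
# Classification-style score: probability of landing in the "positive returns" category.
classification_scores = {name: prob_positive(mu, sd) for name, (mu, sd) in securities.items()}

print(max(regression_scores, key=regression_scores.get))           # B
print(max(classification_scores, key=classification_scores.get))   # A
print({k: round(v, 2) for k, v in classification_scores.items()})  # {'A': 0.69, 'B': 0.6}
```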
Let’s now devise a strategy that only invests in the higher scored security. The regression strategy would therefore always invest in B, while the classification strategy would always invest in A. Which strategy do you think would win? If you guessed the classification strategy, then you’d be correct.
The graph above shows the cumulative returns of the two strategies over one sample path of 100 randomly generated returns. Each simulation paints a different picture, but the results generally end up the same: although the regression strategy might enjoy periods of outperformance, the classification strategy wins in the end.
The regression strategy underperforms despite its higher expected return because it is more likely to incur large losses, and those losses are difficult to recover from. For example, it has about a 7% chance of losing half of its value in any given period. After such a loss, it would take a 100% gain to get back to where it was, and the strategy has only about a 1% chance of achieving such a gain in a single period. The classification strategy, on the other hand, has a negligible chance of losing half of its value.
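A minimal Monte Carlo sketch of this experiment follows. It assumes independent, Normally distributed returns each period, floors wealth at zero if a single-period loss reaches 100%, and uses the 100-period horizon from the text; the number of runs, the seed and the function names are illustrative choices. Rather than plotting one path, it counts how often the classification strategy (always holding A) ends with more wealth than the regression strategy (always holding B).

```python
import random

MU_A, SD_A = 0.05, 0.10  # security A, always held by the classification strategy
MU_B, SD_B = 0.10, 0.40  # security B, always held by the regression strategy

def final_wealth(mu: float, sd: float, periods: int, rng: random.Random) -> float:
    """Compound Normal(mu, sd) returns; a loss of 100% or more wipes the strategy out."""
    wealth = 1.0
    for _ in range(periods):
        wealth *= max(0.0, 1.0 + rng.gauss(mu, sd))
    return wealth

def classification_win_rate(n_sims: int = 2000, periods: int = 100, seed: int = 0) -> float:
    """Fraction of simulated paths on which A ends with more wealth than B."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        if final_wealth(MU_A, SD_A, periods, rng) > final_wealth(MU_B, SD_B, periods, rng):
            wins += 1
    return wins / n_sims

print(f"Classification strategy finishes ahead in {classification_win_rate():.0%} of runs")
```

Changing the seed changes the individual paths, but with these parameters the classification strategy should finish ahead in most runs.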
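The tail probabilities quoted above follow directly from the two Normal predictive distributions; a short sketch using `statistics.NormalDist` reproduces them.

```python
from statistics import NormalDist

pred_a = NormalDist(mu=0.05, sigma=0.10)  # security A's predictive distribution
pred_b = NormalDist(mu=0.10, sigma=0.40)  # security B's predictive distribution

p_b_halved = pred_b.cdf(-0.50)       # chance B loses half its value in one period
p_b_doubles = 1 - pred_b.cdf(1.00)   # chance B gains the 100% needed to recover
p_a_halved = pred_a.cdf(-0.50)       # chance A loses half its value in one period

print(f"B loses 50%: {p_b_halved:.1%}")    # B loses 50%: 6.7%
print(f"B gains 100%: {p_b_doubles:.1%}")  # B gains 100%: 1.2%
print(f"A loses 50%: {p_a_halved:.1e}")    # on the order of 1e-08
```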
Mathematically astute readers may note that this phenomenon only occurs because I’m assuming that prediction uncertainty follows a Normal distribution. Under this distribution, a randomly generated return has the same chance of underperforming the expected return by X% as it has of outperforming it by X%. However, much of the financial literature assumes that security prices follow a Log-normal distribution, under which underperforming the expected return by X% is less likely than outperforming it by the same amount.
If we were to assume that security prices follow a Log-normal distribution, strategies with higher expected returns would always win, regardless of the uncertainty around those expected returns. However, empirical evidence suggests that security prices don’t follow a Log-normal distribution over short horizons of less than a month, so there is little reason to expect predictive uncertainty to follow it either.
Therefore, to build strategies that are useful in the real world, we need to penalize high uncertainty around return predictions. Classification strategies do this implicitly: higher predictive uncertainty generally means a higher chance of losses, which lowers the predicted probability of a positive return. But using classification comes at the cost mentioned earlier in this article, namely the loss of information about how large a return is likely to be. Are there better ways? I believe there are.
One way to account for return uncertainty is to devise a regression target that penalizes it. For example, we may define a new target ‘y’ as follows:
$$y = \mu - \frac{\sigma^2}{2}$$
where μ is the security’s expected return and σ is the uncertainty of this return. The higher the σ, the greater the penalty applied to μ. I chose this penalty because σ²/2 is the term that links the mean of a Normal distribution to the mean of the corresponding Log-normal distribution; in practical terms, μ − σ²/2 approximates the long-run compound growth rate of repeatedly earning returns with mean μ and standard deviation σ. In effect, it compensates for uncertainty so that securities with higher ‘y’ scores should tend to outperform securities with lower ‘y’ scores over time, regardless of their uncertainty. For example, security A from before would have a ‘y’ score of 4.5%, while B would have a ‘y’ score of 2%.
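The penalized target is simple to compute. The sketch below uses a hypothetical helper (`penalized_score` is an illustrative name, not an existing library function) to reproduce the scores for A and B.

```python
def penalized_score(mu: float, sigma: float) -> float:
    """Hypothetical uncertainty-penalized regression target: y = mu - sigma**2 / 2."""
    return mu - sigma ** 2 / 2

score_a = penalized_score(0.05, 0.10)  # 5% expected return, 10% uncertainty
score_b = penalized_score(0.10, 0.40)  # 10% expected return, 40% uncertainty

print(f"A: {score_a:.1%}, B: {score_b:.1%}")  # A: 4.5%, B: 2.0%
```

Because the penalty grows with the square of σ, quadrupling the uncertainty (10% to 40%) multiplies the penalty by sixteen, which is enough to push B below A despite its higher expected return.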
But what if the uncertainty around security B’s prediction were lower, with a standard deviation of 30%? Then B’s score would rise to 5.5%, and we should favour investing in B. Indeed, simulations bear this out.
The graph above shows a typical outcome of the same experiment, this time with B’s standard deviation lowered to 30%. Although the regression strategy doesn’t win in every randomly generated simulation, it wins more often than not.
While this particular way of penalizing regression targets works, it’s not the only one. But whatever method is used, I hope I’ve convinced you that it’s problematic to use regression targets without taking prediction uncertainty into account. Turning a regression target into a classification target is one way of dealing with this problem, but you may find that there are better paths forward.
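For the lower-uncertainty case, a sketch along the same lines can check that the penalized score and the positive-return probability now disagree (the penalized score picks B, while the classification-style score still picks A) and count how often B finishes ahead. It assumes independent Normal returns each period, wealth floored at zero after a 100% loss, and a 100-period horizon; the helper names, seed and number of runs are illustrative.

```python
import random
from statistics import NormalDist

# Illustrative inputs: (expected return, predictive std dev); B's uncertainty is now 30%.
securities = {"A": (0.05, 0.10), "B": (0.10, 0.30)}

def penalized_score(mu: float, sigma: float) -> float:
    """Hypothetical uncertainty-penalized regression target: mu - sigma**2 / 2."""
    return mu - sigma ** 2 / 2

def prob_positive(mu: float, sigma: float) -> float:
    """Classification-style score: P(return > 0) under Normal(mu, sigma)."""
    return 1.0 - NormalDist(mu, sigma).cdf(0.0)

pick_penalized = max(securities, key=lambda s: penalized_score(*securities[s]))
pick_classification = max(securities, key=lambda s: prob_positive(*securities[s]))

def final_wealth(mu: float, sd: float, periods: int, rng: random.Random) -> float:
    """Compound Normal(mu, sd) returns, flooring wealth at zero after a loss of 100% or more."""
    wealth = 1.0
    for _ in range(periods):
        wealth *= max(0.0, 1.0 + rng.gauss(mu, sd))
    return wealth

def win_rate(winner: str, loser: str, n_sims: int = 2000, periods: int = 100, seed: int = 0) -> float:
    """Fraction of simulated paths on which `winner` ends with more wealth than `loser`."""
    rng = random.Random(seed)
    wins = sum(
        final_wealth(*securities[winner], periods, rng)
        > final_wealth(*securities[loser], periods, rng)
        for _ in range(n_sims)
    )
    return wins / n_sims

print(pick_penalized, pick_classification)  # B A
print(f"{pick_penalized} finishes ahead in {win_rate(pick_penalized, pick_classification):.0%} of runs")
```

The win rate is expected to sit above one half but well short of certainty, which matches the observation that B does not win every simulation.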