Designing Neural Networks
[Image from pxfuel]
Unfamiliar terms have a way of impressing us. I remember the first time I heard about the ‘Monte Carlo’ method. The name conjured up an image of a sophisticated technique, born out of deep discussions by brilliant mathematicians in a Spanish cafe. Turns out, it’s just a byword for running lots of randomized simulations. Numerous other fancy terms likewise dress up simple concepts. ‘Linear interpolation’ means connecting dots with straight lines. ‘Stochastic’ means random. ‘Generalized AutoRegressive Conditional Heteroskedasticity’ (GARCH) is… well, it’s a statistical model that I can’t explain in a sentence or two, but its complexity doesn’t merit a 20-syllable name.
‘Neural networks’ is one such awe-inspiring name. It refers to a phylum of statistical models, named after scientists’ attempts to mimic the workings of the brain. This may sound complicated at first - much of the brain’s mechanics remain a mystery to this day. But scientists’ computerized knock-offs of the real thing are easier to understand.
The earliest neural networks imitated neurons, which are the building blocks of the brain. Each neuron receives stimuli from multiple sources through branches called ‘dendrites’. When the combined stimulus exceeds a threshold, the neuron fires a signal through a stem called the ‘axon’. A mathematical model called the ‘Perceptron’ embodied a simplified version of a neuron. It consisted of two parts - a linear combination that summed the weighted influences of multiple factors, and a function that flagged ‘1’ or ‘0’ depending on whether the sum exceeded a threshold.
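Here is a minimal sketch of that two-part structure in Python (the inputs, weights and threshold below are made up purely for illustration, not taken from any trained model):

```python
import numpy as np

def perceptron(inputs, weights, threshold):
    """Linear combination of the inputs, then a 1/0 flag against a threshold."""
    combined = np.dot(weights, inputs)       # sum the weighted influences
    return 1 if combined > threshold else 0  # 'fire' only if the sum clears the threshold

# Illustrative numbers only: three input factors and hand-picked weights.
x = np.array([0.5, 1.0, -0.2])
w = np.array([0.4, 0.3, 0.8])
print(perceptron(x, w, threshold=0.3))  # -> 1, since 0.2 + 0.3 - 0.16 = 0.34 > 0.3
```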
The Perceptron model enjoyed some success, and led scientists to follow up with more complex imitations of the brain. Observing that outputs of neurons often fed into other neurons as inputs, scientists created models consisting of ‘stacks’ of smaller models where the outputs of lower layers fed into higher layers. The MADALINE model, for instance, consisted of 3 layers of ADALINE models where each ADALINE mimicked a neuron.
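To illustrate the stacking idea (this sketch is not the historical MADALINE or its training rule, just the layering principle it embodied), here is a small example in which one layer of neuron-like units feeds its outputs into a second layer:

```python
import numpy as np

def threshold_layer(inputs, weights, thresholds):
    """One layer of neuron-like units: linear combinations followed by 0/1 flags."""
    sums = weights @ inputs
    return (sums > thresholds).astype(float)

x = np.array([1.0, 0.0, 1.0])
# Layer 1: two units reading the raw inputs. Layer 2: one unit reading layer 1's outputs.
h = threshold_layer(x, np.array([[0.6, 0.2, 0.3], [0.1, 0.9, 0.3]]), np.array([0.5, 0.5]))
y = threshold_layer(h, np.array([[0.7, 0.7]]), np.array([0.5]))
print(h, y)  # layer 1 fires [1., 0.]; layer 2 combines those outputs and fires [1.]
```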
Before long, however, scientists stopped trying to copy the brain. They discerned the statistical properties that granted early neural networks their power, and devised modifications - grounded in statistical theory - that enhanced that power. That the modifications deviated from the schematics of the brain didn’t bother the scientists. Their primary goal was to create useful models, and any resemblance to the brain was merely a nice talking point. Modern neural networks usually don’t work like the brain at all. The term has instead come to denote statistical models consisting of blocks of smaller components that work together as a whole.
Several types of base components form the bulk of most neural network models. The linear combination layer, drawn as nodes in diagrams of neural networks, sums weighted inputs from multiple sources and passes on the result. Activation layers then morph these outputs using simple formulas. Though simple, these layers enable neural networks to learn complex patterns that more traditional statistical models would miss. One of my previous blog posts explains one popular activation function called the ‘sigmoid’. Other layers such as batch normalization and dropout don’t help with pattern recognition, but allow models to train faster and avoid falling for false patterns.
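For concreteness, here is how these base components might be strung together using PyTorch (assuming the torch library is available; the layer sizes and dropout rate below are arbitrary choices, not a recommendation):

```python
import torch
import torch.nn as nn

# One block of common base components, chained in a typical order.
model = nn.Sequential(
    nn.Linear(4, 8),    # linear combination layer: sums weighted inputs from 4 sources
    nn.BatchNorm1d(8),  # batch normalization: helps training, not pattern recognition
    nn.Sigmoid(),       # activation layer: morphs outputs with a simple formula
    nn.Dropout(p=0.2),  # dropout: randomly silences units to discourage false patterns
    nn.Linear(8, 1),    # final linear combination producing a single output
)

x = torch.randn(16, 4)  # a batch of 16 examples, each with 4 input features
y = model(x)            # shape: (16, 1)
```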
[Image from Wikimedia Commons]
Some base components are often packaged together into modules that exhibit more complex behaviour, like cells combining to form organs. The neural attention mechanism, for instance, orients a set of components to act as a mask that only lets “important” information through.
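One common form of this mechanism is scaled dot-product attention; the sketch below (plain NumPy, with random made-up inputs) shows how the softmax weights play the role of that soft mask:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(queries, keys, values):
    """Scaled dot-product attention: score the values, then pass through a weighted blend."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)  # how relevant each value is to each query
    weights = softmax(scores)               # the soft 'mask': each row sums to 1
    return weights @ values                 # mostly 'important' information gets through

Q = np.random.randn(2, 4)  # 2 queries of dimension 4
K = np.random.randn(5, 4)  # 5 keys of dimension 4
V = np.random.randn(5, 3)  # 5 values of dimension 3
print(attention(Q, K, V).shape)  # (2, 3)
```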
Scientists categorize neural networks by the modules they use. ‘Convolutional neural networks’ are models that make heavy use of convolutional filters, a concept which I explained in this blog post. ‘Recurrent neural networks’ use some nodes as memory banks to carry information forward in time. These categories have subcategories, splintered by the use of module variants. Recurrent neural networks, for instance, spawned Long Short-Term Memory (LSTM) networks, Gated Recurrent Unit (GRU) networks and Neural Turing Machines (NTMs), each possessing a different mechanism for retaining memory. New modules and their corresponding categories continue to be invented.
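The memory-carrying idea behind plain recurrent networks can be sketched in a few lines (this is the simplest recurrent cell, not an LSTM or GRU, and the weights below are random and untrained):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One step of a plain recurrent cell: the hidden state h acts as the memory bank."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
W_x, W_h, b = rng.normal(size=(3, 2)), rng.normal(size=(3, 3)), np.zeros(3)

h = np.zeros(3)                        # the memory starts out empty
for x_t in rng.normal(size=(5, 2)):    # a sequence of 5 observations, 2 features each
    h = rnn_step(x_t, h, W_x, W_h, b)  # each step mixes new input with carried memory
print(h)  # the final hidden state summarizes the whole sequence
```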
The data scientist’s job is to put together a set of components and modules, forming the perfect set of traps for catching insight that lurks in the forest of data. Scientists can’t fulfill this role by simply throwing every possible combination of components and modules at the problem. Each component and module acts like a different type of trap, with some geared towards catching bears and others towards catching birds. Throwing every component together into a single model would force the model to catch all sorts of unwanted patterns. Just as there are consequences to unnecessarily trapping animals, there are costs to detecting false patterns: a model’s performance suffers as the false patterns pile up. Scientists must therefore design their neural networks with care, giving thought to the selection and placement of each component.
Hunters begin their task of trapping animals by forming theories, such as where the animal will likely tread, and what type of bait will lure it. Data scientists must likewise start with theories. A belief that investors’ fear and greed drive stock markets will lead scientists down one path of neural network design, involving components that can model evolving emotional states. A different belief that investors base their trades on stock market charts will lead scientists down another path, one that incorporates components capable of analyzing the shapes of such charts. These are simple examples. Effective models are often born out of more complex theories that string many smaller theories together.
Designing neural networks is both an art and a science. It is a science, because scientists need to deeply understand the mathematics underpinning their tools. It is an art, because there is no one correct way to translate theories into models, but only gradations of quality in which great models vastly outperform average models. Creating great models requires creativity on the part of the scientist, who must sort through a bazaar’s worth of choices of components and modules.
In both the arts and the sciences, the quality of talent, more so than the quantity, determines the value of the output. A hundred average scientists couldn’t have discovered Einstein’s theories of relativity, nor could a hundred average artists have painted a masterpiece like Van Gogh’s Starry Night. Talent quality likewise trumps quantity in neural network design. It takes rare talent to see order within a mess of data, form coherent theories, devise formulas that best express those theories, and then translate them into computer code. Talented data scientists can craft models that perform orders of magnitude better than those created by average data scientists. Compensation schemes reflect the talent gap. Whereas the average data scientist earns about $100,000 per year, superstars routinely make millions.
Elite data scientists can easily justify their higher pay. The economic potential of self-driving cars, for example, is many times greater than the salaries paid to those scientists. But not every data science problem has the economic upside of self-driving cars, nor its technical complexity. Some problems may only call for a few decent data scientists. Decision makers therefore need an accurate gauge of both the business upside and the technical complexity to hire appropriately.
There is debate within the finance community regarding the optimal level of data science in asset management. Some believe that the technical challenges of applying machine learning (including neural networks) are too great, at least in comparison to its economic upside. They may even argue that machine learning doesn't have the ability to unearth hidden patterns in the markets, and favour lower complexity models instead. But those on the other side believe that hidden patterns are waiting to be discovered, that neural networks are one set of tools that can detect those patterns, and that their discovery can be wildly profitable.
We firmly place ourselves in the latter camp. The success of hedge funds such as Renaissance Technologies and Two Sigma supports our point. But many in the financial industry still belong to the former camp, having been disappointed by their own past efforts to utilize machine learning. We see it as our mission to convince the world that neural networks, and machine learning in general, hold vast potential for finance.