What Is Machine Learning?
from Max Pixel
I’m South Korean by birth, but I spent most of my highschool years in Ireland. I wanted to remain in an English speaking country after I graduated, so I chose to go to the University of Waterloo, located in Canada.
During the first lecture I attended, I needed to edit something I wrote. I rummaged through my pencil case, but failed to find what I was looking for, so I turned to a couple of female students sitting to the left of me and said, “May I borrow your rubber?” They stared at me stone faced, not moving a finger. I looked back at them quizzically. I wondered to myself, is it rude to ask to borrow writing tools in Canada? After a few moments of silence, one of the students burst out laughing. She said, “I think he means an eraser,” and handed me the rubber that I’d asked for. “Strange,” I thought, “we don’t call those ‘erasers’ in Ireland.”
Language is a social construct. No one has ultimate authority over the definition of a word. Even dictionaries revise their own understanding of words as their usage evolves among the populace. ‘Gay’ does not mean what it used to mean. To understand what a word truly means, we therefore need to observe how most people use the word.
Some words, however, have yet to adopt a common understanding. ‘Machine learning’ is one such word. I’ve heard the term used to describe a wide range of concepts. Some say it’s a synonym for artificial intelligence. Others say it’s the set of techniques that power artificial intelligence. Some computer scientists describe it as “just statistics”, but I know statisticians who would cringe at that definition. For marketers, it’s a buzzword that needs to be shoved inside a presentation slide irrespective of its true meaning.
This clash of definitions has contributed to the mass confusion around the subject of machine learning, and has likely slowed its adoption in the real world. Afterall, how many decision makers would greenlight projects which their proponents can’t even articulate succinctly?
I will, in the next few paragraphs, describe and clarify what machine learning is. My description is borne out of years of practice in the field, so although no one has ultimate authority to define the term, my definition will be as good as anyone’s.
Machine learning is best understood as a two-sided coin, one side of which is theoretical and the other side practical. When viewed from the theoretical angle, machine learning models appear as sophisticated statistical models. The mathematical formulae that underpin statistics determine the shapes and depths of insights that a model can learn. Mathematical theory also guides practitioners towards appropriate model designs for different problems.
The bones and sinews of machine learning, which give models their strength from a practical perspective, consist of computer code that actualizes the blueprint given by statistics. Practitioners rely heavily on the field of computer science for best practices on translating theory into concrete numbers.
Computer science’s role extends beyond merely taking orders from its statistics masters. Mathematical formulae don't concern themselves with computational constraints. Even simple, innocent looking equations can take months of time and terabytes of RAM to compute if the practitioner is not careful. Computer science brings statistical theory down to the realm of the practically feasible, offering a range of solutions and advising on their associated tradeoffs.
Machine learning is often defined as much by what it’s not, as by what it is. People often juxtapose machine learning against more traditional statistical methods, and linear regression models in particular. Linear regression’s ubiquitousness in both academic and industry publications have made it the familiar friend that people turn to instinctively, in contrast to the strange looking machine learning newcomers.
But there is nothing mathematically special about the structure of linear regression that sets it apart from other machine learning models. Perhaps the only distinction is that linear regression models usually contain very few parameters, making them computationally inexpensive to run. The difference between machine learning and linear regression may thus be akin to the difference between a ship and a boat; while both types of vessels perform the same function, one is much larger and more capable than the other. But this distinction is fuzzy, and many practitioners consider linear regression to belong to the family of machine learning models.
Some statisticians, on the other hand, distinguish statistics from machine learning by the attitudes of those who practice in their respective fields. The statisticians surmise that machine learning practitioners’ solution to a problem consists of throwing random configuration changes at neural networks without thought, until the models start to generate usable numbers. But this is a misunderstanding. Only dilettantes approach machine learning the way they choose lottery numbers. Every serious machine learning practitioner I know gives careful consideration to statistical theory, as that is what gives each model its power.
There are three major categories of machine learning models - supervised, unsupervised and reinforcement learning. Each category distinguishes itself from others by the purpose it serves, and the way its member models learn.
Supervised machine learning models are like students who learn from their teachers. The models are given a set of “labels”, which are answers that models will strive to predict correctly. They’re also given “features”, which are like hints in questions that models will base their predictions on. If we were to train a model that predicts whether a sentence is grammatically correct, the model will be given the sentences as features, and whether or not each sentence is grammatically correct as the labels.
Unsupervised learning models, also known as ‘self-supervised learning’ models, are like students who learn unspoken lessons from their peers. They may learn, for instance, that most kids don’t like to eat vegetables, and that popular kids tend to be good at sports. The models, in other words, learn patterns that exist in the data without being explicitly told what to look for.
Finally, reinforcement learning models are like students who learn to play games. The goal, as in a game of tennis, is to score as many points as possible while conceding as few as possible. The models learn by trying different strategies, and learning from their failures in order to improve the strategies.
I eventually figured out what those female students thought I was asking for. I hope that my explanation of machine learning similarly clarifies the concept for you. Many people have polarized views on what machine learning is. Enthusiasts hail machine learning as black magic that will grant superhuman intelligence to machines. Detractors say it’s an empty buzzword, which is no more useful in practice than traditional statistical methods. The truth is somewhere in the middle. Machine learning models are statistical models, which when designed and trained with care, can become very good at narrowly defined tasks. That’s plenty to be excited about.