What's New In Machine Learning?
Jean-François Puget
What has changed in Machine Learning in the past 25 years? You may not care about this question. You may not even realize that Machine Learning as a technical and scientific field is older than 25 years. But I do care about this question. I care because I got a PhD in Machine Learning in 1990. I then moved sideways to work on constraint programming and mathematical optimization. I have been back in machine learning for a couple of years now, and I did ask myself this: is my PhD still relevant, or has Machine Learning changed so much that I need to learn it again from scratch?
To cut a long story short: yes, my PhD isn't relevant anymore. The topics I was working on, namely symbolic machine learning and inductive logic programming, are not very popular these days.
Does it mean I needed to learn Machine Learning from scratch? Fortunately for me, the answer to that question was no, for two reasons:
1. Machine Learning technology hasn't changed that much since my PhD.
2. Where it has changed, it has moved closer to Mathematical Optimization, something I am quite familiar with.
Let me expand a bit on these two items.
Machine Learning technology hasn't changed that much since 1990.
This may surprise you, because Machine Learning has only made headlines in the past few years. Surely something must have changed recently.
The most visible change is that Machine Learning is now used in many industries, whereas it wasn't used outside academia in 1990. A great example is recommender systems, like the one Netflix uses to recommend movies for you to watch, or the ones used by e-commerce sites like Amazon to suggest products to buy. These recommender systems learn from how people decide to watch a movie or buy a product. Recommendations aren't manually coded by an army of developers.
What about Deep Learning then? This is probably the first objection that comes to mind when you read that machine learning technology hasn't changed much in 25 years or so.
Truth is that most of the algorithms used in Deep Learning are as old as my PhD. Stochastic gradient descent, the standard way to train neural networks, dates back to the 1950s, and backpropagation was popularized in the 1980s.
What has changed is the ability to scale, i.e. to train neural networks far more efficiently through the use of GPUs. A second change is the availability of ImageNet, a very large collection of annotated images. With faster neural networks and lots of examples to learn from, breakthrough image recognition became possible. Deep learning is now a vibrant topic, with innovations coming out every year, but let's not forget that the basics are not recent.
There are other major changes in Machine Learning technology that aren't as visible as Deep Learning, but that are worth mentioning. Let me pick two.
Factorization machines are the technology of choice for recommender systems. They simultaneously build customer profiles, product profiles, and a model of how these interact. One can then use these profiles to recommend products. I will blog soon about how this works, as it is quite neat. In the meantime, interested readers should read Steffen Rendle's paper and also the follow-up work on field-aware factorization machines.
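To make the interaction model concrete, here is a minimal sketch of the degree-2 factorization machine prediction in the spirit of Rendle's paper; the function name, array shapes, and variable names are my own assumptions, not code from the paper.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Degree-2 factorization machine prediction.

    x: (n,) feature vector, w0: global bias, w: (n,) linear weights,
    V: (n, k) latent factors; the interaction weight for a feature
    pair (i, j) is the dot product <V[i], V[j]>.
    """
    # Pairwise term via the O(n k) identity:
    # sum_{i<j} <V_i, V_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i V_if x_i)^2 - sum_i V_if^2 x_i^2 ]
    s = V.T @ x                   # (k,)
    s2 = (V ** 2).T @ (x ** 2)    # (k,)
    interactions = 0.5 * np.sum(s ** 2 - s2)
    return w0 + w @ x + interactions
```

The identity in the comment is what makes factorization machines practical: it avoids the quadratic loop over all feature pairs.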
Another significant advance is boosting applied to decision trees. Decision trees aren't a recent idea; algorithms like CART and C4.5 were already around when I got my PhD. However, gradient boosting machines, or gradient boosted decision trees, are more recent; see for instance the xgboost library.
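As a hedged illustration of the boosting idea, and not of how xgboost is actually implemented, here is a toy gradient boosting loop for squared loss on a single feature: each round fits a depth-1 tree (a stump) to the current residuals, which are the negative gradient of the loss. All names and parameter values are mine.

```python
import numpy as np

def fit_stump(x, r):
    """Fit a depth-1 regression tree to residuals r: pick the threshold
    that minimizes squared error, predicting the mean on each side."""
    best = None
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        pl, pr = left.mean(), right.mean()
        err = np.sum((r - np.where(x <= t, pl, pr)) ** 2)
        if best is None or err < best[0]:
            best = (err, t, pl, pr)
    _, t, pl, pr = best
    return t, pl, pr

def boost(x, y, n_rounds=50, lr=0.1):
    """Gradient boosting for squared loss: start from the mean, then
    repeatedly add a shrunken stump fitted to the residuals."""
    pred = np.full(len(y), y.mean())
    for _ in range(n_rounds):
        residual = y - pred            # negative gradient of squared loss
        t, pl, pr = fit_stump(x, residual)
        pred = pred + lr * np.where(x <= t, pl, pr)
    return pred
```

Real libraries add regularization, deeper trees, clever split finding, and much more, but the additive "fit the residuals" loop above is the core of the method.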
This leads me to the second reason why I didn't need to learn Machine Learning from scratch again.
Machine Learning algorithms are optimization algorithms
Gradient boosted decision trees use an optimization algorithm to select the best possible ensemble of trees. Factorization machines also use an optimization algorithm to best approximate how users and products interact. Deep Learning uses an optimization algorithm such as stochastic gradient descent to train models. As a matter of fact, most, if not all, state-of-the-art machine learning algorithms are solving an optimization problem.
Basically, a machine learning algorithm selects, among a family of possible models, the model that minimizes the prediction error. A model is better if its prediction accuracy is higher. Often we also insist on having a model that is as simple as possible; see Overfitting In Machine Learning for some examples. We therefore look for the model that minimizes a combination of prediction error and model complexity.
A machine learning problem is then defined by selecting:
1. a family of possible models,
2. a prediction error function,
3. a measure of model complexity, and how to trade it off against the error.
Then you can use your favorite optimization algorithm to solve it.
Let me give an example as this is a bit abstract.
Linear regression is probably the simplest and most widely used machine learning algorithm. By the way, this exemplifies another major trend of the past 25 years: Machine Learning and Statistics are much closer now than they were then. Indeed, Linear Regression is a statistical method, and it is now seen as a Machine Learning algorithm too. Some even say that Machine Learning should be called Statistical Machine Learning.
Back to our topic: the machine learning problem is to predict a quantity y from a set of quantities x, given a series of examples of the form

y_i ≃ f(x_i1, x_i2, ..., x_in)
where ≃ means approximately equal to.
Our family of models for f will be linear models, i.e. models where f is of the form:
f(x_1, x_2, ..., x_n) = a_0 + a_1 x_1 + a_2 x_2 + ... + a_n x_n
The model is described by the parameters a_i.
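In code, such a linear model is just an intercept plus a dot product; a minimal sketch with my own naming:

```python
import numpy as np

def predict(a, x):
    """Linear model prediction: a[0] is the intercept a_0,
    a[1:] are the coefficients a_1..a_n."""
    return a[0] + a[1:] @ x

# For instance, with a = [1, 2, 3] and x = [1, 1]:
# 1 + 2*1 + 3*1
predict(np.array([1.0, 2.0, 3.0]), np.array([1.0, 1.0]))  # -> 6.0
```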
Our prediction error function will be the average of the squared errors, i.e.:

(1/m) sum_i [ y_i - (a_0 + a_1 x_i1 + a_2 x_i2 + ... + a_n x_in) ]^2
where m is the number of examples.
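This error function is straightforward to code; here is a sketch under my own naming, where X stacks the examples row by row:

```python
import numpy as np

def mean_squared_error(a, X, y):
    """Average squared prediction error of the linear model a
    over examples X (shape (m, n)) with targets y (shape (m,))."""
    pred = a[0] + X @ a[1:]
    return np.mean((y - pred) ** 2)
```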
Our measure of model complexity can be the sum of the absolute values of the parameters a_j:

sum_j | a_j |
When using this complexity measure we get a variant of the Lasso algorithm. This algorithm finds the model that minimizes:

(1/m) sum_i [ y_i - (a_0 + a_1 x_i1 + a_2 x_i2 + ... + a_n x_in) ]^2 + sum_j | a_j |
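Putting the two terms together gives the objective below; a sketch with my own naming, including a weight lam on the penalty term (the formula above corresponds to lam = 1, and in practice the intercept is often left out of the penalty):

```python
import numpy as np

def lasso_objective(a, X, y, lam=1.0):
    """Mean squared error plus an L1 penalty on the parameters."""
    pred = a[0] + X @ a[1:]
    return np.mean((y - pred) ** 2) + lam * np.sum(np.abs(a))
```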
This can be turned into an equivalent constrained quadratic minimization problem:

minimize    (1/m) sum_i [ y_i - (a_0 + a_1 x_i1 + a_2 x_i2 + ... + a_n x_in) ]^2 + sum_j z_j
subject to   a_j ≤ z_j
            -a_j ≤ z_j    for all j
This problem is something readers familiar with mathematical optimization solvers can immediately relate to. However, given that this optimization problem is mostly unconstrained and can be very large, the algorithms available in mathematical optimization solvers may not be the best fit. Machine learners prefer using local methods like stochastic gradient descent.
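As an illustration of such a local method, here is a sketch of proximal gradient descent (ISTA) applied to the Lasso objective above, omitting the intercept for simplicity; the step size and iteration count are arbitrary choices of mine, not tuned values.

```python
import numpy as np

def ista_lasso(X, y, lam=0.1, lr=0.01, n_iter=2000):
    """Minimize mean((y - X a)^2) + lam * sum|a| by alternating a
    gradient step on the smooth part with soft-thresholding (the
    proximal operator of the L1 penalty)."""
    m, n = X.shape
    a = np.zeros(n)
    for _ in range(n_iter):
        grad = -2.0 / m * (X.T @ (y - X @ a))  # gradient of the MSE term
        a = a - lr * grad
        # soft-threshold: shrink toward zero, exact zeros give sparsity
        a = np.sign(a) * np.maximum(np.abs(a) - lr * lam, 0.0)
    return a
```

The soft-thresholding step is what produces exact zeros in the solution, which is why the Lasso is used for feature selection.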
Many more machine learning algorithms can be described as optimization algorithms.
Well, I guess you now understand why I feel at home with Machine Learning as it is today: it is all about optimization.
Update on Nov 2, 2016. The above view is not just mine. For instance, Prof. Diego Kuonen pointed me to how he covers the same topic in his own material, and it is really close to what I presented here.