## What's New In Machine Learning?

What has changed in Machine Learning in the past 25 years? You may not care about this question. You may not even realize that Machine Learning, as a technical and scientific field, is older than 25 years. But I do care about this question. I care because I got a PhD in Machine Learning in 1990. I then moved sideways to work on constraint programming and mathematical optimization. I have been back in machine learning for a couple of years now, and I asked myself this: is my PhD still relevant, or has Machine Learning changed so much that I need to learn it again from scratch?
To cut a long story short: yes, my PhD isn't relevant anymore. The topics I was working on, namely symbolic machine learning and inductive logic programming, are not very popular these days. Does that mean I needed to learn Machine Learning from scratch? Fortunately for me, the answer was no. Why? For two reasons:

1. Machine Learning technology hasn't changed that much since my PhD.
2. When it has changed, it has moved closer to Mathematical Optimization, something I am quite familiar with.

Let me expand a bit on these two items.

## Machine Learning technology hasn't changed that much since 1990

This may surprise you, because Machine Learning has only made the headlines in the past few years. Surely something must have changed recently. The most visible change is that Machine Learning is now used in many industries, whereas it wasn't used outside academia in 1990. A great example is recommender systems, like the one Netflix uses to recommend movies to watch, or the ones used by e-commerce sites like Amazon to suggest products to buy. These recommender systems learn from how people decide to watch a movie or buy a product; recommendations aren't manually coded by an army of developers. What about Deep Learning? The truth is that most of the algorithms used in Deep Learning are as old as my PhD. Stochastic gradient descent, the workhorse used to train neural networks, was already known back then. What has changed is the ability to scale, i.e. to train neural networks far more efficiently via the use of GPUs. A second change is the availability of ImageNet, a very large collection of annotated images. With faster neural network training and lots of examples to learn from, breakthrough image recognition became possible. Deep Learning is now a vibrant topic, with innovations coming out every year, but let's not forget that the basics are not recent. There are other major changes in Machine Learning technology that aren't as visible as Deep Learning, but that are worth mentioning. Let me pick two.
The first is factorization machines and related matrix factorization methods, which learn how users and products interact and power many modern recommender systems. The second significant advance is boosting applied to decision trees. Decision trees aren't a recent idea; algorithms like CART and C4.5 were already around when I got my PhD. However, gradient boosting machines, or gradient boosted decision trees, are recent; see for instance the xgboost library.
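The core idea behind gradient boosted decision trees fits in a few lines: each new tree is fit to the residuals of the ensemble built so far, and its (shrunken) predictions are added to the ensemble. Here is a minimal illustrative sketch in pure Python, using depth-one "stumps" instead of full trees; the function names and toy data are mine, not from xgboost or any other library.

```python
# Minimal gradient boosting sketch: each stump is fit to the residuals
# (the negative gradient of the squared error) of the ensemble so far.

def fit_stump(xs, residuals):
    """Find the threshold on x that best predicts the residuals."""
    best = None
    for split in xs:
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda x: lmean if x <= split else rmean

def boost(xs, ys, n_rounds=50, lr=0.5):
    """Build an additive ensemble of stumps by fitting residuals."""
    ensemble = []
    preds = [0.0] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        ensemble.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * s(x) for s in ensemble)

# Toy data: y = x^2 on a few points
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x * x for x in xs]
model = boost(xs, ys)
```

Real implementations like xgboost add much more (deeper trees, regularization, second-order gradients, clever split finding), but the residual-fitting loop above is the essence of the method.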
This leads me to the second reason why I didn't need to learn Machine Learning from scratch again.

## Machine Learning algorithms are optimization algorithms

Gradient boosted decision trees use an optimization algorithm to select the best possible ensemble of trees. Factorization machines also use optimization algorithms to best approximate how users and products interact. Deep Learning uses an optimization algorithm such as stochastic gradient descent to train models. As a matter of fact, most, if not all, state-of-the-art machine learning algorithms solve an optimization problem. Basically, a machine learning algorithm selects, among a family of possible models, the model that minimizes the prediction error. A model is better if its prediction accuracy is better. Often we also insist on having a model that is as simple as possible; see Overfitting In Machine Learning for some examples. We therefore look for the model that minimizes a combination of prediction error and model complexity. A machine learning problem is then defined by selecting:

- a family of models,
- a definition of the prediction error,
- a definition of model complexity.
Then you can use your favorite optimization algorithm to solve it. Let me give an example, as this is a bit abstract. Linear regression is probably the simplest and most widely used machine learning algorithm. By the way, this exemplifies another major trend of the past 25 years: Machine Learning and Statistics are far closer now than they were then. Indeed, Linear Regression is a statistical method, yet it is now seen as a Machine Learning algorithm too. Some even say that Machine Learning should be called statistical learning. Back to our topic: the machine learning problem is to predict a quantity $y$ from a vector of input features $x = (x_1, \dots, x_d)$:

$$y \simeq f(x)$$

where $\simeq$ means that the prediction $f(x)$ should be as close as possible to the actual value $y$. Our family of models for $f$ is the set of linear functions:

$$f(x) = \sum_{j=1}^{d} w_j x_j + b$$

The model is described by the parameters $w_1, \dots, w_d$ and $b$. Our prediction error function will be the average of the square of the errors, i.e.:

$$E(w, b) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - f(x_i) \right)^2$$

where the $(x_i, y_i)$, for $i = 1, \dots, n$, are the training examples. Our model simplicity can be the sum of the absolute values of the parameters $w_j$:

$$\Omega(w) = \sum_{j=1}^{d} |w_j|$$

When using this model simplicity we get a variant of the Lasso algorithm. This algorithm finds the model that minimizes:

$$E(w, b) + \lambda \, \Omega(w)$$

where $\lambda$ trades simplicity against accuracy. This can be turned into an equivalent constrained quadratic minimization problem:

$$\min_{w, b} \; E(w, b) \quad \text{subject to} \quad \sum_{j=1}^{d} |w_j| \le t$$
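To make the penalized formulation concrete, here is a minimal sketch of one standard way to minimize the Lasso objective: coordinate descent, where each weight has a closed-form update via soft-thresholding. This is illustrative pure-Python code on made-up toy data, not a production solver; all names are mine.

```python
# Coordinate descent for the Lasso objective
#   (1/n) * sum_i (y_i - w.x_i - b)^2 + lam * sum_j |w_j|
# Each coordinate update has a closed form via soft-thresholding.

def soft_threshold(z, t):
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        # update the intercept (it carries no penalty)
        b = sum(y[i] - sum(w[j] * X[i][j] for j in range(d))
                for i in range(n)) / n
        for j in range(d):
            # partial residual excluding feature j
            r = [y[i] - b - sum(w[k] * X[i][k] for k in range(d) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            norm = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam / 2) / norm
    return w, b

# Toy data: y depends (roughly, y = 2 * x1) on the first feature only;
# the L1 penalty should drive the second, irrelevant coefficient to zero.
X = [[1.0, 0.1], [2.0, -0.2], [3.0, 0.05], [4.0, -0.1], [5.0, 0.15]]
y = [2.1, 4.0, 6.2, 7.9, 10.1]
w, b = lasso_cd(X, y, lam=0.2)
```

Note how the penalty sets the irrelevant coefficient exactly to zero rather than merely shrinking it: that sparsity is precisely why the Lasso doubles as a feature selection method.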
This problem is something readers familiar with mathematical optimization solvers can immediately relate to. However, given that this optimization problem is mostly unconstrained and can be very large, the algorithms available in mathematical optimization solvers may not be the best fit. Machine learners prefer using local methods like stochastic gradient descent. Many more machine learning algorithms can be described as optimization algorithms.

## Bottom Line

Well, I guess you now understand why I feel at home with Machine Learning as it is today: it is all about optimization.
This is really close to what I presented.