Machine Learning As Prescriptive Analytics
JeanFrancoisPuget
I made a mistake about machine learning.
I said, and I wrote, that machine learning and predictive analytics were almost the same.
To be more specific, my view was simple: analytics can be divided into four categories, exemplified below (see Analytics Landscape for details).
I put machine learning near predictive analytics in this 2D landscape:
Of course, I also crowned optimization the queen of analytics technologies, as it yields the best business value. What else would you expect from someone who has spent nearly three decades working in optimization? No wonder this view became popular in the optimization community.
If this is popular, then why is it wrong?
First, let me reassure readers about my mental health: I still think that optimization is best for computing optimal decisions. The issue is elsewhere.
I started thinking there was an issue when I met customers who wanted to use machine learning to solve all of their business problems. I can't blame them: there is so much buzz around machine learning these days that it is hard not to think it can solve every problem. Still, I was a bit shocked when I saw people wanting to use machine learning to compute the best decisions. My mental model was that machine learning should be used to make predictions, and that optimization should be used to optimize decisions given these predictions:
Machine learning for predictions, optimization for decisions.
Let's use an example for the sake of clarity. We can use machine learning techniques to build a model that predicts future demand (future sales levels) for a retail store chain. We can then use optimization to compute optimal inventory management for these stores, keeping both the cost of inventory and the risk of empty shelves to a minimum.
Simple and powerful, isn't it?
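This division of labor can be sketched in a few lines of Python. It is a toy under stated assumptions, not a real demand model: made-up weekly sales for a single store, a least-squares linear trend standing in for the machine learning model, and the classic newsvendor rule (swapped in here for illustration) as the optimization step, with hypothetical holding and lost-sale costs.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical weekly sales history for one store (illustrative numbers, not real data).
weeks = np.arange(1, 13, dtype=float)
sales = np.array([100, 104, 109, 111, 118, 120, 126, 129, 133, 138, 141, 147], dtype=float)

# Step 1, prediction: fit a linear trend by least squares (a stand-in for any ML model).
slope, intercept = np.polyfit(weeks, sales, deg=1)
residuals = sales - (slope * weeks + intercept)
forecast = slope * 13 + intercept      # point forecast for week 13
sigma = residuals.std(ddof=2)          # spread of the forecast errors (2 fitted parameters)

# Step 2, decision: newsvendor order-up-to level given the forecast.
# Assumed costs: an unsold unit costs 1, a lost sale costs 4.
holding_cost, shortage_cost = 1.0, 4.0
critical_ratio = shortage_cost / (shortage_cost + holding_cost)   # 0.8
order_up_to = forecast + sigma * NormalDist().inv_cdf(critical_ratio)

on_hand = 30.0                          # hypothetical inventory already on the shelf
order_qty = max(0.0, order_up_to - on_hand)
```

The forecast feeds the decision but does not make it: changing the assumed costs changes the order quantity while leaving the prediction untouched.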
Yet I kept meeting people who thought of machine learning as a way to directly learn what the good decisions are. It made me think. Are they really wrong?
They may not be wrong after all. To see why, let's look at some facts.
One of the first known examples of machine learning is Arthur Samuel's work on a program that learned to play checkers better than he did. Samuel used machine learning to learn how to play, i.e., to learn how to make the right decisions. There are other examples of machine learning used to learn how to make the right decisions. For instance, IBM Watson defeated the best human players at Jeopardy a few years ago. IBM Watson is a learning machine (at IBM we like to call it a cognitive machine). It was given Wikipedia to digest, then it was trained with question-answer pairs from Jeopardy. As it trained, its performance at answering Jeopardy questions improved until it was able to beat top human players. IBM Watson learned how to answer questions: it learned how to decide which answer was best for a question it had never seen before.
More recently, Google's AlphaGo won a match against one of the top Go players. AlphaGo is also a learning machine. It was first trained on a large set of recorded Go games between top players, then it trained against itself. As it trained, its performance at Go increased until it became better than a top human player. AlphaGo learned how to decide on the best next move for any Go board configuration.
Once we look at these examples, we can no longer say that machine learning is just for making predictions. Machine learning can be used to compute decisions.
Does this mean that machine learning is going to replace optimization? Let's zoom in a little before answering that question.
When using optimization directly, someone, typically an operations research practitioner (a kind of data scientist, actually), creates an optimization model of the business problem to solve. For instance, that person will model all the inventory constraints (available space, cost per stored item, shelf life, cost of transportation, etc.), as well as the business objective (a combination of inventory cost and the risk of lost sales due to empty shelves). Then, on a regular basis, this model is combined with a particular instance of the business problem (for instance, inventory left from the previous week and predicted demand for the week) to yield an optimization problem. That problem is then solved by an optimization algorithm to yield a result (for instance, a replenishment plan). This can be depicted as follows.
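To make this workflow concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: a single item, hypothetical cost figures and shelf space, predicted demand given as three equally likely scenarios, and plain exhaustive search standing in for a real optimization solver (a production system would typically hand the model to a mathematical programming solver instead). The constants and `expected_cost` play the role of the optimization model, `Instance` is the weekly data, and `solve` is the optimization algorithm.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """One week's instance of the business problem (hypothetical fields)."""
    inventory_left: int           # units carried over from the previous week
    demand_scenarios: list[int]   # predicted demand, as equally likely scenarios


# The model: business rules written once by the practitioner (assumed numbers).
SHELF_SPACE = 200       # maximum units the store can hold
ORDER_COST = 2.0        # cost per unit ordered (purchase + transportation)
HOLDING_COST = 0.5      # cost per unsold unit left at week end
LOST_SALE_COST = 6.0    # penalty per unit of unmet demand


def expected_cost(order: int, inst: Instance) -> float:
    """Objective: ordering cost plus average holding and lost-sales cost."""
    stock = inst.inventory_left + order
    total = 0.0
    for d in inst.demand_scenarios:
        total += HOLDING_COST * max(0, stock - d) + LOST_SALE_COST * max(0, d - stock)
    return ORDER_COST * order + total / len(inst.demand_scenarios)


def solve(inst: Instance) -> int:
    """Solver: exhaustive search over every order quantity that fits on the shelf."""
    feasible = range(0, SHELF_SPACE - inst.inventory_left + 1)
    return min(feasible, key=lambda q: expected_cost(q, inst))


# Each week, the same model is combined with new data and solved again.
week = Instance(inventory_left=20, demand_scenarios=[90, 110, 130])
plan = solve(week)
```

With these assumed numbers the search picks an order of 90 units, which brings stock to 110 units, the middle demand scenario. Next week only the `Instance` changes; the model (the constants and `expected_cost`) is reused as is.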
When using machine learning, we start with a (large) set of instances of the business problem to solve, together with their known answers (yes, this is supervised learning). We then train a model on these examples, with the goal that it can compute the best solution to new problems. The trained model is then applied to each new instance to compute the answer for that instance. The point is that the training phase is itself an optimization problem: it amounts to finding the model with the best predictive power given the training examples (see the appendix below for more on this). We therefore get this workflow for machine learning:
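The machine learning workflow can be sketched the same way. This is again a toy with everything assumed: synthetic past instances described by two hypothetical features (inventory left and a demand forecast), each labeled with the order that was best in hindsight, a linear model, and plain gradient descent as the training algorithm. What it illustrates is that the training loop is itself an optimization, here minimizing mean squared error over the model weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical history: 200 past weeks (instances of the business problem).
n = 200
inventory_left = rng.uniform(0, 50, n)     # units carried over from the previous week
forecast = rng.uniform(80, 140, n)         # demand forecast available at decision time
actual = forecast + rng.normal(0, 10, n)   # demand observed afterwards
# Known answer per instance (an assumption): the order that was best in hindsight.
best_order = np.maximum(0.0, actual - inventory_left)

# Standardize the two features and add a constant column for the intercept,
# so that a single step size works for every weight.
raw = np.column_stack([inventory_left, forecast])
mean, scale = raw.mean(axis=0), raw.std(axis=0)
X = np.column_stack([(raw - mean) / scale, np.ones(n)])
y = best_order

# Training is an optimization problem: minimize the mean squared error over
# the weights w, solved here with plain gradient descent.
w = np.zeros(3)
learning_rate = 0.1
for _ in range(1000):
    grad = 2.0 / n * X.T @ (X @ w - y)   # gradient of mean((X @ w - y) ** 2)
    w -= learning_rate * grad


def decide(inventory: float, demand_forecast: float) -> float:
    """Apply the trained model to a new instance; no optimization runs here."""
    features = np.append((np.array([inventory, demand_forecast]) - mean) / scale, 1.0)
    return max(0.0, float(features @ w))


new_order = decide(inventory=20.0, demand_forecast=110.0)
```

Once trained, `decide` answers a new instance with a single dot product. All the optimization happened during training, and one can check that gradient descent lands on the same weights as `np.linalg.lstsq`, the direct least-squares solution.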
The comparison with the previous picture is striking, isn't it? When we choose between optimization and machine learning to compute the best decisions, we are actually choosing when to use optimization. Optimization will be used either way.
I will let readers ponder this.
Let me conclude with a question: why would we use optimization one way rather than the other? Should we train a machine learning model on past examples of good and bad decisions? Or should we apply optimization to each new instance independently? I don't have a clear generic answer yet, and I welcome suggestions from readers. My gut feeling is that the right answer could be a combination of both.
Appendix: Machine Learning Is An Optimization Problem
In my opinion, the best machine learning work attempts to rephrase prediction as an optimization problem (see for example: Bennett, K. P., & Parrado-Hernandez, E. (2006). The Interplay of Optimization and Machine Learning Research. Journal of Machine Learning Research, 7, 1265–1281). Good machine learning papers use good optimization techniques, and bad machine learning papers (most of them, in fact) use bad, outdated, ad hoc optimization techniques.
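One standard way to write this down is the usual textbook formulation known as regularized empirical risk minimization (given here as a common convention, not quoted from the paper above). Assume training pairs $(x_i, y_i)$ for $i = 1, \dots, n$, a model family $f_\theta$ with parameters $\theta$, a loss $\ell$, and a regularizer $\Omega$ weighted by $\lambda \ge 0$. Training then solves

```latex
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_\theta(x_i),\, y_i\big) \;+\; \lambda\, \Omega(\theta)
```

Least-squares regression takes $\ell(\hat{y}, y) = (\hat{y} - y)^2$ with $\lambda = 0$; ridge regression adds $\Omega(\theta) = \lVert \theta \rVert_2^2$; a linear support vector machine uses the hinge loss $\max\big(0,\, 1 - y\, \theta^\top x\big)$ with labels $y \in \{-1, +1\}$. The choice of optimization algorithm for this problem is exactly where good and bad machine learning work differ.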
I also found this definition of machine learning as an optimization problem when writing What Is Machine Learning?: