Optimization Is Ready For Big Data: Part 2, Velocity
Jean-Francois Puget
Proponents of Big Data boast about how it might help personalize the behavior of all the things and systems people interact with (web sites, mobile apps, customer support services, the internet of things, etc.). These systems have to deal with data in motion, such as web interactions, sensor feeds (e.g. body temperature), video, and social media feeds. Dealing with such data is the velocity dimension of Big Data. In my previous post I discussed how optimization could be applied to another Big Data dimension, namely large volumes of data; here I will focus on velocity.
The question of interest (to me at least) is whether current optimization technology is relevant to making decisions in the presence of data in motion. That means making decisions in near real time, say within a second.
My answer is yes. It is yes despite the fact that many interesting and relevant optimization problems take more than a second to solve with current optimization technology. Let me explain why.
The first reason to be optimistic is that there has been tremendous progress in the speed at which optimization solvers work. Problems that took days if not weeks a decade ago can now be solved in seconds. Is this speedup all we need to apply optimization in near real time, say a sub-second time frame? Are we solving problems fast enough?
My take is that we are solving problems fast enough, but raw speed is not what matters here. I explained why in Opti
The key issue is that making sub-second decisions one at a time may lead to a myopic use of resources, and therefore to major inefficiency.
Let's use an example for the sake of clarity. This example is a simplified view of what a customer of ours is doing. Assume we are running a taxi company (in a pre-Uber world...) and that the decisions we need to make are assignments of cars to incoming customer requests. We have two data streams: car positions and customer requests. What most companies do is assign the closest car each time a customer calls. Our company is smarter: it waits a bit before assigning cars, which leads to lower waiting times for customers, hence better service.
The one-dimensional world example below shows the difference between the two approaches. Assume we have two idle cars waiting for customers, and that customers call in. Assigning the closest car to each call is depicted on the left. Waiting a bit before assigning cars leads to a better outcome, depicted on the right.
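The contrast can also be sketched in a few lines of code. Below is a minimal Python illustration of the one-dimensional example; the car and request positions are made up for illustration, and the batched variant simply enumerates all assignments, which is only viable for tiny instances (a real system would use an assignment algorithm).

```python
from itertools import permutations

def greedy_assign(cars, requests):
    """Assign the closest free car to each request in arrival order."""
    free = list(cars)
    total = 0.0
    for r in requests:
        car = min(free, key=lambda c: abs(c - r))
        free.remove(car)
        total += abs(car - r)
    return total

def batch_assign(cars, requests):
    """Wait for all requests, then pick the assignment minimizing total distance."""
    return min(
        sum(abs(c - r) for c, r in zip(perm, requests))
        for perm in permutations(cars)
    )

cars = [0.0, 10.0]
requests = [4.0, 1.0]   # a request at 4 arrives first, then one at 1

print(greedy_assign(cars, requests))  # 13.0: car 0 takes 4, leaving distant car 10 for 1
print(batch_assign(cars, requests))   # 7.0: car 10 takes 4, car 0 takes 1
```

The greedy rule grabs the nearby car for the first caller and pays dearly for the second; waiting and deciding jointly nearly halves the total travel distance.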
The key point is that there is a fundamental contradiction between making real-time decisions, which means one at a time, and making several decisions together, which results in an optimized use of resources. We should therefore wait before making decisions. The issue is that it is not always possible to wait before answering. For instance, replace taxis with ambulances. No one will accept that we wait until we have several accidents before sending vehicles. Does it mean that we should forget about optimization in that case, and use the myopic way of making decisions one at a time?
Not necessarily. One way to leverage optimization is to anticipate, i.e. preposition vehicles (taxis, ambulances) so that we minimize the expected travel time to where they are needed. Note that both police departments and fire departments have a similar problem: how could they position their vehicles and patrols so that the time to intervention is minimized?
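As a rough sketch of what prepositioning could look like: if we use historical request locations (made-up points below) as a proxy for demand, placing two vehicles to minimize the expected travel time is a tiny 2-median problem, which we can brute-force here by restricting candidate positions to the historical points themselves. A real deployment would of course work on a road network with a proper demand model.

```python
from itertools import combinations

def expected_travel(positions, demand_points):
    """Average distance from each demand point to its nearest vehicle."""
    return sum(min(abs(p - d) for p in positions) for d in demand_points) / len(demand_points)

def preposition(demand_points, n_vehicles=2):
    """Brute-force the best n_vehicles positions among the demand points."""
    return min(
        combinations(sorted(set(demand_points)), n_vehicles),
        key=lambda pos: expected_travel(pos, demand_points),
    )

history = [1.0, 2.0, 2.5, 8.0, 9.0, 9.5]   # made-up past request locations
best = preposition(history)
print(best)                             # one vehicle per demand cluster
print(expected_travel(best, history))   # 0.5
```

The optimizer naturally places one vehicle in each demand cluster, so whichever vehicle is closest to the next request is already nearby.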
My colleagues at IBM have worked on many projects around vehicle prepositioning, see for instance the video below.
In general, a two-step process has been followed: offline, use optimization to preposition the vehicles based on expected demand; then, in real time, apply a simple dispatch rule such as "use the closest available car".
We can summarize the process graphically as follows.
The good news is that a simple rule of thumb such as "use the closest available car" gives good results once vehicles are properly prepositioned. The myopic effect we had in the taxi example above is gone.
This example shows that we can use optimization to pre-assign resources, in order to make real-time decisions more effective. Let us look at another way, which is to forecast beyond the next decision.
This example was discussed in
The process can be depicted graphically as follows.
This looks quite similar to the previous picture, doesn't it? The general pattern is a two-step process: an offline step that uses optimization ahead of time, and a real-time step that makes each individual decision quickly.
This is a rather flexible pattern that we have seen regularly. We can depict it as follows:
The key point is that we have been able to circumvent the contradiction between real-time decision making, which favors making decisions one at a time, and optimization, which is relevant only when many decisions are made at the same time. One way to resolve that dilemma is to use optimization to preassign resources, so that a simple real-time allocation rule such as "use the closest car" yields decent results. Another way is to forecast what will happen beyond the next decision, in order to make several decisions at once. In both cases, optimization makes real-time decisions more effective.
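The second pattern can also be sketched in code. In this hypothetical example, when a real request arrives we append forecasted requests (a made-up stand-in below; a real system would fit a demand model on historical data), optimize the assignment over the whole batch, and commit only the decision for the real request.

```python
from itertools import permutations

def best_assignment(cars, requests):
    """Return the car-per-request assignment minimizing total distance."""
    return min(
        permutations(cars, len(requests)),
        key=lambda perm: sum(abs(c - r) for c, r in zip(perm, requests)),
    )

def decide_with_forecast(cars, real_request, forecast):
    """Optimize over real + forecasted requests, commit only the real one."""
    requests = [real_request] + forecast
    assignment = best_assignment(cars, requests)
    return assignment[0]   # the car committed to the real request

cars = [0.0, 10.0]
# A request arrives at position 4; we forecast another request near 1.
car = decide_with_forecast(cars, real_request=4.0, forecast=[1.0])
print(car)   # 10.0: the farther car serves the real request,
             # keeping car 0 free for the forecasted one
```

Without the forecast, a myopic rule would send the closest car (at 0) to the request at 4; with it, the optimizer reserves that car for the anticipated nearby request, exactly the effect of the earlier taxi example.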