A Nice Optimization Problem From Santa ClausKaggle is a site that is most known for hosting machine learning competitions. However, once a year, Kaggle team runs an optimization competition on some problem Santa Claus could face. This year competition is a stochastic optimization problem: we are asked to optimize some outcome when the data is known with some uncertainty. Many real word problems are of this form. For instance, optimizing store replenishment and inventory levels takes as input sales forecasts. By definition, future sales are only known up to some prediction uncertainty. In a case like this, one can optimize for the worst case for instance: find a replenishment plan that minimizes the likelihood of out of stock. I could go an and expand on this, but let's go back to Kaggle competition for now. The ProblemThis year comp ♫ Bells are ringing, children singing, all is merry and bright. Santa's elves made a big mistake, now he needs your help tonight ♫ All was well in Santa's workshop. The gifts were made, the route was planned, the naughty and nice list complete. Santa thought this would finally be the year he didn't need Kaggle's help with his combinatorial conundrums. At last, the Claus family could take the elves and reindeer on that well deserved vacation to the South Pole. Then, with just days until the big night, Santa received an email from a panicked database admin elf. Attached was a server log with the six least jolly words a jolly old St. Nick could read: ALTER TABLE Gifts DROP COLUMN Weight One of the North Pole elf interns had mistakenly deleted the weights for all of the inventory in the workshop! Santa didn't have a backup (remember, this is a guy who makes a list and checks it twice) and, without knowing each present's weight, he didn't know how he would safely pack his many gift bags. Gifts were already on their way to the sleigh packing facility and there wasn't time to reweigh all the presents. It was once again necessary to summon the holiday talents of Kaggle's elite. Can you help Santa fill his multiple bags with sets of uncertain gifts? Save the season by turning Santa's unce The data section contains additional information: Santa has 1000 bags to fill to fill with 9 types of gifts. Due to regulations at the North Pole workshop, no bag can contain more than 50 pounds of gifts. If a bag is overweight, it is confiscated by regulators from the North Pole Department of Labor without warning! Even Santa has to worry about throwing out his bad back. Each present has a fixed weight, but the individual weights are unknown. The weights for each present type are not identical because the elves make them in many types and sizes. It then provides a way to compute the weight distributions of each gift type. Details are available on Kaggle site, just follow the above link. It is definitely a stochastic optimization problem: we are asked to optimize the weights of the gifts Santa can distribute, when these weights are only known via a probability distributions. I decided to approach this as a stochastic cutt The ModelOne way to approach the competition is to look for a solution structure that has a good chance to yield good submission. A solution structure is defined by a number of bag types, plus a number of occurrence of each bag type. A bag type is defined by the number of gifts of each type it contains. For instance 3 blocks and 1 train. We can focus on bag types because all bags have the same capacity (50 pounds). There is a finite number of bag types that are possible. We define one random variables for each bag type. All we need is an estimate the expected value and the variance of each possible bag type. Then we use two properties to find a combination of bags that maximizes a combination of expected value and standard deviation:
We estimate the mean and variance of each bag type via the law of large numbers: we use Monte Carlo simulation (with 1M sample), and compute mean and variance of the simulated results. While simple, this approach is way too expensive to run. We improved running time in two ways:
Let me expand a bit on the Pareto frontier idea. Let's consider two bags for the sake of clarity:
The second bag is obtained by adding one gift to the first bag. We can compute the expected weight for each of these bags. If the expected weight is lower for the second bag than for the first bag, then the second bag can be ignored. Why? Because it uses more gifts for a lower value. More generally, if the first bag cannot be extended with an additional gift without lowering the expected value then it is Pareto optimal. Given this, we start with an empty bag, and add one gift at a time in every possible way. We do it until the expected value of the bag decreases. When this happens then we can discard the newly created bag, as it uses more items and yields a lower expected value. This results in about 40,000 bag types. Optimizing Expected ValueNext step is to solve the optimization problem. As said before, it is a cutting stock problem. The mathematical formulation is as follows. where
This defines a mixed integer problem with linear constraints (MIP). Solving it is pretty straightforward with a state of the art MIP solver like CPLEX. I used the recent DOcplex package to call it from Python. The code is close to the above mathematical formulation. from docplex.mp.model import Model def mip_ bags is a data frame containing all bag types. We return the portion of it that contains only the bag types that are used. Solving this MIP yields solution structures with expected value around 35,540 pounds. Results depend on how we estimate the expected value of each bag type. The more simulation runs, the more accurate it is, but the more time it takes to generate all bag types. I thought finding the optimal expected value would be good enough to win the competition, but I was really wrong. My first submission scored about 35,880 pounds, and as I write now, top score is close to 37,000 pounds. How could that happen? Isn't my solution the optimal one? It is, but in a probabilistic way: it is the best one one average. Issue is that the competition isn't about finding the best solution on average. The goal is to find the best solution given the actual (hidden) weights of the gifts. Optimizing Mean and VarianceOne way to improve result is to generate many solutions from the same solution structure, albeit using different gifts. For instance, if the solution structure contains one bag made of one train and three blocs, a first solution could include this bag: [train_1, blocks_3, blocks_8, blocks_12]. In a second solution, the same bag could be [train_1, blocks_3, blocks_8, blocks_12]. The expected value for both bags is the same, but the actual values will be different, because the weights of individual gifts are different: the weight for train_1 is not the same as the weight of train_2. Given we can generate many solutions from a given solution structure, how could we improve the value of the best possible one? One way to do it is to favor solution structure with larger variance. If two solution structures have the same expected value, then the one with the larger variance is more likely to generate larger value submissions. It is also more likely to generate lower value submissions. Question is how to do that with a solver like CPLEX? Well, the standard deviation of the solution structure is the root of its variance. And its variance is the sum of the variance of all bags in it. The mathematical formulation is therefore a slight extension of the previous one: where
This problem contains a quadratic constraint. It is a quadratically constrained mixed integer problem (QCMIP). Again, solving it with CPLEX is rather easy, the code becomes: def qcpm Solving this QCMIP with alpha=2 yields solution structures with expected value around 35,525 pounds, and standard deviation around 333 pounds. The expected value is a bit lower, but the standard deviation is much larger. With solution structure like this, I had about 1/4 chance to get a submission above 36,400 pounds by the end of competition (about 90 submissions left at that time). This looked much better but truth is that people have found ways to generate way higher value, as shown by the current leader board of the competition. I am also able to generate better solutions, but don't count on me to disclose how before competition ends There is a caveat in the second approach. Can you see it? The caveat is about how we prune the generation of candidate bags. We need to modify it to take into account the new objective function we have. When we generate bag2 by adding one gift to bag1 then we should compare mean(bag1) + alpha * std(bag1) with mean(bag2) + alpha * std(bag2). If the former is higher than the latter then we can safely drop bag2 from further consideration. TakeawayWhat I found interesting is to use a QCMIP to optimize the maximum of the values we can get when generating solutions from one solution structure. This is significantly different from usual stochastic optimization problems. Indeed, in my experience, people are usually interested in finding solutions that are good on average (like in my first model), or that minimize worst case scenario. Here we are asked to maximize the best case scenario. It is very unusual.
