## D-Wave vs CPLEX Comparison. Part 2: QUBOCan we improve CPLEX performance for yet another test series used in McGeoch&Wang paper, namely the native QUBO instances? According to the paper, these instances were random instances constructed to match what the D-Wave hardware can solve directly. These are problems of the form
minimize sum
where all the variables x
Such problems are called QUBO (Quadratic Unconstrained Binary Optimization ) problems. CPLEX release 12.3 didn't do very well according to the paper (although it was the best software). It was able to find all optimal solutions for the 600 instance of the test series, but it failed to complete the prof of optimality for 15 of them within a 30 minute time limit. As for the QAP test series I will not comment much on D-Wave machine or on other ways software could be used to solve the problems used to test CPLEX. Let me also say again that these problems are not CPLEX sweet spot. These are not the problems for which our customers use CPLEX as these are unconstrained. I'll refer to my discussion and to Mike
According to the paper, the coefficients Q The machine used in the paper is a D-Wave 2 machine with 439 nodes. The exact Chimera graph for that machine isn't published although it could be reconstructed from the problem instances. For the sake of clarity, here is the Chimera graph corresponding to an older D-Wave 1 machine, with 108 nodes. D-Wave company was kind enough to provide me with the exact instances used in the paper. Cathy McGeoch further helped me understand how these were generated. As a matter of fact I was provided Ising Model (IM). These models are similar to QUBOs but the variables take value in {-1, 1}
minimize sum
where all the variables s
The coefficients J These IM can easily be turned into QUBO using
s
where x
sum where
Q
Q
R = sum
This results in QUBO problems having random non diagonal elements (Q
As for - Use the latest CPLEX 12.5.1 release instead of the two years old 12.3 used in the paper
- Use parallelism. CPLEX was limited to using one thread in the paper.
- Use alternative ways of modeling the test problems
We will run the latest CPLEX release on a multi core machine for sure, but what about modeling? Are there smart ways to represent the problems to be tested? As commented in D-Wa
z
z
z
z
Note that the product x The release 12.5.1 of CPLEX makes this transformation automatically. However, I worked with the MIP model given I wanted to try various cuts for it. As I didn't found something simple and effective I stick to the MIP model. Problem instances are divided in 6 groups of increasing size of 32; 119; 184; 261; 349; 439 corresponding to sub graphs of the hardware graph. Here are the results obtained running CPLEX 12.5.1 on a 32 core machine running Linux. The cores are Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz. For each problem size we report the fastest solving time, the average solving time, and the longest solving time. The solving time is the time to find the optimal solution and prove it is optimal. All timings are given in seconds The mean is the shifted geometric mean: 1 second is added to all timings, then the geometric mean is taken, then a second is subtracted back. The one second shift is a way to remove noise as timing for small values isn't very reliable. We first ran CPLEX with one thread, as in the paper.
QUBO solving times with 1 thread Results are much better than those reported in the paper, as all problems can be solved to optimality within the time limit. Remember that in the paper 15 instances could get solved to optimality, meaning that they ran in more than 1800 seconds. Let us look at the 16th longest run here. It is 102 seconds. It means that we can now solve as many problems in 102 seconds as McGeoch and Wang could solve in 1800 seconds. We have almost a 18x speed up (1800/102). We then studied the effect of running CPLEX in parallel, with 8 and 32 threads respectively, ignoring the easiest models.
QUBO solving times with 8 threads We see that using 8 threads is a huge win as the worst case now takes sightly over a minute. The average speedup for the largest instances is almost 6x (55/9.58) compared to the single thread case. The 16th longest run now takes 17.4 seconds. It means we solve as many problems in 17.5 seconds now than using 1800 seconds in the paper. It is more than a 100x speedup (1800/17.4)
QUBO solving times with 32 threads Using 32 threads is helping further on the hardest models, but the average running time increases. It is consistent with our experience that increasing the number of threads pays off more for hard models. In this case the 16th longest run takes 12.4 seconds. This time, compared to the paper, the speedup is over 140x (1800/12.5). Another comparison was made in the paper, with a D-Wave answer provided in 0.491 ms. CPLEX was able to find the optimal solution in all instances in 1800 seconds (although it could not complete the optimality proof for 15 of them), hence the authors conclude that CPLEX can be up to 3666 (1800/0.491) times slower than D-Wave machine. What would we get with the results we report above? Note that we're comparing apple to oranges as we compare the time to find one feasible solution in the case of D-Wave against the time to find that solution and prove it is the optimal solution. Note that according to the paper, the D-wave solution happens to be the optimal one whenever CPLEX was able to prove optimality. However, D-Wave machine does not guarantee that it will always be the case. We should therefore compare D-Wave timing to the time it takes for CPLEX to find the optimal solution (and not count the time needed to prove it is optimal). CPLEX with default settings does not report the time at which the optimal solution is found, but it is obviously before completing the optimality proof. Looking at log files we estimate that CPLEX finds the optimal solution on these QUBO roughly in half the time it takes to complete the optimality proof. Anyway, let's use the total running time as a proxy here. With 32 threads, CPLEX worst case is 36.1 second, hence it is 74x (36.1/0.491) slower than D-Wave machine for these problems. On average CPLEX is about 20x (10/0.491) slower than D-Wave machine. Both are a significant improvement over the 3666 slow down reported in the paper. Let's recap what we achieved. We can find and prove optimality for all instances in about 10 seconds on average. This is to be compared to the 0.491 second it takes for D-Wave machine. Note that D-Wave machine does not prove optimality nor does it guarantee it. Under such circumstances, being only 20x slower than a dedicated hardware looks pretty good to me. Note that McGeoch and Wang also report on the quality of the solution found in 0.491 seconds. We haven't repeated that experiment as we find that setting time limits of under a second isn't very reliable on the machines we use.
We should not overestimate the significance of the above as running times depend heavily on how the QUBOs are generated. Our colleague Sanjeeb Dash from IBM research has studied random QUBOs, as opposed to QUBOs generated from random Ising Models (IM). Using a MIP model similar to ours he foun Understanding which QUBOs are difficult and which aren't is a research topic beyond the scope of the experiments I planned to do. Finding better ways to solve the hardest QUBOs with CPLEX is also a research project beyond this scope. I'm certain that these research projects will get traction if the class of problems that D-Wave machine can solve efficiently (these QUBOs) proves to be of commercial interest. I am also told by Geordie Rose of D-Wave that their next machine will be faster and will handle larger problem size. It will be interesting to revisit the above experiments when it will be out. The race between software and hardware will certainly keep on for a while here!
Cathy McGeoch provided me another instance set as the one I got before wasn't the exact same one as the one they used in the paper. The new instance set differs by one edge, ie one quadratic term in the objective, hence we expected similar results than the ones reported a above. We nevertheless ran the new instances. We also did something more to speedup CPLEX, which is to tune it. The suggestion was made to me by Sanjeeb Dash. Looking at the logs of CPLEX, the two types of cuts that were actively used are zero half cuts and Gomory fractional cuts. We therefore emphasize their use by using these two parameter settings
set mip cuts gom 2 I prefer not to tune CPLEX much in general hence did not consider tuning among the top three ways to improve running times. However, in this case, these parameter settings prove to be quite effective at reducing time variability while average running times stay similar. Results with these settings are given below on the new instance set. Compared to the above results, we see dramatic improvements in the longest running time.
QUBO solving times with 1 thread
QUBO solving times with 8 threads
QUBO solving times with 32 threads Let's look at the 16th worst running time on each of these runs. Remember that it was 1800s in the paper, meaning that 15 instances could not solve to optimality in 1800 seconds. For 1 thread it is now 43 seconds, meaning we can solve as many instances now in 43 seconds as in 1800 seconds in the paper. It yields a speedup of more than 40x (1800/43) compared to the paper. For 8 threads, the 16th worst running time it is now 12.8 seconds, yielding a speedup of 140x (1800/12.8) compared to the paper. For 32 threads the 16th worst running time is now 12 seconds, yielding a speedup of 150x (1800/12) compared to the paper. Looking at the other comparison, our worst case is now 17.6 seconds, which is about 36x (17.6/0.491) slower than D-Wave machine. When we use the average running time, CPLEX is about 20x (9.7/0.491) slower, which is quite good as commented above.
This is probably the last update on this entry, but certainly not the last word on D-Wave vs CPLEX! The main reason for the update is to report experiments using Alex Selby instances. We have also improved the timing on McGeoch&Wang instances via some (automatic) tweaking of the MIP models to be solved. Here are the latest results, running on the same machine as above.
QUBO solving times with 1 thread
QUBO solving times with 8 threads
QUBO solving times with 32 threads The speedup is significant compared to what we first reported two weeks ago. When using 32 threads we now have an average running time of 7 seconds vs 10 and a worst case of 11 seconds vs 36 for the largest set, The speedup is even more dramatic when we use a single thread, with average of 17.5 vs 55 and a worst case of 92 vs 933 seconds. As before, let's look at the 16th running time. Remember that it was 1800s in the paper, meaning that 15 instances could not solve to optimality in 1800 seconds. For 1 thread it is now 35 seconds, meaning we can solve as many instances now in 35 seconds as in 1800 seconds in the paper. It yields a speedup of more than 50x (1800/35) compared to the paper. For 8 threads, the 16th worst running time it is now 9 seconds, yielding a speedup of 200x (1800/9) compared to the paper. For 32 threads the 16th worst running time is now 8.5 seconds, yielding a speedup of 210x (1800/8.5) compared to the paper. Let's compare to D-Wave machine as well. Our worst case is now 11.3 seconds, which is about 23x (11.3/0.491) slower than D-Wave machine. CPLEX is about 14x (7.12/0.491) slower on average, which is pretty good given we compare to a specialized hardware. The above still compares apples to oranges as we compare a complete solver (CPLEX), to a (very good) heuristic solver (D-Wave). Fortunately there exists another complete solver for these QUBOs developed by Alex Selby. Alex also has developed a very efficient heuristic solver, but we'll stick to his complete solver here. Note that Alex' solver is taylored for the specific QUBOs to be solved. Indeed, it relies on the graph structure for solving the problem. Alex Selby also generated 3 sets of instances, with 1,000 instances each. Timing results for these instances are provided below, but let me first say that the optimal values found by Alex Selby's solver are the same as the ones computed by CPLEX for all 3,000 instances. Given the two solvers were developed independently, and given they use completely unrelated algorithms, this is a pretty good cross validation. The first set of instances aims at replicating the instances of McGeoch&Wang. They can be found here. They were created as follows. A 439 vertices sub graph of the 502 vertices Chimera graph is generated randomly, then a Ising model is generate with random weights in {-1, 1} for this graph. Let' look at the results. I report the results of Alex's prover as reported here, I also report the results using CPLEX with 32 threads as described above.
QUBO solving times for 439 vertices graph There are two things worth discussing here. First CPLEX is competitive with a specialized solver, being only 1.6x slower on average. And CPLEX is even faster for the worst case. The second thing is that CPLEX running times are smaller than for McGeoch instances, with an average of 4.8 vs 7.12 seconds. Therefore, Selby's instances are easier than those in the paper. It is probably due to the way the 439 vertices sub graph is created. For McGeoch it is the same graph for ll instances, while for Selby it may be different for each instance.
Alex then generated harder instances, using the same way to generate 439 vertices graphs. In this case, the Ising models that are generated are without external field, ie we set h
minimize sum
where all the variables s Once transformed into QUBOs we get
sum Q where
Q
Q
R = sum
QUBO solving times for 439 vertices graph with no external field We see that CPLEX is significantly slower than Selby's solver for these instances. However, we also see that in the worst case CPLEX takes 1315 seconds. This is much better than what we reported two weeks ago. Indeed, with our original setting, there was an instance taking 27,677 seconds among the first 300 we solved. The very same instance now takes 850 seconds, ie a speedup of 33x. The last set of instances are instances using the complete 502 vertices Chimera graph. They are available here.
QUBO solving times for 502 vertices graph This time CPLEX is significantly faster than Selby's solver. These experiments show that the difficulty of QUBO instances depends dramatically on how they are generated. They also show that CPLEX and Selby's solver are in the same ballpark, no one dominates the other. This is quite good for CPLEX as it is a very generic solver whereas Selby's is a specialized one. Let's recap. Over two weeks we made significant progress on how to use CPLEX for solving a set of problems (QUBOs) that were selected for benchmarking a specialized hardware from the D-Wave company. We are now only 14x slower than the specialized hardware, which is pretty good. We also improved dramatically over the results reported by McGeoch&Wang. Indeed, we can solve as mainly instances in 8.5 seconds than McGeoch&Wang could solve in 1800 seconds which is a speedup of more than 200x. This progress is due to a combination of better modeling, multi threading, better CPLEX release, and aggressive use of cuts. Last, we also compared CPLEX to a specialized software solver for these problems designed by Alex Selby. Results show that CPLEX is competitive with it.
New Weighted Max2SAT problems are reported in the D-Wa |