## NP Or Not NP? That Is The Question

A recent blog entry on TSP and NP completeness made me write the long overdue entry I wanted to write about the complexity of optimization problems. It comes into play when customers ask this simple question: my problem takes too long to solve, what can I do? I'm pretty sure most optimization professionals have heard this question at least once. I already blogged about it in my It Is Too Slow entry without actually answering it (clever, isn't it?). Here are various ways to answer it depending on your own agenda.
- As an employee of one of the largest hardware manufacturers, IBM, I should say: "Simple. Come and buy the latest version of our flagship hardware product XX; it will help you solve your problem."
- A consultant would probably prefer to say that there may be better ways to express the problem, and that she can provide a better mathematical model for it.
- A computer science person would start explaining why it is hopeless given that the problem is NP complete.
All these answers have some merit, and are partly true. The purpose of this post is to revisit the last one. Note that the middle response is worth looking at as well. Indeed, it is often very effective to think hard about how to map a given business problem to a mathematical optimization problem. I won't elaborate on this here, and point to the many existing OR resources for further reading. I'll come back briefly to the first answer later on.

Let's go back to the customer question. It often comes from cases where some business problem is solved fast enough, and therefore the customer wants to solve a larger problem. The real question is therefore about guessing the time it will take to solve a problem larger than the ones we currently solve. Said differently, it is about how running times grow with the size of the problem to be solved. For instance, when going from one week planning to four week planning, we increase the problem size four times. How will running time grow? Will it be by a factor of four as well? Or will it be by a factor of one billion?

To answer this we need to be a bit more precise about what an optimization problem is. An optimization problem is defined by:
- A set of decision variables *V*. Variables can take floating point values, or integer values.
- A set of constraints *C* on these variables.
- An objective function *f*. Without loss of generality we assume that we are maximizing the objective function.
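To make these definitions concrete, here is a toy instance sketched in plain Python (a hypothetical illustration, not any particular solver's API): two integer decision variables, one constraint, and a linear objective to maximize.

```python
# Toy optimization problem: maximize f(x, y) = 3x + 2y
# subject to the constraint x + y <= 4, with x and y integers in {0, ..., 4}.

def is_feasible(x, y):
    """Check the single constraint C: x + y <= 4."""
    return x + y <= 4

def objective(x, y):
    """The objective function f we are maximizing."""
    return 3 * x + 2 * y

# Enumerate all assignments and keep the best feasible one.
best = max(
    ((x, y) for x in range(5) for y in range(5) if is_feasible(x, y)),
    key=lambda s: objective(*s),
)
print(best, objective(*best))  # (4, 0) 12
```

Here (4, 0) is an optimal solution with value 12; any pair summing to at most 4, such as (2, 1), is merely a feasible solution.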
A feasible solution to an optimization problem as defined above is a set of values, one value for each variable, such that all constraints are met. The value of a feasible solution *s* is the value of the objective function *f(s)* for that solution. Solving an optimization problem amounts to finding a feasible solution *s* such that there is no other feasible solution with a higher value than *f(s)*. In that case we say that *s* is an optimal solution to the problem. Note that there can be more than one optimal solution in some cases; see for instance this entry from Paul Rubin.

Depending on the nature of the variables (floating point vs integer), the nature of the constraints, and the nature of the objective function, solving the problem can be relatively easy or extremely hard. Mixed integer programming (MIP) is a popular class of mathematical optimization problems. In these problems some variables must take integer values, all constraints are linear (weighted sums of variables less than a given constant), and the objective function is linear (a weighted sum of variables). MIP problems can be hard to solve, but it is rather easy to check whether a proposed assignment of values to variables is a feasible solution. More precisely, MIP problems have three interesting properties. Note that the size of a MIP problem is given by the number of variables, the number of constraints, and the number of weights in the constraints and the objective function.
- The size of a feasible solution is polynomial in the size of the problem. Indeed, for MIP a feasible solution is given by a value for each of the decision variables, which is linear in the size of the problem.
- The fact that a proposed assignment of values is a feasible solution can be checked in polynomial time. Indeed, for MIP it is sufficient to replace each variable by its value and to check the validity of each constraint of the problem.
- The cost of a feasible solution (what we
minimize or maximize) can be computed in polynomial time. Indeed, for MIP it amounts to compute the value of the weighted sum of variable values.
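The second and third properties can be illustrated with a short sketch, assuming a generic MIP given by a constraint matrix, right-hand sides, and objective weights (this data layout is made up for illustration): checking a proposed assignment takes a single pass over the problem data, hence polynomial (in fact linear) time.

```python
# Checking a proposed assignment against a MIP given in the generic form
#   sum_j a[i][j] * x[j] <= b[i]   for each constraint i,
# with objective sum_j c[j] * x[j].

def check_solution(a, b, c, x, integer_vars):
    """Return (feasible, value) for assignment x.
    a: constraint matrix, b: right-hand sides, c: objective weights,
    integer_vars: indices of variables that must take integer values."""
    # Integrality check: one pass over the integer variables.
    if any(x[j] != int(x[j]) for j in integer_vars):
        return False, None
    # Constraint check: one pass over the constraint data.
    for row, rhs in zip(a, b):
        if sum(w * v for w, v in zip(row, x)) > rhs:
            return False, None
    # Objective value: one pass over the objective weights.
    return True, sum(w * v for w, v in zip(c, x))

# A tiny instance: maximize 3x0 + 2x1 with x0 + x1 <= 4, both integer.
feasible, value = check_solution([[1, 1]], [4], [3, 2], [3, 1], [0, 1])
print(feasible, value)  # True 11
```

Note that this only *checks* a solution; *finding* a good one is the hard part.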
Problems having these properties are Nondeterministic Polynomial Optimization (NPO) problems. They are not NP-complete problems, but they are related to them. Indeed, given an optimization problem and a value *k*, we can define a decision problem as follows:

*P(k):* Is there a feasible solution *s* such that *f(s) = k*?

If the original optimization problem is NPO, then this decision problem is NP: it simply means that one can check in polynomial time that a proposed assignment of values to variables is a feasible solution of value *k*. This follows directly from the properties of NPO problems. It so happens that the decision problem for MIP is as hard as any other NP problem. Such a problem is called NP complete.
Therefore, saying that MIP is NP complete isn't exactly true, but it is close to true: the associated decision problem is NP complete.

Note that solving an NPO problem amounts to solving at least one decision problem, namely P(f*) where f* is the value of the optimal solution. Therefore, solving an NPO problem is at least as hard as solving the corresponding decision problem. Solving a MIP problem is then as hard as the decision problem for MIP, which is itself NP complete. It means that MIP is at least as hard as an NP complete problem. This property is called NP hardness. It does not mean that MIP is NP complete, as explained above.

As a summary: MIP is NPO, MIP is NP hard, and the decision problem for MIP is NP complete.

What does it mean in practice? Well, most computer science people think that NP complete problems are hard to solve in the worst case. More precisely, most people think that there is no polynomial time algorithm for NP complete problems. It also means that most people think there is no polynomial time algorithm for NPO problems whose decision problems are NP complete. If all these people are right, then in the worst case a MIP problem can take a long, long time to solve. Basically, it means that in the worst case one has to look at all possible assignments of values to decision variables in order to select the best feasible one. The number of such assignments grows (at least) exponentially with the number of variables. It means that adding a single variable can double the running time. Doubling the size of a problem may result in incredibly longer running times. For these problems, buying a faster computer isn't the first thing to try!

Fortunately for us and our customers, all the discussion above is about the worst case. In practice, large MIPs can be solved in reasonable time, and only a very tiny part of all the possible assignments of values to variables needs to be considered. Moreover, customers aren't always interested in finding the optimal solution and a proof that it is optimal.
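The exponential blow-up is easiest to see with binary (0-1) variables: n variables give 2^n possible assignments, so each additional variable doubles the worst-case search space. A quick back-of-the-envelope sketch:

```python
# Worst-case search space for n binary (0-1) decision variables:
# 2**n assignments, so one extra variable doubles the count.

def assignments(n):
    return 2 ** n

for n in (10, 20, 40):
    print(n, assignments(n))
# At one billion assignments checked per second, n = 40 already takes
# about 18 minutes, and n = 100 would take far longer than the age of
# the universe.
```

This is exactly why throwing faster hardware at a MIP does not, by itself, change the picture in the worst case.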
What they want are good solutions. One can stop a MIP algorithm before completion, which can yield good feasible solutions in a reasonable time. More on this can be found in What Is The Gap?

Another interesting class of problems is Linear Programming (LP). These are similar to MIP, with the same kind of constraints and objective function; the only difference is that all variables can take floating point values. These problems are also NPO problems, but the associated decision problems aren't NP complete. They can be solved in polynomial time. It means that running time will grow in a reasonable way; for instance, it may be four times slower to solve a problem twice as large. For such problems, buying a faster computer may make sense!

There is much more to say about the complexity of optimization problems. For instance, we did not discuss approximation algorithms. We refer interested readers to this good introductory presentation. Other interesting reads are Paul Rubin on P vs NP relevance to optimization practice, and Shiva Ram's Ekthetikophobia: The Fear of the Exponential.

Edited on Thursday 15th November 2012. I've added the piece about NP hard that was missing in the first version.

Edited on Sunday 18th November 2012. I've added a link to Paul Rubin's post on the practical relevance of NP vs P.

1. msoos commented: Nice post! I think theoretical departments at universities have been pushing this "NP=impossible" into the heads of students a bit too much, and so we get well-educated people thinking "this is known to be impossible" and laypeople thinking "these guys went mad thinking it's impossible". It would be good to not only teach the theory, but give large real-world examples where it is possible (e.g. scheduling), and small real-world examples where it's essentially impossible (e.g. crypto).

2. JeanFrancoisPuget commented: Thank you. I agree.

3. PaulRubin commented: Thanks for the link, and for a well-written post on a sometimes confusing (and, to me, often annoying) subject.

4. MGwynne commented: I find it interesting how people are willing to try to figure out whether their program is correct (and hence likely terminates), or to search for proofs of various non-trivial propositions, yet in general these problems are undecidable. In comparison, problems in NP are easy: we will at least *eventually* find a solution!

5. JeanFrancoisPuget commented: @MGwynne, good point. There is a whole zoo of problem classes more complex than NP indeed.

6. PaulRubin commented: Jean Francois: I think your employer has a spam filter specifically targeted to my ID. I started my previous comment with a compliment about your post (very nice job explaining NP-ness), but apparently that wasn't enough. The rest I think I'll put in a post on my own blog; I'm overdue for one, and I have an opinion about NP issues that might be too long-winded for a comment anyway.

7. JeanFrancoisPuget commented: Paul,

8. JeanFrancoisPuget commented: To be crystal clear on complexity, we should take into account the number of bits used to represent the numbers involved in the statement of a MIP or LP. The size of a problem is the actual number of bits used to represent it. For instance, we could use memory consumption as a size measure. However, given that we work with fixed size numbers (machine integers and machine floating points, usually 64 bits per number), the size is proportional to the number of numbers. That is why I used the latter as problem size in this post.
