No, The TSP Isn't NP Complete
Two recent blog posts discussing the Traveling Saleman Problem (TSP) led me to write this post. The two blog posts are What The mistake is to say that the TSP is NP complete. This is wrong as far as we know. Truth is that TSP is NP hard. Granted, this mistake is quite minor and technical (NP complete vs NP hard), and fixing it will not change the world. Nor does it mean that the TSP isn't a very exemplar problem, worth studying. But it is a mistake anyway. Let us define what we are discussing here. The TSP is a problem defined by a set of cities and the distances between each city pair. The problem is to find a circuit that goes through each city once and that ends where it starts. This in itself isn't difficult. What makes the problem interesting is to find the shortest circuit among all those that are possible. Solving this problem is quite simple. One merely need to compute the length of all possible circuits, then keep the shortest one. Issue is that the number of such circuits grows very quickly with the number of cities. If there are n cities then this number is factorial of n1 = (n1)(n2)...3.2 Indeed, we can select arbitrarily one city to start from (the starting point doesn't matter much given one must end at the same place). Then we have n1 different choices for the second city to be visited, n2 choices for the third city, and so on. For 16 cities there are more than a trillion circuits. For 100 cities there are approximately 10^{156} circuits. For 10,000 cities this number is about10^{35,657} There is no way one can compute the shortest circuit for 10,000 cities this way, even using the fastest available computing grid for a century. Yet, scientific advances made by OR researchers led to quite effective methods able to solve TSPs with large number of cities. For instance, the figure below shows the optimal solution for a tour of all 13,509 cities and towns in the US that have more than 500 residents. Figure credit: David Applegate, Robert Bixby, Vasek Chvatal and William Cook. Let us now define NP completeness. A problem is NP if one can easily (in polynomial time) check that a proposed solution is indeed a solution. A problem is NP hard if is is as difficult as any NP problem. A problem is NP complete if it is both NP and NP hard. Is the TSP NP complete or not? We must first check that the TSP is NP. For this we must be able to check that a proposed tour is a solution to the TSP. For this we merely have to check that each city is visited once. This can be done in polynomial time, hence the TSP is NP. Second, we must check that the TSP is NP hard. I won't explain it here, but this has been proved true. Being both NP and NP hard, the TSP is NP complete! Wait a minute, didn't I just make the same mistake as others? Here is the trick. In order to check that a proposed tour is a solution of the TSP we need to check two things, namely
We didn't check the second condition! The second condition is what makes the problem difficult to solve. As of today, no one has found a way to check condition 2 in polynomial time. It means that the TSP isn't in NP, as far as we know. Therefore, TSP isn't NP complete as far as we know. We can only say that TSP is NP hard.
Readers interested in learning more can read the following, or Rod Hilton's Trav A more formal point of view In all fairness, I don't think that computer science experts have a confused view. When they write that TSP is NP complete, they mean that the following decision problem (yes/no question) is NP complete: TSPDECISION : Given a number L, a set of cities, and distance between all city pairs, is there a tour visiting each city exactly once of length less than L? This problem is indeed NP complete, as it is easy (polynomial time) to check that a given tour leads to a yes answer to TSPDECISION.
I have elaborated on this mapping between NP complete decision problems and optimization problems in my NP O I am fine with the shortcut about TSP being NP complete if people understand that it is a statement about TSPDECISION. What I find wrong is when people think that the following decision problem is NP complete TSPMINDECISION : Given a tour T, a set of cities, and distance between all city pairs, is T visiting each city exactly once and is T of minimal length? Truth is that no one has found a polynomial time way to check that a tour is of minimal length. And if someone does it, then it would prove P = NP, which would be a tremendous achievement. Solving the TSP, i.e. finding a minimal length tour, is at least as hard as checking that a tour is of minimal length. Therefore it cannot be done in polynomial time as far as we know. Let me thank Matthew Saltzman who helped me nail down this argument (any error or misconception are mine). Update on January 2 Rewritten the end to address the most common criticism of this post. Update on January 3 Added link to Paul Rubin's post. Update on January 9: Lance Fortnow reacted in writing his own followup blog post..

As long as we are trying to nail down precise ideas here, I think you slightly misunderstand the distinction between decision and optimization problems. A decision problem poses a yesno question about a property of the instance, not about a solution. The form of problem is: Given an instance of the structure under investigation and a property of the structure, is the answer "yes" (the instance possesses the property or "no" (it doesn't)? If the answer is yes, a solution that exhibits the property is a "certificate". If the fact that the solution illustrates the property can be verified in polynomial time, the problem is in NP. So for the TSP, the decision formulation is: Given a graph G=(V,E) with weights c_j for j in E and a number L, is there a tour with total weight L or less. A set of edges is a certificate for a yes answer if it forms a tour and it has total edge weight L or less. Verifying that a set of edges forms a tour and computing its total weight can be done in polynomial time, so TSPDECISION is in NP.
Mathew,
You are right, NP applies to decision problems. Therefore, as you point out, when one states that the TSP is NP complete, one implictly maps the TSP to a decision problem.
Then,TSPMINDECISION is NP if and only if we can answer its question in polynomial time with a non deterministic Turing machine. Given no one has yet found a way to answer that question in polynomial time, this decision problem is not known to be NP.
We can now explain why TSP is not NP. It is because the equivalent decision problem is not in NP.
JeanFrancois, do we agree that an instance of a "decision problem" consists of some set of data and a yes/no question about that data? If so, then when you pose TSPMINIMUM as a decision problem, you need to say what data is provided and what the question is. As I read your post and your comment, the data that you provide for an instance of TSPMINDECISION is the graph, edge weights, and a tour, and the decision question is: Is the given tour of minimum length among tours?
So to your title point, TSPMINIMIZE in NPhard in the sense that if I had a polynomial algorithm for producing the minimum tour length, I'd have a polynomial algorithm for answering TSPDECISION (and CoTSPDECISION), but it's not (known to be) in NP, if you like, because it's not a decision problem with a yes/no answer. If you prefer, it's because there's not (known to be) a polynomial process that a nondeterministic Turing machine could carry out to produce the verified length of the shortest tour.
Your point was crystal clear.
Lance Fortnow pointed me to the Euclidian TSP that was "proven" to be NP complete. There are two seminal papers for this
TRAVELING SALESMAN PROBLEM: Given a set S of integer coordinate points in the plane and an integer L, does there exist a circuit passing through all the points of S which, with edge lengths l measured by L 1 (L2) has total length at most L?
In general, any optimization problem can be reformulated as a decision problem by moving the objective into the constraint set. (Matthew describes this nicely for TSP). If the derived decision problem is NPcomplete, then I think it is reasonable to refer to the original optimization problem as NPcomplete.
Pete,
One can read the statement "Optimization problem X is NPcomplete" either as shorthand for the (correct) statement "The decision form of problem X is NPcomplete" or as a misstatement of "Optimization problem X is NPhard." We who know the language generally give others who know the language the benefit of the doubt and assume they mean the former. I tend to lapse into that form in class after a while because the constant repetition of the phrase "the decision form of" gets tiresome.
JFP
Matthew, Pete, thank you for your comments. i've rewritten the end of my post to take some of them into account.