Computing The Really Optimal Tour Across The USA On The Cloud With Python
JeanFrancoisPuget 2700028FGP Comments (4) Visits (11814)
When Randy Olso
First reason to be surprised, the road trip computed by Randy Olson was not optimal, i.e. there is a shorter tour. The first to publish the shorter tour was Bill Cook in his Tour
Indeed, the problem is known as the Traveling Salesperson Problem (TSP). The goal is to find the shortest tour that visits a set of input locations. There are 50 locations, one for each mainland state in the USA, in the specific instance Randy selected. The TSP has been extensively studied in the mathematical optimization community for decades. Giants building on the shoulders of other giants resulted in the Concorde Solver, which is the best known way to solve TSP. For this article, what matters is that Concorde solves Randy Olson TSP in a tiny fraction of a second. Should you want to know more, the post by Bill Cook will tell you everything to be known on how to solve the TSP and its history.
Second reason to be surprised is the buzz Randy Olson's article created. How come such simple treatment of the TSP got so much coverage when solvers based on decades of work are available? The optimization community was quick to acknowledge that Randy's code was written very quickly, over a week end, and that it came with nice graphics, and reusable code. This led Nathan Brixius to explain in Hole
Well, I think that for the problems Randy selected, optimization tools are ready for mass consumption. I will use Python to prove it. Before that, let me say that Rob Pratt has shown in
Granted, the line count is a bit misleading because Rob reused a data file prepared by Bill Cook based on Randy Olson's code. His code does not display the results either. But this is an interesting step forward. The disputable point regarding consumption is that the software Rob used may not be available to all.
Let us see if Randy could have done a better job by leveraging Concorde, Python, and Google Maps. Moreover, in order to make our proof of concept more convincing, we do not want to install any additional software than Python on our machine, especially no commercial software. Can we do better than Randy, i.e. can we compute the shortest tour visiting each one of the 50 landmarks he has selected?
The answer is yes. I will be using these tools for my proof of concept:
Let's start with the crux of the problem, i.e. solve the TSP problem at hand. We will use the data file prepared by Bill Cook via Randy Olson's code which is available at: http
We will use the NEOS API for Concorde. It is not a REST API as is now customary for web services, because NEOS was developed before REST APIs were invented! NEOS API uses XML-RPC. All we need to provide is a XML file that contains the data of the problem plus few parameters for Concorde. In order to generate that xml file I could have read carefully the documentation. I preferred to use another nice way NEOS offers: it is possible to generate the xml file using the dry run option on the submission page. For that, I created a file named fake.tsp with the two character string '%s' as content. This content is only there for helping processing with Python later. I then entered the path to that file on the submission page, and I checked the dry run option before submitting. This does not run the job. It produces an xml file instead. I saved that xml file in a string called base_xml in my notebook.
All I had to do to be able to run a this TSP on NEOS was to replace the '%s' string by the actual content Bill cook prepared. I used the very powerful % Python operator that inserts in the left string the content of the right string:
with open(file_name, 'rb') as file: tsp_xml = base_xml % file.read()
Before calling that line I had to create a client for NEOS. After calling that line I had to wait until NEOS had processed the job before I could read and print the result. Note that NEOS processes jobs in the order they are submitted, therefore there may be some waiting time before our job gets processed.
The Python code for submitting the problem and printing the result is quite straightforward:
import time import xmlrpclibNEOS
Let's comment on the get_tour function in that code. The result we get in the msg string is a log of Concorde. This log must be parsed to map city numbers to the actual landmarks if we want to print the result in a meaningful way.
Let's have a closer look. The log contains some information about the problem and how it is solved, then it contains these lines
Optimal Solution: 22015038.00 Number of bbnodes: 1 Total Running Time: 0.04 (seconds)
This is pretty efficient isn't it? Granted, we should take into account network latency, and the fact that our job may be queued for a while on NEOS server. Even if this takes few seconds, it is much faster than Randy's code, for a better result.
Wait a minute, where is the result? It is in the remainder of the log, here is the beginning of it:
*** Cities are numbered 0..n-1 and each line shows a leg from one city to the next
This output must be parsed to map city numbers to the actual landmarks if we want to print the result in a meaningful way.
Our code computes an html display of the tour, using Google map api. It saves the result in a file called tsp.html that we can display in the notebook with the %%HTML magic function:
%%HTML <iframe height="500px" src="tsp.html" widt
This part is inspired by Randy's code but is much improved as it no longer requires manual editing thanks to the use of the jinja2 package. Indeed, with this package we can compute html content that is parameterized with some input. Here we pass the tour we computed as input, and the html is updated with the right tour information.
The tour we compute is this one.
What surprised me is that the tour I compute with the extended code has a length of 21,922.870 kilometers. It is shorter than the one computed by Bill Cook which has 22,015 kilometers ! Does this makes me eligible to the bounty he offered? Quoting him:
I offer a $10,000 reward to the first person who can find a better solution using our point-to-point distances.
Alas for me, the discrepancy does not come from a better tour. It is simply that when I called Google Maps I got different values than when Randy and Bill called it. Don't ask me why this is the case as I have no clue. When using the same distance matrix as Bill, my code finds this tour:
This tour is still different from the one Bill found! Good news is that it has the same length of 22,015 kilometers. In fact, the landmarks are visited in the same order, it is just that, for some reason, Google Maps selects a different route from points to points when I display it.
Let me conclude with some praise for NEOS. People may not realize how easy it is to use it. What is quite astounding is that NEOS was created way before SaaS became mainstream. These 3 papers provide background on NEOS.
Czyzyk, J., Mesnier, M. P., and Moré, J. J. 1998. The NEOS Server. IEEE Journal on Computational Science and Engineering 5(3), 68-75. This paper discusses the design and implementation of the NEOS Server.
Dolan, E. 2001. The NEOS Server 4.0 Administrative Guide. Technical Memorandum ANL/MCS-TM-250, Mathematics and Computer Science Division, Argonne National Laboratory. This technical report, which discusses the implementation of the server and its use in detail, is available for download in PDF format.
Gropp, W. and Moré, J. J. 1997. Optimization Environments and the NEOS Server. Approximation Theory and Optimization, M. D. Buhmann and A. Iserles, eds., Cambridge University Press, pages 167-182. This paper discusses the NEOS Server as a problem-solving environment that simplifies the formulation of optimization problems and the access to computational resources.
Update on April 9, 2015. Geoffrey De Smet also wrote an inte
I don't know if Randy Olson expected to generate so much interest from the optimization community, but he definitely led to several very interesting blog entries!
Update on March 12, 2016. Cleaned up the code a bit.