I have been very involved lately in Data Science competitions at Kaggle. It gives me a chance to put my Data Science training to use and use it as a proving grounds for new ideas. I am currently ranked within the top 2% of all participants on Kaggle.

The current competition I am working on is the Traveling Santa Problem. At its heart, this is a Traveling Salesman Problem. There is a unique twist however. Instead of just finding an optimal Hamiltonian Cycle, instead you must find two disjoint Hamiltonian Paths. Your score is the greater of the two. You can read the complete information about the competition over at Kaggle.

My weapon of choice is Java. Although there are many libraries which deal with undirected graphs in Java, and some that specifically deal with TSP’s, I decided to write all of the Java code myself. Over time, this became a toolkit of sorts, allowing me to import paths, export paths, generate tours, etc. I wrote my own Nearest Neighbor method, 2-opt method, and other local optimization algorithms.

I use R to visualize my data. The libraries in R are much more straightforward for doing such things. The visualizations proved quite helpful. In an earlier version of my 2-opt algorithm, I was immediately able to see inefficiencies which allowed me to re-write the algorithm, or more correctly, fix bugs I had in my version of the algorithm, with much improved results.

I also found simple methods such as a sketch pad and pencil invaluable. I know that sounds a bit archaic, but sometimes the easiest visualizations are some of the most effective.

The competition still has a few weeks left, and there are still many opportunities to optimize my code. I am also considering using methods such as a Genetic Algorithm, Simulated Annealing or Delaunay Triangulation to help me find an improved solution.. The issue with both of these, is that the competition only has so much time left, and I have gone far enough down the NN/2-opt path that I have a considerable investment in my code base. At the time of this writing I am in the #37 spot, but my initial trials of some of my improved algorithms show that I should be able to push a few more positions in the next day or two. After that? Well, that is the beauty of these competitions, they force you to push the limits of your skills to find better methods to optimize the solution, I definitely have not run out of ideas yet.

The image in this post is an example of an optimized tour. It is 150,000 vertices connected by a distance of 5979822.