Isomap, Dijkstra and those pesky Neighbours

I’m in the process of writing an implementation of Isomap for use in ImagePRV.  When I say in the process of what I actually mean is that I’ve written it, but, it isn’t the most efficient code ever. In fact to reduce a 4 dimensional 2000 point data set would take about a day (which is better than the first stab, which would have taken ~30 days).  So where’s the lag?  Well Isomap is split into three stages:

1. Find the k-Nearest Neighbours of every data point in the data set.
2. Compute the shortest distance between all datapoints using Dijkstra’s algorithm (producing a distance graph Dg)
3. Perform classical MDS on Dg.

The final stage doesn’t take too long, so it’s the first 2 that are causing all the trouble.  kNN is a slow process, I optimised it by putting in a quick sort but it still takes a while.  I might look into other ways of performing a kNN search (rather than computing the distances between the datapoint we’re looking at and all other points, sorting this list and taking the first k entries).  For some reason it’s Dijkstra’s algorithm that’s taking the longest time, and I’m afraid I think it’s a problem with Java.  This same problem put into MATLAB calls a C++ class that performs Dijkstra in no time, so why is the Java version so slow?  I only have a limited amount of time to find out as I need to get ImagePRV up and running so that we can run some tests and write up the findings for this year’s MIUA conference.  A well written version of Dijkstra should be O(nlogn), at the moment the algorithm seems to be in the region of O(n^2).

Remember all this is for a trivial 4D 2000 point data set.  What’s going to happen when I try it on an 80,000 point 56D data set?  Find out in 2 years time.


About this entry