Isomap, Dijkstra and those pesky Neighbours
I’m in the process of writing an implementation of Isomap for use in ImagePRV. When I say in the process of what I actually mean is that I’ve written it, but, it isn’t the most efficient code ever. In fact to reduce a 4 dimensional 2000 point data set would take about a day (which is better than the first stab, which would have taken ~30 days). So where’s the lag? Well Isomap is split into three stages:
1. Find the k-Nearest Neighbours of every data point in the data set.
2. Compute the shortest distance between all datapoints using Dijkstra’s algorithm (producing a distance graph Dg)
3. Perform classical MDS on Dg.
The final stage doesn’t take too long, so it’s the first 2 that are causing all the trouble. kNN is a slow process, I optimised it by putting in a quick sort but it still takes a while. I might look into other ways of performing a kNN search (rather than computing the distances between the datapoint we’re looking at and all other points, sorting this list and taking the first k entries). For some reason it’s Dijkstra’s algorithm that’s taking the longest time, and I’m afraid I think it’s a problem with Java. This same problem put into MATLAB calls a C++ class that performs Dijkstra in no time, so why is the Java version so slow? I only have a limited amount of time to find out as I need to get ImagePRV up and running so that we can run some tests and write up the findings for this year’s MIUA conference. A well written version of Dijkstra should be O(nlogn), at the moment the algorithm seems to be in the region of O(n^2).
Remember all this is for a trivial 4D 2000 point data set. What’s going to happen when I try it on an 80,000 point 56D data set? Find out in 2 years time.
About this entry
You’re currently reading “Isomap, Dijkstra and those pesky Neighbours,” an entry on The Moon is Down
- Published:
- 02.29.08 / 9am
- Category:
- Programming, Computers, Uni
No comments
Jump to comment form | comments rss [?] | trackback uri [?]