Joris Meys
2010-May-25 17:00 UTC
[R] Calculation time of isoMDS and the optimal number of dimensions
Dear all,

I'm running a set of nonparametric MDS analyses, using a wrapper for isoMDS, on an 800x800 distance matrix. I noticed that setting the parameter k to larger numbers seriously increases the calculation time. In fact, the run with k=10 has already taken longer than the runs for k=2 and k=5 together. It has now been calculating for 6 hours, and counting...

There is quite a difference between the results using k=2 or k=5 when looking at the first 2 dimensions (logically...). I suspect the same will hold for k=10. Yet I'm starting to ask myself whether this makes sense if I'm only using the first 2 dimensions. And I can't think of a formal method to check, in an nMDS framework, how many dimensions are enough. Does anybody have an idea?

I use metaMDS from the vegan package, although it's not really meant to be used on these data.

Cheers
Joris

--
Joris Meys
Statistical Consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
Coupure Links 653
B-9000 Gent
tel : +32 9 264 59 87
Joris.Meys@Ugent.be
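One common, informal way to approach the "how many dimensions are enough" question is to fit the same dissimilarities at several values of k and look for the point where the stress stops dropping sharply. The following is only a minimal sketch under assumptions: the data are simulated stand-ins (not the 800x800 matrix from the post), and it calls isoMDS() from MASS directly, once per k, rather than through any wrapper.

```r
library(MASS)  # isoMDS() is part of the recommended MASS package

set.seed(1)
n <- 60
# Hypothetical stand-in data: 8 dimensions with decreasing spread,
# so that a few dimensions carry most of the structure
x <- matrix(rnorm(n * 8), ncol = 8) *
  rep(c(4, 3, 2, 1, 1, 0.5, 0.5, 0.5), each = n)
d <- dist(x)

ks <- 2:6
# One nMDS fit per candidate k, each from the default classical-scaling start
stress <- sapply(ks, function(k) isoMDS(d, k = k, trace = FALSE)$stress)

# isoMDS reports stress in percent
print(data.frame(k = ks, stress = round(stress, 2)))
```

The table is read like a scree plot: the value of k after which extra dimensions buy little reduction in stress is a candidate. This is a heuristic, not a formal test.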
Michael Denslow
2010-May-26 03:32 UTC
[R] Calculation time of isoMDS and the optimal number of dimensions
Hi Joris,

On Tue, May 25, 2010 at 1:00 PM, Joris Meys <jorismeys at gmail.com> wrote:
> Dear all,
>
> I'm running a set of nonparametric MDS analyses, using a wrapper for isoMDS,
> on a 800x800 distance matrix. I noticed that setting the parameter k to
> larger numbers seriously increases the calculation time. Actually, with k=10
> it calculates already longer than for k=2 and k=5 together. It's now
> calculating for 6 hours, and counting...

That seems like a long time. I have a 100x100 matrix that takes about 40 secs to run with k=10. What is the wrapper function doing?

> There is quite a difference between the results using k=2 or k=5 when
> looking at the first 2 dimensions (logically...). I suspect the same when
> k=10. Yet, I start asking myself whether this makes sense if I'm only using
> the first 2 dimensions. And I can't think of a formal method to check in a
> nMDS framework how much dimensions are enough. Anybody an idea?

You might want to look at the nmds.min() function in the ecodist package, which seeks to minimize stress. Out of curiosity, do you often use 10-dimensional solutions in your field of study?

Hope this helps,
Michael

> I use metaMDS from the vegan package, although it's not really meant to be
> used on these data.
>
> Cheers
> Joris

--
Michael Denslow
I.W. Carpenter Jr. Herbarium [BOON]
Department of Biology
Appalachian State University
Boone, North Carolina U.S.A.
-- AND --
Communications Manager
Southeast Regional Network of Expertise and Collections
sernec.org
36.214177, -81.681480 +/- 3103 meters
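To separate the cost of a single fit from whatever a wrapper adds on top, one could time bare isoMDS() calls at several values of k. This is a rough sketch only: the 100x100 matrix is simulated (an assumption, chosen to match the size mentioned above), and the timings depend entirely on the machine, so no expected numbers are given.

```r
library(MASS)  # isoMDS()

set.seed(42)
n <- 100
# Hypothetical stand-in data with more real dimensions than the largest k tried
x <- matrix(rnorm(n * 12), ncol = 12)
d <- dist(x)

ks <- c(2, 5, 10)
# Elapsed wall-clock seconds for one isoMDS fit at each k
secs <- sapply(ks, function(k) {
  system.time(isoMDS(d, k = k, trace = FALSE))[["elapsed"]]
})

print(data.frame(k = ks, seconds = secs))
```

A wrapper that repeats the fit from many starting configurations multiplies these single-fit times by roughly the number of starts, which is worth keeping in mind when comparing run times.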
Gavin Simpson
2010-May-26 07:34 UTC
[R] Calculation time of isoMDS and the optimal number of dimensions
On Tue, 2010-05-25 at 19:00 +0200, Joris Meys wrote:
> Dear all,
>
> I'm running a set of nonparametric MDS analyses, using a wrapper for isoMDS,
> on a 800x800 distance matrix. I noticed that setting the parameter k to
> larger numbers seriously increases the calculation time. Actually, with k=10
> it calculates already longer than for k=2 and k=5 together. It's now
> calculating for 6 hours, and counting...

metaMDS will try 'trymax' random starts of isoMDS in an attempt to see if convergent solutions are reached. The 10d computation is clearly much more complex than fitting rank distances in 2d or even 5d.

> There is quite a difference between the results using k=2 or k=5 when
> looking at the first 2 dimensions (logically...). I suspect the same when
> k=10. Yet, I start asking myself whether this makes sense if I'm only using
> the first 2 dimensions. And I can't think of a formal method to check in a
> nMDS framework how much dimensions are enough. Anybody an idea?

In nMDS it is the configuration that counts, not the individual axes, which are themselves arbitrary directions. Having only one of an x or y geographical coordinate isn't much use if you want to find your way to a location; you need both.

It makes no sense whatsoever to compute a 10d nMDS solution if you only want a 2d solution for later computations; there is no guarantee that the first two "axes" of a 10d nMDS solution will be as good as those from the 2d solution. If you only want a 2d solution, concentrate on finding the best 2d solution you can using metaMDS.

> I use metaMDS from the vegan package, although it's not really meant to be
> used on these data.

Why do you say that? As long as you turn off a couple of the "ecological" helper bits in metaMDS, all it is doing is handling random starts of the isoMDS algorithm.

> Cheers
> Joris

HTH

G

--
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
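The two points in the reply above (random starts, and the first two axes of a higher-dimensional fit not being a 2d solution) can be illustrated with a small sketch. Everything here is an assumption for illustration: the data are simulated, the random-start loop is a hand-rolled stand-in for the idea behind 'trymax' rather than metaMDS's actual procedure, and stress_pct() is a hypothetical helper built on MASS's Shepard().

```r
library(MASS)  # isoMDS(), Shepard()

set.seed(123)
n <- 60
# Hypothetical stand-in data with more than two real dimensions
x <- matrix(rnorm(n * 8), ncol = 8)
d <- dist(x)

# 2d fit from the default classical-scaling start
start_fit <- isoMDS(d, k = 2, trace = FALSE)

# Try several random starting configurations and keep the lowest-stress fit
best2 <- start_fit
for (i in 1:10) {
  init <- matrix(runif(n * 2, -1, 1), ncol = 2)
  fit <- isoMDS(d, y = init, k = 2, trace = FALSE)
  if (fit$stress < best2$stress) best2 <- fit
}

# Hypothetical helper: stress (in percent) of any configuration against d,
# using the monotone-regression fit returned by Shepard()
stress_pct <- function(d, conf) {
  s <- Shepard(d, conf)
  100 * sqrt(sum((s$y - s$yf)^2) / sum(s$y^2))
}

# First two columns of a 5d solution, treated as if they were a 2d solution
fit5 <- isoMDS(d, k = 5, trace = FALSE)

print(c(best_2d         = stress_pct(d, best2$points),
        first_two_of_5d = stress_pct(d, fit5$points[, 1:2])))
```

Comparing the two printed values shows how well each 2d picture preserves the rank order of the dissimilarities; the truncated 5d fit was never optimised for that, which is the point made in the reply.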