Raeanne Miller
2013-May-27 13:17 UTC
[R] metaMDS with large dataset produces 'insufficient data' warning
Greetings everyone,

I am running MDS on a very large dataset (12 x 25071 - 12 model runs with 25071 output values each), and also on a very much reduced version of the dataset (1000 randomly selected values out of the 25071). I would like to look at similarities/dissimilarities between the 12 model runs. When I use metaMDS on the full dataset, I get a warning message:

Warning message:
In metaMDS(MDSdata, distance = "bray", k = 2, autotransform = FALSE) :
  Stress is (nearly) zero - you may have insufficient data

I don't think I have insufficient data, with 12 x 25071 data points, and when I reduce the dataset to only 1000 values per model run (so only 12 x 1000) I don't get this warning (though the final stress is now only just below 0.2 - my desired value).

Is this warning because I have insufficient data? Or is it because of the nature of a large dataset?

I can supply a dataset in .txt format by email, if that would be helpful.

Thanks for your help,
Raeanne
Jari Oksanen
2013-May-28 13:43 UTC
[R] metaMDS with large dataset produces 'insufficient data' warning
Raeanne Miller <Raeanne.Miller <at> sams.ac.uk> writes:

> Greetings everyone,
>
> I am running MDS on a very large dataset (12 x 25071 - 12 model runs with 25071 output values each), and also on a very much reduced version of the dataset (randomly select 1000 of the 25071 output values). I would like to look at similarities/dissimilarities between the 12 model runs. When I use metaMDS on the full dataset, I get a warning message:
>
> Warning message:
> In metaMDS(MDSdata, distance = "bray", k = 2, autotransform = FALSE) :
> Stress is (nearly) zero - you may have insufficient data
>
> I don't think I have insufficient data, with 12 x 25071 data points, and when I reduce the dataset to only 1000 values per model run (so only 12 x 1000) I don't get this warning (though the final stress is now only just below 0.2 - my desired value).
>
> Is this warning because I have insufficient data? Or is it because of the nature of a large dataset?

Twelve points is not a large data set, but a pretty small one, though that depends on how your message is to be interpreted. It is the number of points that defines the size of the data set; columns do not count.

Further, it is only a warning, meant to alert you to possible problems. Everything may be OK, but you should have a look at the results: it is up to you to see whether there are problems or not.

If it really is so that reducing the number of variables from 25071 to 1000 changes the results so that stress increases from 0 to 0.2, then you probably managed to remove some very influential variables from your data. It may be that only a few dominant variables mostly define the dissimilarities, and that these give such a simple data structure that you get the warning when they are included. With default options you would get zero stress with six points, so with twelve you should be on the safe side. Probably there is something funny in your data.

Cheers,
Jari Oksanen
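
The idea that a few dominant variables may define the dissimilarities can be checked roughly before rerunning the ordination. The following is a minimal base-R sketch, not anything taken from the thread: `MDSdata` here is a small simulated stand-in for the 12-row matrix, the three inflated columns are an assumption made purely for illustration, and `share` is a hypothetical diagnostic (each column's share of the summed absolute differences that make up the Bray-Curtis numerators), not a vegan function.

```r
# Simulated stand-in for the real data: 12 rows (model runs) x 200 columns.
set.seed(1)
MDSdata <- matrix(runif(12 * 200), nrow = 12)

# Assumption for illustration only: three columns are on a much larger scale.
MDSdata[, 1:3] <- MDSdata[, 1:3] * 1000

# Bray-Curtis between rows x and y is sum(abs(x - y)) / sum(x + y).
# Sum the absolute differences per column over all pairs of rows to see
# which columns contribute most to the numerators.
pairs   <- combn(nrow(MDSdata), 2)   # 2 x 66 matrix of row-index pairs
absdiff <- abs(MDSdata[pairs[1, ], ] - MDSdata[pairs[2, ], ])
contrib <- colSums(absdiff)
share   <- contrib / sum(contrib)

# Columns ranked by their share of the total absolute difference.
ord <- order(share, decreasing = TRUE)
top <- data.frame(column = ord[1:10], share = share[ord[1:10]])
print(top)

# Cumulative share carried by the three highest-ranked columns.
top3_share <- sum(share[ord[1:3]])
print(top3_share)
```

This is only a rough diagnostic, because each pair of rows has its own denominator in the actual dissimilarity. If a handful of columns carry most of the total, rerunning `metaMDS` without them, or on transformed data, would show whether they are what produces the near-zero stress.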