Vinny Moriarty
2013-Dec-26 04:04 UTC
[R] Results from Vegan metaMDS varry depending on set.seed
I've got an ecological data set that I've worked up to the point of having a relative abundance matrix, which I created with the decostand() command in Vegan.

Here is the data matrix:

S1<-c(0.4451517, 0.37919827, 0.10590466, 0.06974540)
S2<-c(0.5064846, 0.32464164, 0.09679181, 0.07208191)
S3<-c(0.4481876, 0.26556447, 0.10486995, 0.18137797)
S4<-c(0.5090950, 0.11474913, 0.17636805, 0.19978778)
S5<-c(0.5996147, 0.05875069, 0.24532196, 0.09631260)
S6<-c(0.7122068, 0.04365640, 0.14552133, 0.09861543)
S7<-c(0.6490743, 0.06382979, 0.14396242, 0.14313346)
S8<-c(0.5958636, 0.10774176, 0.16908888, 0.12730576)
DF<-rbind(S1,S2,S3,S4,S5,S6,S7,S8)

At first I was having issues with metaMDS producing two distinctly different NMDS plots at seemingly random intervals as I re-ran my analysis. I figured out it was because I was not calling set.seed() before my metaMDS call. But now I am concerned that the seemingly small change of setting set.seed() has such a large impact on my analysis.

As can be seen in the ordiplots below, it looks to me like there is a change in relative distances between some of the latter 'sites'.

set.seed(1)
mds10<- metaMDS(DF, dist='bray')
ordiplot(mds10,display='sites',type='text')

vs.

set.seed(999)
mds10<- metaMDS(DF, dist='bray')
ordiplot(mds10,display='sites',type='text')

The difference between the two plots is large enough that it would change my interpretation of my analysis, so as this is my first foray into NMDS I am a bit concerned. Can someone tell me if this is just part of how NMDS or Vegan works (different local minima)? Or does this imply a certain ambiguity about my data set? Or am I completely misreading the plots?

If I add vector arrows for the 'species' influence like so:

envfit10<-envfit(mds10, DF,perm=999)
plot(mds10, display='sites',type='t')
plot(envfit10)

I can see that the two plots have different 'species' vectors, but it looks like the relative distance between S5 and some of the other sites changes between the two plots.

Is one ordiplot more 'correct' than the other? If not, what am I to make of the difference between plots?
Jari Oksanen
2013-Dec-27 12:22 UTC
[R] Results from Vegan metaMDS varry depending on set.seed
Dear Vinny Moriarty,

Vinny Moriarty <vwmoriarty <at> gmail.com> writes:
>
> I've got an ecological data set that I've worked up to the point of having
> a relative abundance matrix I created with the decostand() command in Vegan.
>
> Here is the data matrix:

<-- clip: 8 x 4 data matrix giving distance object of 28 elements -->

> At first I was having issues with metaMDS producing two distinctly
> different NMDS plots at seemingly random intervals as I re-ran my analysis
> over multiple runs. I figured out it was because I was not using
> set.seed() for my metaMDS call. But now I am concerned that the seemingly
> small change of setting set.seed() has such a large impact on my analysis.
>

There are several issues here:

1) A small change in the random seed should give a completely different random sequence. In fact, the sequences should differ just as much as they would after a large change in the seed.

> As can be seen in the below ordiplots, it looks to me like there is a
> change in relative distances between some of the latter 'sites'.
>
> set.seed(1)
> mds10<- metaMDS(DF, dist='bray')
>
> ordiplot(mds10,display='sites',type='text')
>
> vs.
>
> set.seed(999)
> mds10<- metaMDS(DF, dist='bray')
>

2) set.seed(1) indeed seems to get trapped in a local minimum (which is equal to the initial solution based on PCoA).

> ordiplot(mds10,display='sites',type='text')
>
> The difference between the two plots is large enough that it would change
> my interpretation of my analysis, so as this is my first foray into NMDS I
> am a bit concerned. Can someone tell me if this is just part of how NMDS or
> Vegan works (different local minimums)? Or does this imply a certain
> ambiguity about my data set? Or am I completely misreading the plots.
>

3) You should not draw too firm conclusions from data of 8 points. You need more data.
I would not perform NMDS with 8 data points, although it is technically possible.

> If I add vector arrows for the 'species' influence like so:
>
> envfit10<-envfit(mds10, DF,perm=999)
> plot(mds10, display='sites',type='t')
> plot(envfit10)
>
> I can see that the two plots have different 'species' vectors, but it looks
> like the relative distance between S5 and some of the other sites changes
> between the two plots.
>
> Is one ordiplot more 'correct' than the other? If not, what am I to make of
> the difference between plots?

4) NMDS and metaMDS() return a goodness-of-fit statistic called stress. Low stress is good (also at Christmas time). One of the solutions has lower stress, and in that sense it is more correct. It also seems that you do not get arbitrary random configurations: all analyses end up in one of two alternative states. With set.seed(1) you get the state with higher stress, and with set.seed(999) you get the state with lower stress. This would indicate that set.seed(1) gives you a local minimum, and set.seed(999) possibly a global minimum. The best way of comparing solutions is the procrustes() function of vegan, which shows you that the solutions fall into one or the other of these two groups. However, with 8 points you should not push your analysis too far toward firm conclusions.

Cheers, Jari Oksanen
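
A minimal sketch of the comparison described in point 4, using the data from the original post. The object names (mds_a, mds_b, mds_c, pro) and the value trymax = 200 are illustrative assumptions and were not part of the thread; the distance argument is spelled out in full here, where the original call relied on partial matching of dist. No output is shown because the stress values depend on the vegan version and on the random starts.

```r
library(vegan)

# Relative abundance data from the original post: 8 sites x 4 'species'
DF <- rbind(
  S1 = c(0.4451517, 0.37919827, 0.10590466, 0.06974540),
  S2 = c(0.5064846, 0.32464164, 0.09679181, 0.07208191),
  S3 = c(0.4481876, 0.26556447, 0.10486995, 0.18137797),
  S4 = c(0.5090950, 0.11474913, 0.17636805, 0.19978778),
  S5 = c(0.5996147, 0.05875069, 0.24532196, 0.09631260),
  S6 = c(0.7122068, 0.04365640, 0.14552133, 0.09861543),
  S7 = c(0.6490743, 0.06382979, 0.14396242, 0.14313346),
  S8 = c(0.5958636, 0.10774176, 0.16908888, 0.12730576)
)

# Fit NMDS from the two seeds used in the thread.
# trace = FALSE only silences the progress output.
set.seed(1)
mds_a <- metaMDS(DF, distance = "bray", trace = FALSE)
set.seed(999)
mds_b <- metaMDS(DF, distance = "bray", trace = FALSE)

# Compare stress: the solution with the lower value fits better.
c(seed_1 = mds_a$stress, seed_999 = mds_b$stress)

# Procrustes rotation of one solution onto the other.
# With symmetric = TRUE the sum of squares lies between 0 and 1;
# a value near 0 means the two runs found the same configuration.
pro <- procrustes(mds_a, mds_b, symmetric = TRUE)
pro$ss
plot(pro)  # arrows show how far each site moves between solutions

# Allowing more random starts (an assumed setting, not from the thread)
# makes it less likely that the search stops in the poorer solution.
set.seed(1)
mds_c <- metaMDS(DF, distance = "bray", trymax = 200, trace = FALSE)
mds_c$stress
```

If the two stress values differ and pro$ss is clearly above zero, the two seeds ended in different minima; with only 8 sites, neither configuration should be interpreted in much detail.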