Hi,

I'm trying to do a non-metric multidimensional scaling using isoMDS.
However, I have some '0' distances in my data, and I'm not sure how to
deal with them. I'd rather not drop rows from the original data, as I am
comparing several datasets (morphology and molecular data) for the same
individuals, and it's interesting to see how much morphological
variation can be associated with an identical genotype.

I've tried replacing the 0's with NA, but the isoMDS appears to stop on
the first iteration and the stress does not improve:

distA # A dist object with 13695 elements, 4 of which == 0
cmdsA <- cmdscale(distA, k=2)

distB <- distA
distB[which(distB==0)] <- NA

isoA <- isoMDS(distB, cmdsA)
initial value 21.835691
final value 21.835691
converged

The other approach I've tried is replacing the 0's with small numbers.
In this case isoMDS does reduce the stress values.

min(distA[which(distA>0)])
[1] 0.02325581

distC <- distA
distC[which(distC==0)] <- 0.001
isoC <- isoMDS(distC)
initial value 21.682854
iter 5 value 16.862093
iter 10 value 16.451800
final value 16.339224
converged

So my questions are: what am I doing wrong in the first example? Why
does isoMDS converge without doing anything? Is replacing the 0's with
small numbers an appropriate alternative?

Thanks for your time,

Tyler

R 2.2.1

Short answer: you cannot compare distances including NAs, so there is no
way to find a monotone mapping of distances.

If the data really are identical for two rows, you can easily drop one of
them whilst doing MDS, and then assign the position found for one to the
other.

On Tue, 18 Apr 2006, Tyler Smith wrote:

> So my questions are: what am I doing wrong in the first example? Why
> does isoMDS converge without doing anything? Is replacing the 0's with
> small numbers an appropriate alternative?
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
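
The drop-and-reassign suggestion in the reply above could be sketched roughly as follows. This is only an illustration on made-up data: the matrix `x`, the helper names `first_match`, `keep` and `pts`, and the choice of the first matching row as the representative are all assumptions, not part of the original messages. It also assumes that zero distances group rows consistently (if A and B are at distance 0, they have the same distances to everything else).

```r
library(MASS)  # provides isoMDS()

# Hypothetical toy data: row 8 is an exact copy of row 3,
# so the distance between them is exactly zero.
set.seed(1)
x <- matrix(rnorm(24), nrow = 8)
x[8, ] <- x[3, ]
d <- as.matrix(dist(x))

# For each row, the index of the first row at zero distance from it.
# A row with no earlier duplicate maps to itself (the diagonal is zero).
first_match <- unname(apply(d == 0, 1, function(z) which(z)[1]))
keep <- which(first_match == seq_len(nrow(d)))

# Run the non-metric MDS on the unique rows only.
fit <- isoMDS(as.dist(d[keep, keep]), k = 2, trace = FALSE)

# Give every original row the coordinates of its representative.
pts <- fit$points[match(first_match, keep), , drop = FALSE]
rownames(pts) <- rownames(d)
```

Here `pts` has one row per original observation, and duplicated observations share a position. As noted later in the thread, this leaves open the question of weighting, since the dropped rows no longer contribute to the stress.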

On Tue, 2006-04-18 at 22:06 -0400, Tyler Smith wrote:

> So my questions are: what am I doing wrong in the first example? Why
> does isoMDS converge without doing anything? Is replacing the 0's with
> small numbers an appropriate alternative?
>

Tyler,

My experience is that isoMDS *may* fail to move away from the starting
configuration if there are identical points in the initial configuration,
and this will happen if you use cmdscale() to get the initial
configuration. You *may* get over this by shifting duplicates a bit:

> con <- cmdscale(dis)
> dups <- duplicated(con)
> sum(dups)
[1] 2
> con[dups, ] <- con[dups, ] + runif(2*sum(dups), -0.01, 0.01)

Then isoMDS may go further.

Another issue is that, at a quick look, isoMDS() seems to do nothing
sensible with missing values, although it accepts them. The only effect is
that they are ordered last, that is, regarded as very long distances (in
your case they should rather be regarded as very short distances). The key
lines in isoMDS are:

    ord <- order(dis)
    nd <- sum(!is.na(ord))

Even when 'dis' has missing values, the result of order() ('ord') has no
missing values; with the default argument na.last=TRUE the missing entries
are simply put last in the list. An obvious-looking change would be to
replace the second line with:

    nd <- sum(!is.na(dis))

but this "dumps the core" of R, at least on my machine: probably you need
the full length of the vectors in addition to the number of non-missing
entries. (This quick look was based on the latest release version of
MASS/VR: there may be a newer version already with the upcoming R release,
but that's not released yet.)

You may check how it works with NA: are duplicate points identical in the
results?

Then about replacing zero distances with a tiny number: this has been
discussed before on this list, and Ripley said "no, no!". I do it all the
time, but only in secrecy. A suggested solution was to drop duplicates,
but then there still is a weighting issue, and isoMDS does not have a
weights argument.

cheers, jari oksanen
--
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email jari.oksanen at oulu.fi, homepage http://cc.oulu.fi/~jarioksa/
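
The behaviour of order() described in the message above can be checked on a small vector. This is a minimal sketch with made-up values; it only shows what base R's order() does with missing values and says nothing about how any particular version of isoMDS handles them internally.

```r
# Hypothetical distances, two of them missing.
dis <- c(0.4, NA, 0.1, NA, 0.3)

# order() returns positions, so its result never contains NA;
# with the default na.last = TRUE the missing entries sort to the end.
ord <- order(dis)

n_from_ord <- sum(!is.na(ord))  # counts every entry, missing or not
n_from_dis <- sum(!is.na(dis))  # counts only the non-missing distances
```

With these values the positions of the two NAs end up last in `ord`, so counting non-missing elements of `ord` gives the full length of the vector rather than the number of usable distances.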

About replacing the zeroes with tiny numbers: isoMDS works with the
rankings of the distances. Therefore replacing zeroes by tiny values gives
them a rank above the "real" zeroes (distance to the same observation) and
below all the non-zero distances. If this makes sense in your application
(in my experience it usually does), you can do it.

Sometimes the classical MDS solution is a local optimum of the isoMDS
criterion. In these cases isoMDS "converges" in one step (or rather, it
gives you back the classical MDS solution). This may happen with and
without zero or NA distances.

Best,
Christian

On Tue, 18 Apr 2006, Tyler Smith wrote:

> So my questions are: what am I doing wrong in the first example? Why
> does isoMDS converge without doing anything? Is replacing the 0's with
> small numbers an appropriate alternative?
>

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
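
The ranking argument in the reply above can be illustrated with a few lines of R. The vector `d` is made up for the example (it reuses the smallest positive distance and the replacement value 0.001 from the original question), and the names `d_eps`, `pos`, `same_order` and `below_all` are hypothetical. The sketch only checks the ordering of the values; whether the substitution is appropriate for a given analysis is the judgment call discussed in the thread.

```r
# Hypothetical off-diagonal distances, two of which are zero.
d <- c(0, 0.5, 0.02325581, 0, 1.2, 0.3)

# Replace zeros with a value smaller than the smallest positive distance.
d_eps <- d
d_eps[d_eps == 0] <- 0.001

pos <- d > 0

# The ordering among the non-zero distances is unchanged ...
same_order <- identical(order(d[pos]), order(d_eps[pos]))

# ... and the replaced values still sit below every non-zero distance.
below_all <- max(d_eps[!pos]) < min(d_eps[pos])
```

Both checks come out TRUE here, provided the replacement value is below the smallest positive distance. Note that the replaced distances remain tied with each other.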