thr3ads.net - R help - [R] isoMDS and 0 distances [Apr 2006]

Tyler Smith

2006-Apr-19 02:06 UTC

[R] isoMDS and 0 distances

Hi,

I'm trying to do a non-metric multidimensional scaling using isoMDS. 
However, I have some '0' distances in my data, and I'm not sure how to
deal with them. I'd rather not drop rows from the original data, as I am 
comparing several datasets (morphology and molecular data) for the same 
individuals, and it's interesting to see how much morphological 
variation can be associated with an identical genotype.

I've tried replacing the 0's with NA, but the isoMDS appears to stop on 
the first iteration and the stress does not improve:

distA # A dist object with 13695 elements, 4 of which == 0
cmdsA <- cmdscale(distA, k=2)

distB <- distA
distB[which(distB==0)] <- NA

isoA <- isoMDS(distB, cmdsA)
initial  value 21.835691
final  value 21.835691
converged

The other approach I've tried is replacing the 0's with small numbers. 
In this case isoMDS does reduce the stress values.

min(distA[which(distA>0)])
[1] 0.02325581

distC <- distA
distC[which(distC==0)] <- 0.001
isoC <- isoMDS(distC)
initial  value 21.682854
iter   5 value 16.862093
iter  10 value 16.451800
final  value 16.339224
converged

So my questions are: what am I doing wrong in the first example? Why 
does isoMDS converge without doing anything? Is replacing the 0's with 
small numbers an appropriate alternative?

Thanks for your time,

Tyler
R 2.2.1

Prof Brian Ripley

2006-Apr-19 06:46 UTC

Short answer: you cannot compare distances including NAs, so there is no 
way to find a monotone mapping of distances.

If the data really are identical for two rows, you can easily drop one of 
them whilst doing MDS, and then assign the position found for one to the 
other.
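
A minimal sketch of that drop-and-reassign idea, assuming the zero
distances mark exact duplicates. The toy data and the names m, first,
keep and pts are made up for illustration and are not from the original
posts:

```r
library(MASS)  # provides isoMDS()

# Hypothetical toy data: rows 2 and 7 are exact copies of rows 1 and 3.
set.seed(1)
x <- matrix(rnorm(40), ncol = 4)
x[2, ] <- x[1, ]
x[7, ] <- x[3, ]
m <- as.matrix(dist(x))

# For each row, the first row at distance zero from it (itself if unique).
first <- apply(m == 0, 1, function(z) which(z)[1])
keep  <- which(first == seq_len(nrow(m)))   # one representative per group

# Fit on the representatives only, so no zero distances remain.
fit <- isoMDS(as.dist(m[keep, keep]), k = 2, trace = FALSE)

# Expand back: every dropped row gets the position of its representative.
pts <- fit$points[match(first, keep), , drop = FALSE]
```

Here pts has one row per original observation, so it can still be
compared row by row with configurations from other datasets.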

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Jari Oksanen

2006-Apr-19 07:23 UTC

Tyler,

My experience is that isoMDS *may* fail to move away from the starting
configuration if there are identical points in the initial configuration,
and this will happen if you use cmdscale() on data with zero distances to
get the initial configuration. You *may* get over this by shifting
duplicates a bit:
> con <- cmdscale(dis)
> dups <- duplicated(con)
> sum(dups)
[1] 2
> con[dups, ] <- con[dups, ] + runif(2*sum(dups), -0.01, 0.01)
Then isoMDS may go further.

Another issue is that, at a quick look, isoMDS() seems to do nothing
sensible with missing values, although it accepts them. The only effect
is that they are ordered last, that is, regarded as very long distances
(in your case they should rather be regarded as very short distances).
The key lines in isoMDS are:

    ord <- order(dis)
    nd <- sum(!is.na(ord))

Even when 'dis' has missing values, the result of order() ('ord') has
no missing values; with the default argument na.last=TRUE the missing
entries are simply put last in the list. An obvious looking change would
be to replace the second line with:

    nd <- sum(!is.na(dis))

but this "dumps the core" of R at least on my machine: probably the full
length of the vectors is needed in addition to the number of non-missing
entries. (This quick look was based on the latest release version of
MASS/VR: there may be a newer version already with the upcoming R
release, but that's not released yet.)
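
The behaviour of order() described above can be checked on its own. This
is a small illustration with a made-up vector, not a patch for isoMDS:

```r
# Made-up distances with two missing entries (positions 2 and 4).
dis <- c(0.30, NA, 0.10, NA, 0.25)

ord <- order(dis)              # default na.last = TRUE
ord                            # 3 5 1 2 4: the NA positions come last

n_from_ord <- sum(!is.na(ord)) # 5: ord itself never contains NA
n_from_dis <- sum(!is.na(dis)) # 3: the number of observed distances
```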

You may want to check what happens with NA: do the duplicate points end
up identical in the results?

Then about replacing zero distances with a tiny number: this has been
discussed before on this list, and Ripley said "no, no!". I do it all
the time, but only in secrecy. A suggested solution was to drop
duplicates, but then there still is a weighting issue, and isoMDS does
not have a weights argument.

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email jari.oksanen at oulu.fi, homepage http://cc.oulu.fi/~jarioksa/

Christian Hennig

2006-Apr-19 10:43 UTC

About replacing the zeroes with tiny numbers:
isoMDS works with the rankings of the distances. Therefore replacing 
zeroes by tiny values gives them a rank above the "real" zeroes (distance 
to same observation) and below all the non-zero distances. If this makes 
sense in your application (in my experience it usually does), you can do 
it.
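
A small illustration of the ranking argument, using a made-up vector d
that contains the smallest non-zero distance and the replacement value
reported in the original post:

```r
# Made-up distances; 0.02325581 is the smallest non-zero value in the post.
d   <- c(0, 0.5, 0.02325581, 0, 0.2)
eps <- 0.001

# The replacement must stay below every genuine non-zero distance.
stopifnot(eps < min(d[d > 0]))

d2 <- replace(d, d == 0, eps)

# The ordering is unchanged: the former zeroes are still the smallest
# entries and are still tied with each other.
same_order <- identical(order(d), order(d2))
```

Any eps below the smallest non-zero distance gives the same ordering, so
the exact value chosen should not matter to a purely rank-based fit.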

Sometimes the classical MDS solution is a local optimum of the isoMDS 
criterion. In these cases isoMDS "converges" in one step (rather it gives 
you the classical MDS solution). This may happen with and without zero 
or NA distances.
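
One way to see whether the classical start is such a local optimum is to
rerun isoMDS from a slightly perturbed start and compare the stress. The
sketch below uses the eurodist data shipped with R rather than the data
from this thread, and the 1% perturbation size is an arbitrary choice:

```r
library(MASS)  # provides isoMDS()

set.seed(42)
d  <- eurodist                  # example distances shipped with R
y0 <- cmdscale(d, k = 2)        # classical MDS start (the isoMDS default)

# Perturb the start by a small fraction of its overall spread.
noise <- rnorm(length(y0), sd = 0.01 * sd(as.vector(y0)))
y1 <- y0 + matrix(noise, nrow = nrow(y0))

fit0 <- isoMDS(d, y = y0, k = 2, trace = FALSE)
fit1 <- isoMDS(d, y = y1, k = 2, trace = FALSE)

stress <- c(classical_start = fit0$stress, perturbed_start = fit1$stress)
```

If the perturbed start ends at a clearly lower stress, the classical
solution was probably only a local optimum.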

Best,
Christian

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
