Dear List:

I have a question regarding an MDS procedure that I am accustomed to using. I have searched the archives and the help documentation and still need a little assistance. The isoMDS function is what I need to perform the non-metric scaling, but I am working with similarity matrices, not dissimilarities. The question may end up being resolved simply.

Here is a bit of substantive background. I am working on a technique where individuals organize items based on how similar they perceive the items to be. For example, assume there are 10 items. Person 1 might put items 1, 2, 3, 4, 5 in group 1 and the others in group 2. I then turn this grouping into a binary similarity matrix. The following is a sample matrix for Person 1 based on this hypothetical grouping. A 1 in an off-diagonal cell means the two items were placed in the same group.

  a b c d e f g h i j
a 1 1 1 1 1 0 0 0 0 0
b 1 1 1 1 1 0 0 0 0 0
c 1 1 1 1 1 0 0 0 0 0
d 1 1 1 1 1 0 0 0 0 0
e 1 1 1 1 1 0 0 0 0 0
f 0 0 0 0 0 1 1 1 1 1
g 0 0 0 0 0 1 1 1 1 1
h 0 0 0 0 0 1 1 1 1 1
i 0 0 0 0 0 1 1 1 1 1
j 0 0 0 0 0 1 1 1 1 1

These individual matrices are summed over individuals. In the summed matrix, the diagonal elements equal the total number of participants and each off-diagonal element is the number of participants who placed that pair of items in the same group (the matrix is symmetric). So a "4" in row 'a', column 'c' means that these items were viewed as similar by 4 people. A sample total matrix, describing the perceived similarities of 10 items across 4 individuals, is at the bottom of this email. It is this total matrix that I end up working with in the MDS.

I have previously worked in systat, where I run the MDS and specify the matrix as a similarity matrix. I then take the resulting coordinates from the MDS and perform a k-means cluster analysis to identify which items belong to a particular cluster, the centroids, etc.

So, here are my questions.

1) Can isoMDS work only with dissimilarities? Or is there a way that it can perform the analysis on the similarity matrix as I have described it?

2) If I cannot perform the analysis on the similarity matrix, how can I turn it into the necessary dissimilarity matrix? I am less familiar with that kind of matrix and how it is constructed.

Thanks for any help offered,

Harold

  a b c d e f g h i j
a 4 2 4 3 3 2 0 0 0 0
b 2 4 2 3 1 0 2 2 2 2
c 4 2 4 3 3 2 0 0 0 0
d 3 3 3 4 2 1 1 1 1 1
e 3 1 3 2 4 3 1 1 1 1
f 2 0 2 1 3 4 2 2 2 2
g 0 2 0 1 1 2 4 4 4 4
h 0 2 0 1 1 2 4 4 4 4
i 0 2 0 1 1 2 4 4 4 4
j 0 2 0 1 1 2 4 4 4 4
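The construction described in the message (one 0/1 co-membership matrix per person, summed over people) can be sketched in R as follows. This is only an illustration: the helper name grouping_to_similarity is hypothetical, and the second person's grouping is made up so that there is something to sum.

```r
# Hypothetical helper (not from the thread): 0/1 co-membership matrix
# for one person's grouping of the items.
grouping_to_similarity <- function(groups) {
  s <- outer(groups, groups, FUN = "==") * 1   # 1 where two items share a group
  dimnames(s) <- list(names(groups), names(groups))
  s
}

items <- letters[1:10]

# Person 1 is the example from the message; person 2 is invented here.
person1 <- setNames(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), items)
person2 <- setNames(c(1, 2, 1, 2, 1, 1, 2, 2, 2, 2), items)

s1 <- grouping_to_similarity(person1)

# Sum the individual matrices; the diagonal then counts participants.
total <- Reduce(`+`, lapply(list(person1, person2), grouping_to_similarity))
print(total)
```

Because every individual matrix is symmetric with ones on the diagonal, the summed matrix is symmetric and its diagonal equals the number of participants, as the message states.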
On Wed, 8 Sep 2004, Doran, Harold wrote:

> 1) Can isoMDS work only with dissimilarities? Or, is there a way
> that it can perform the analysis on the similarity matrix as I have
> described it?

Yes, only with dissimilarities. That is true of the method as well as of the function in package MASS. All other MDS packages are doing a conversion, probably without telling you how.

> 2) If I cannot perform the analysis on the similarity matrix, how
> can I turn this matrix into a dissimilarity matrix necessary? I am less
> familiar with this matrix and how it would be constructed?

Normally similarities are in the range [0,1], and people use D = 1 - S or sqrt(1 - S). (Which one does not matter for isoMDS, since it only uses the ranks of the dissimilarities, apart from finding the starting configuration.) See the references on the help page for isoMDS.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
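The point about ranks can be checked directly on the total matrix from the original question. The sketch below scales the counts to [0,1] by dividing by the maximum (an assumption about how one would scale them; the reply does not prescribe it) and compares the rank order of the two suggested conversions.

```r
items <- letters[1:10]
tot <- matrix(c(
  4, 2, 4, 3, 3, 2, 0, 0, 0, 0,
  2, 4, 2, 3, 1, 0, 2, 2, 2, 2,
  4, 2, 4, 3, 3, 2, 0, 0, 0, 0,
  3, 3, 3, 4, 2, 1, 1, 1, 1, 1,
  3, 1, 3, 2, 4, 3, 1, 1, 1, 1,
  2, 0, 2, 1, 3, 4, 2, 2, 2, 2,
  0, 2, 0, 1, 1, 2, 4, 4, 4, 4,
  0, 2, 0, 1, 1, 2, 4, 4, 4, 4,
  0, 2, 0, 1, 1, 2, 4, 4, 4, 4,
  0, 2, 0, 1, 1, 2, 4, 4, 4, 4
), nrow = 10, byrow = TRUE, dimnames = list(items, items))

S  <- tot / max(tot)   # similarities scaled to [0, 1]
D1 <- 1 - S            # first suggested conversion
D2 <- sqrt(1 - S)      # second suggested conversion

# Compare the rank order of the off-diagonal dissimilarities.
lower <- lower.tri(S)
same_ranks <- all(rank(D1[lower]) == rank(D2[lower]))
print(same_ranks)
```

Since sqrt is monotone increasing, the two conversions order the pairs identically, which is why the choice only affects the starting configuration.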
Distances cannot always be constructed from similarities. This can be done only if the matrix of similarities is nonnegative definite. With the nonnegative definite condition, and with the maximum similarity scaled so that s_ii = 1,

    d_ik = (2*(1 - s_ik))^0.5

Check out the vegan package.

Alex

-----Original Message-----
From: Doran, Harold [mailto:HDoran at air.org]
Sent: September 8, 2004 10:00 AM
To: r-help at stat.math.ethz.ch
Cc: Doran, Harold
Subject: [R] isoMDS

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
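A minimal sketch of the check and conversion described in this reply, applied to the total matrix from the original question. Scaling by the maximum so that s_ii = 1 and the numerical tolerance of 1e-8 are assumptions of this sketch, not something the thread specifies.

```r
items <- letters[1:10]
tot <- matrix(c(
  4, 2, 4, 3, 3, 2, 0, 0, 0, 0,
  2, 4, 2, 3, 1, 0, 2, 2, 2, 2,
  4, 2, 4, 3, 3, 2, 0, 0, 0, 0,
  3, 3, 3, 4, 2, 1, 1, 1, 1, 1,
  3, 1, 3, 2, 4, 3, 1, 1, 1, 1,
  2, 0, 2, 1, 3, 4, 2, 2, 2, 2,
  0, 2, 0, 1, 1, 2, 4, 4, 4, 4,
  0, 2, 0, 1, 1, 2, 4, 4, 4, 4,
  0, 2, 0, 1, 1, 2, 4, 4, 4, 4,
  0, 2, 0, 1, 1, 2, 4, 4, 4, 4
), nrow = 10, byrow = TRUE, dimnames = list(items, items))

S <- tot / max(tot)   # scale so that s_ii = 1

# Nonnegative definite means no eigenvalue is (meaningfully) below zero.
ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
is_nnd <- min(ev) > -1e-8   # tolerance is an arbitrary choice
print(is_nnd)

# d_ik = sqrt(2 * (1 - s_ik))
D <- sqrt(2 * (1 - S))
print(round(D, 3))
```

Pairs that were never grouped together (s_ik = 0) end up at distance sqrt(2), and pairs that were always grouped together end up at distance 0.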
On Wed, 8 Sep 2004, Hanke, Alex wrote:

> Distances cannot always be constructed from similarities. This can be done
> only if the matrix of similarities is nonnegative definite. With the
> nonnegative definite condition, and with the maximum similarity scaled so
> that s_ii=1, d_ik=(2*(1-s_ik))^.5

But isoMDS works with dissimilarities, not distances.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
I don't understand. If isoMDS does not work with distances, why does the help for isoMDS say that the "Data are assumed to be dissimilarities or relative distances"? Equally confusing is the loose use of the terms dissimilarities and distances in the literature. As you point out in your book, "Distances are often called dissimilarities".

-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
Sent: September 8, 2004 11:58 AM
To: Hanke, Alex
Cc: 'Doran, Harold'; 'r-help at stat.math.ethz.ch'
Subject: RE: [R] isoMDS
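One common way to draw the line between the two terms, offered here as background rather than as anything stated in the thread, is that a distance must satisfy the triangle inequality while a dissimilarity need not. A small sketch of such a check, with a hypothetical helper name and made-up 3 x 3 matrices:

```r
# Hypothetical helper: TRUE if d[i, j] <= d[i, k] + d[k, j] for all i, j, k.
satisfies_triangle <- function(d, tol = 1e-12) {
  d <- as.matrix(d)
  for (k in seq_len(nrow(d))) {
    via_k <- outer(d[, k], d[k, ], "+")   # length of the path i -> k -> j
    if (any(d > via_k + tol)) return(FALSE)
  }
  TRUE
}

# Made-up examples: 'bad' has d(1,3) = 5 > d(1,2) + d(2,3) = 2.
bad  <- matrix(c(0, 1, 5,
                 1, 0, 1,
                 5, 1, 0), nrow = 3, byrow = TRUE)
good <- matrix(c(0, 1, 2,
                 1, 0, 1,
                 2, 1, 0), nrow = 3, byrow = TRUE)

print(satisfies_triangle(bad))
print(satisfies_triangle(good))
```

A non-metric method that uses only the rank order of the entries can accept either kind of matrix, which is consistent with the help page wording quoted in this message.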
Thank you. Quick clarification: isoMDS only works with dissimilarities. I converted my similarity matrix into a dissimilarity matrix as follows (from an email I found in the archives):

> d <- max(tt) - tt

where tt is the similarity matrix. With this, I tried isoMDS as follows:

> tt.mds <- isoMDS(d)

and I get the following error message:

Error in isoMDS(d) : An initial configuration must be supplied with NA/Infs in d.

I was a little confused about exactly how to specify this initial configuration. So, from here I ran cmdscale on d as

> d.mds <- cmdscale(d)

which seemed to work fine and produce reasonable results. I was able to take the coordinates and run them through a k-means cluster analysis, and the results seemed to correctly match the grouping structure I created for this sample analysis.

cmdscale is for metric scaling, but it seemed to produce the results correctly.

So, did I correctly convert the similarity matrix to the dissimilarity matrix? Second, should I have used cmdscale rather than isoMDS, as I have done? Or is there a way to specify the initial configuration that I have not got right?

Again, many thanks.

Harold

-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
Sent: Wednesday, September 08, 2004 9:58 AM
To: Doran, Harold
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] isoMDS
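The thread does not establish why isoMDS reported NA/Infs for this d. A quick way to inspect a dissimilarity matrix before passing it to isoMDS is sketched below; the helper name check_dissimilarity is hypothetical, and the data are the 10-item total matrix from the original question converted with max(tt) - tt as in the message.

```r
# Hypothetical helper: report off-diagonal entries that isoMDS may object to.
check_dissimilarity <- function(d) {
  m <- as.matrix(d)             # also copes with a numeric data frame
  storage.mode(m) <- "double"
  off_diag <- row(m) != col(m)
  list(
    non_finite = which(!is.finite(m) & off_diag, arr.ind = TRUE),
    zero_pairs = which(m <= 0 & upper.tri(m), arr.ind = TRUE)
  )
}

items <- letters[1:10]
tt <- matrix(c(
  4, 2, 4, 3, 3, 2, 0, 0, 0, 0,
  2, 4, 2, 3, 1, 0, 2, 2, 2, 2,
  4, 2, 4, 3, 3, 2, 0, 0, 0, 0,
  3, 3, 3, 4, 2, 1, 1, 1, 1, 1,
  3, 1, 3, 2, 4, 3, 1, 1, 1, 1,
  2, 0, 2, 1, 3, 4, 2, 2, 2, 2,
  0, 2, 0, 1, 1, 2, 4, 4, 4, 4,
  0, 2, 0, 1, 1, 2, 4, 4, 4, 4,
  0, 2, 0, 1, 1, 2, 4, 4, 4, 4,
  0, 2, 0, 1, 1, 2, 4, 4, 4, 4
), nrow = 10, byrow = TRUE, dimnames = list(items, items))

d <- max(tt) - tt               # conversion used in the message
res <- check_dissimilarity(d)
print(res$non_finite)           # rows here would mean NA/NaN/Inf entries
print(res$zero_pairs)           # pairs of items with zero dissimilarity
```

Any pair that every participant grouped together has similarity equal to the maximum and therefore dissimilarity zero after this conversion.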
Doran, Harold wrote:

> Thank you. Quick clarification. isoMDS only works with dissimilarities.
> Converting my similarity matrix into the dissimilarity matrix is done as
> (from an email I found on the archives)
>
> > d <- max(tt) - tt

Mardia, Kent & Bibby define the "standard transformation" from a similarity matrix to a dissimilarity (distance) matrix by

    d_rs <- sqrt(c_rr - 2*c_rs + c_ss)

where c_rs are the similarities. This ensures that the diagonal of the dissimilarity matrix is zero. You could try that.

Kjetil Halvorsen

-- 
Kjetil Halvorsen.

Peace is the most effective weapon of mass construction.
 -- Mahdi Elmandjra
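The standard transformation quoted in this reply, d_rs = sqrt(c_rr - 2*c_rs + c_ss), can be written in a few lines of R. The function name similarity_to_distance is hypothetical, and the pmax guard against tiny negative values is an addition of this sketch; the data are the total matrix from the original question.

```r
# Hypothetical helper implementing d_rs = sqrt(c_rr - 2 * c_rs + c_ss).
similarity_to_distance <- function(C) {
  C <- as.matrix(C)
  d2 <- outer(diag(C), diag(C), "+") - 2 * C   # c_rr + c_ss - 2 * c_rs
  sqrt(pmax(d2, 0))                            # guard against rounding below zero
}

items <- letters[1:10]
tot <- matrix(c(
  4, 2, 4, 3, 3, 2, 0, 0, 0, 0,
  2, 4, 2, 3, 1, 0, 2, 2, 2, 2,
  4, 2, 4, 3, 3, 2, 0, 0, 0, 0,
  3, 3, 3, 4, 2, 1, 1, 1, 1, 1,
  3, 1, 3, 2, 4, 3, 1, 1, 1, 1,
  2, 0, 2, 1, 3, 4, 2, 2, 2, 2,
  0, 2, 0, 1, 1, 2, 4, 4, 4, 4,
  0, 2, 0, 1, 1, 2, 4, 4, 4, 4,
  0, 2, 0, 1, 1, 2, 4, 4, 4, 4,
  0, 2, 0, 1, 1, 2, 4, 4, 4, 4
), nrow = 10, byrow = TRUE, dimnames = list(items, items))

D <- similarity_to_distance(tot)
print(round(D, 3))
```

With c_rr = c_ss = 1 this reduces to sqrt(2 * (1 - c_rs)), so for similarities scaled to a unit diagonal it agrees with the formula given earlier in the thread.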
Thank you. I used the same matrix with cmdscale as I did with isoMDS. I have reproduced my steps below in case this sheds any light.

Here is the original total matrix (see the opening message of the thread if you care how it is created):

  a b c d e f g h
a 4 4 2 4 1 2 0 0
b 4 4 2 4 1 2 0 0
c 2 2 4 2 3 2 2 1
d 4 4 2 4 1 2 0 0
e 1 1 3 1 4 3 3 2
f 2 2 2 2 3 4 2 1
g 0 0 2 0 3 2 4 3
h 0 0 1 0 2 1 3 4

So, there are 8 items. This matrix indicates that items 1, 2, and 4 were always grouped together (that is, viewed as similar by every individual). I transformed this using

> tt <- max(t) - t

which results in

  a b c d e f g h
a 0 0 2 0 3 2 4 4
b 0 0 2 0 3 2 4 4
c 2 2 0 2 1 2 2 3
d 0 0 2 0 3 2 4 4
e 3 3 1 3 0 1 1 2
f 2 2 2 2 1 0 2 3
g 4 4 2 4 1 2 0 1
h 4 4 3 4 2 3 1 0

When I run isoMDS on this new matrix, it tells me to specify the initial configuration because of the NA/Infs. But when I run cmdscale on this same matrix I end up with the following results:

> bt <- cmdscale(tt); bt
         [,1]       [,2]
a -1.79268634 -0.2662750
b -1.79268634 -0.2662750
c -0.02635497  0.5798934
d -1.79268634 -0.2662750
e  1.08978620  0.6265313
f -0.02635497  0.5798934
g  2.20852966  0.2828937
h  2.13245309 -1.2703869

The results suggest that items 1, 2, and 4 have similar locations, as expected. Items 3 and 6 also have similar locations, as would also be expected. So my results seem to have been replicated correctly using cmdscale.

I've tried to specify an initial configuration for isoMDS in a few ways without success, so I am surely doing something wrong. So far, I have tried the following:

> ll <- isoMDS(tt, y=cmdscale(tt))

which tells me "zero or negative distance between objects 1 and 2", and

> ll <- isoMDS(tt, y=cmdscale(tt, k=2))

Again, thanks,

Harold

-----Original Message-----
From: Jari Oksanen [mailto:jarioksa@sun3.oulu.fi]
Sent: Thu 9/9/2004 4:26 AM
To: Doran, Harold
Cc: Prof Brian Ripley; R-News
Subject: RE: [R] isoMDS

On Wed, 2004-09-08 at 21:31, Doran, Harold wrote:
> Thank you. Quick clarification. isoMDS only works with dissimilarities.
> Converting my similarity matrix into the dissimilarity matrix is done as
> (from an email I found on the archives)
>
> > d<- max(tt)-tt
>
> Where tt is the similarity matrix. With this, I tried isoMDS as follows:
>
> > tt.mds<-isoMDS(d)
>
> and I get the following error message.
>
> Error in isoMDS(d) : An initial configuration must be supplied with
> NA/Infs in d. I was a little confused on exactly how to specify this
> initial config. So, from here I ran cmdscale on d as
>
This error message is quite informative: you have either missing or non-finite entries in your data. The only surprising thing here is that cmdscale works: it should fail, too. Are you sure that you haven't done anything with your data matrix in between, like changing it from a matrix to a dist object? If the Inf/NaN/NA values are on the diagonal, they will magically disappear with as.dist.

Anyway, if you're able to get a metric scaling result, you can manually feed that into isoMDS as the initial configuration, and avoid the check. See ?isoMDS.

> > d.mds<-cmdscale(d)
>
> which seemed to work fine and produce reasonable results. I was able to
> take the coordinates and run them through a k-means cluster and the
> results seemed to correctly match the grouping structure I created for
> this sample analysis.
>
> Cmdscale is for metric scaling, but it seemed to produce the results
> correctly.
>
> So, did I correctly convert the similarity matrix to the dissimilarity
> matrix? Second, should I have used cmdscale rather than isoMDS as I have
> done? Or, is there a way to specify the initial configuration that I
> have not done correctly.

If you don't know whether you should use isoMDS or cmdscale, you probably should use cmdscale. If you know, things are different. Probably isoMDS gives you `better'(TM) results, but it is more complicated to handle.

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
Ph. +358 8 5531526, cell +358 40 5136529, fax +358 8 5531061
email jari.oksanen@oulu.fi, homepage http://cc.oulu.fi/~jarioksa/
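The workflow in Harold's message (convert, run cmdscale, then cluster the coordinates with k-means) can be reproduced on the 8-item matrix as sketched below. The name sim is used for the similarity matrix instead of t, to avoid masking the base function t(); the seed, the choice of 2 clusters, and nstart = 10 are assumptions made only for this illustration.

```r
items <- letters[1:8]
sim <- matrix(c(
  4, 4, 2, 4, 1, 2, 0, 0,
  4, 4, 2, 4, 1, 2, 0, 0,
  2, 2, 4, 2, 3, 2, 2, 1,
  4, 4, 2, 4, 1, 2, 0, 0,
  1, 1, 3, 1, 4, 3, 3, 2,
  2, 2, 2, 2, 3, 4, 2, 1,
  0, 0, 2, 0, 3, 2, 4, 3,
  0, 0, 1, 0, 2, 1, 3, 4
), nrow = 8, byrow = TRUE, dimnames = list(items, items))

tt <- max(sim) - sim          # dissimilarities, as in the message
bt <- cmdscale(tt, k = 2)     # classical (metric) scaling in 2 dimensions
print(bt)

# k-means on the MDS coordinates; 2 clusters is an arbitrary choice here.
set.seed(1)
km <- kmeans(bt, centers = 2, nstart = 10)
print(km$cluster)
print(km$centers)
```

Items with identical rows in tt (a, b and d) receive the same coordinates and therefore always land in the same cluster. The signs of the cmdscale axes are arbitrary, so coordinates may be mirrored relative to those shown in the message.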
I get the following message:

Error in isoMDS(tt) : zero or negative distance between objects 1 and 2

This makes sense, since a and b are identical in their relationships to c through h. Drop row 1 and column 1 and you get

> isoMDS(tt[2:8,2:8])
initial  value 14.971992
iter   5 value 8.027815
iter  10 value 4.433377
iter  15 value 3.496364
iter  20 value 3.346726
final  value 3.233738
converged
$points
           [,1]       [,2]
[1,] -2.3143653 -0.1259226
[2,] -0.3205746 -1.1534662
[3,] -2.8641922 -0.1182906
[4,]  0.7753674  0.1497328
[5,] -0.5705552  1.2416843
[6,]  2.2305175 -0.6995917
[7,]  3.0638025  0.7058540

$stress
[1] 3.233738

Does this help?

-----Original Message-----
From: Doran, Harold [mailto:HDoran at air.org]
Sent: September 9, 2004 8:26 AM
To: Jari Oksanen
Cc: Doran, Harold; Prof Brian Ripley; R-News
Subject: RE: [R] isoMDS
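A sketch of the same idea (remove items whose dissimilarity to another item is zero, then run isoMDS) follows. In the matrix as posted, rows a, b and d are all identical, so this sketch removes every duplicated row rather than only the first. Copying the fitted coordinates back to the dropped items is an addition of this sketch, not something proposed in the thread, and sim is used as the name of the similarity matrix.

```r
library(MASS)   # provides isoMDS

items <- letters[1:8]
sim <- matrix(c(
  4, 4, 2, 4, 1, 2, 0, 0,
  4, 4, 2, 4, 1, 2, 0, 0,
  2, 2, 4, 2, 3, 2, 2, 1,
  4, 4, 2, 4, 1, 2, 0, 0,
  1, 1, 3, 1, 4, 3, 3, 2,
  2, 2, 2, 2, 3, 4, 2, 1,
  0, 0, 2, 0, 3, 2, 4, 3,
  0, 0, 1, 0, 2, 1, 3, 4
), nrow = 8, byrow = TRUE, dimnames = list(items, items))
tt <- max(sim) - sim

# Keep one representative of each set of identical rows.
keep <- !duplicated(tt)
d_unique <- tt[keep, keep]

# Non-metric MDS on the distinct items; the default start is cmdscale(d, k).
fit <- isoMDS(d_unique, k = 2, trace = FALSE)

# Give each dropped item the coordinates of the row it duplicates.
key <- apply(tt, 1, paste, collapse = ",")
points_all <- fit$points[match(key, key[keep]), ]
rownames(points_all) <- items
print(points_all)
print(fit$stress)
```

The resulting coordinates can be passed to kmeans in the same way as the cmdscale coordinates earlier in the thread.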