Jose Claudio Faria
2005-Jul-06 21:28 UTC
[R] Help: Mahalanobis distances between 'Species' from iris
Dear R list, I'm trying to calculate Mahalanobis distances for 'Species' of 'iris' data as obtained below: Squared Distance to Species From Species: Setosa Versicolor Virginica Setosa 0 89.86419 179.38471 Versicolor 89.86419 0 17.20107 Virginica 179.38471 17.20107 0 This distances above were obtained with proc 'CANDISC' of SAS, please, see Output 21.1.2: Iris Data: Squared Mahalanobis Distances from http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/stat/chap21/sect19.htm From this distance my intention is to make a cluster analysis as below, using the package 'mclust': # # --- Begin R script --- # # For units compatibility of 'iris' from R dataset and 'iris' data used in # the SAS example: Measures = iris[,1:4]*10 Species = iris[,5] irisSAS = data.frame(Measures, Species) n = 3 Mah = c( 0, 89.86419, 0, 179.38471, 17.20107, 0) # My Question is: how to obtain 'Mah' with R from 'irisSAS' data? D = matrix(0, n, n) nam = c('Set', 'Ver', 'Vir') rownames(D) = nam colnames(D) = nam k = 0 for (i in 1:n) { for (j in 1:i) { k = k+1 D[i,j] = Mah[k] D[j,i] = Mah[k] } } D=sqrt(D) #D2 -> D library(mclust) dendroS = hclust(as.dist(D), method='single') dendroC = hclust(as.dist(D), method='complete') win.graph(w = 3.5, h = 6) split.screen(c(2, 1)) screen(1) plot(dendroS, main='Single', sub='', xlab='', ylab='', col='blue') screen(2) plot(dendroC, main='Complete', sub='', xlab='', col='red') # # --- End R script --- # I always need of this type of analysis and I'm not founding how to make it in the CRAN documentation (Archives, packages: mclust, cluster, fpc and mva). Regards, -- Jose Claudio Faria Brasil/Bahia/UESC/DCET Estatistica Experimental/Prof. Adjunto mails: joseclaudio.faria at terra.com.br
Jose Claudio Faria
2005-Jul-11 00:47 UTC
[R] Help: Mahalanobis distances between 'Species' from iris
Dear Mulholland, I'm sorry, I was not clearly and objective sufficiently. Below you can see what I'm was trying to do: # D2Mah by JCFaria and Gabor Grothiendieck: 10/7/05 20:46:41 D2Mah = function(y, x) { stopifnot(is.data.frame(y), !missing(x)) stopifnot(dim(y)[1] != dim(x)[1]) y = as.matrix(y) x = as.factor(x) man = manova(y ~ x) E = summary(man)$SS[2] #Matrix E S = as.matrix(E$Residuals)/man$df.residual InvS = solve(S) mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y)) f = function(a,b) mapply(function(a,b) mahalanobis(mds[a,], mds[b,], InvS, TRUE), a, b) seq. = seq(length = nrow(mds)) names(seq.) = levels(x) D2 = outer(seq., seq., f) } # # test # D2M = D2Mah(iris[,1:4], iris[,5]) print(D2M) dend = hclust(as.dist(sqrt(D2M)), method='complete') plot(dend) So, Thanks for the reply. Best, JCFaria Mulholland, Tom wrote:> Why don't you use mahalanobis in the stats package. The function "Returns the Mahalanobis distance of all rows in 'x' and the vector mu='center' with respect to Sigma='cov'. This is (for vector 'x') defined as > > D^2 = (x - mu)' Sigma^{-1} (x - mu) > > I don't have any idea what this does but it appears to be talking about the same topic. If it is not suitable package fpc has related functions and adehabitat has a function for "Habitat Suitability Mapping with Mahalanobis Distances" > > Tom > > >>-----Original Message----- >>From: r-help-bounces at stat.math.ethz.ch >>[mailto:r-help-bounces at stat.math.ethz.ch]On Behalf Of Jose >>Claudio Faria >>Sent: Thursday, 7 July 2005 5:29 AM >>To: r-help at stat.math.ethz.ch >>Subject: [R] Help: Mahalanobis distances between 'Species' from iris >> >> >>Dear R list, >> >>I'm trying to calculate Mahalanobis distances for 'Species' >>of 'iris' data >>as obtained below: >> >>Squared Distance to Species From Species: >> >> Setosa Versicolor Virginica >>Setosa 0 89.86419 179.38471 >>Versicolor 89.86419 0 17.20107 >>Virginica 179.38471 17.20107 0 >> >>This distances above were obtained with proc 'CANDISC' of SAS, please, >>see Output 21.1.2: Iris Data: Squared Mahalanobis Distances from >>http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/stat/ >>chap21/sect19.htm >> >> From this distance my intention is to make a cluster >>analysis as below, using >>the package 'mclust': >> >># >># --- Begin R script --- >># >># For units compatibility of 'iris' from R dataset and 'iris' >>data used in >># the SAS example: >>Measures = iris[,1:4]*10 >>Species = iris[,5] >>irisSAS = data.frame(Measures, Species) >> >>n = 3 >>Mah = c( 0, >> 89.86419, 0, >> 179.38471, 17.20107, 0) >> >># My Question is: how to obtain 'Mah' with R from 'irisSAS' data? >> >>D = matrix(0, n, n) >> >>nam = c('Set', 'Ver', 'Vir') >>rownames(D) = nam >>colnames(D) = nam >> >>k = 0 >>for (i in 1:n) { >> for (j in 1:i) { >> k = k+1 >> D[i,j] = Mah[k] >> D[j,i] = Mah[k] >> } >>} >> >>D=sqrt(D) #D2 -> D >> >>library(mclust) >>dendroS = hclust(as.dist(D), method='single') >>dendroC = hclust(as.dist(D), method='complete') >> >>win.graph(w = 3.5, h = 6) >>split.screen(c(2, 1)) >>screen(1) >>plot(dendroS, main='Single', sub='', xlab='', ylab='', col='blue') >> >>screen(2) >>plot(dendroC, main='Complete', sub='', xlab='', col='red') >># >># --- End R script --- >># >> >>I always need of this type of analysis and I'm not founding >>how to make it in >>the CRAN documentation (Archives, packages: mclust, cluster, >>fpc and mva). >> >>Regards, >>-- >>Jose Claudio Faria >>Brasil/Bahia/UESC/DCET >>Estatistica Experimental/Prof. Adjunto >>mails: >> joseclaudio.faria at terra.com.br >> >>______________________________________________ >>R-help at stat.math.ethz.ch mailing list >>https://stat.ethz.ch/mailman/listinfo/r-help >>PLEASE do read the posting guide! >>http://www.R-project.org/posting-guide.html >>