Jose Claudio Faria
2005-Jul-06 21:28 UTC
[R] Help: Mahalanobis distances between 'Species' from iris
Dear R list,
I'm trying to calculate Mahalanobis distances for 'Species' of
'iris' data
as obtained below:
Squared Distance to Species From Species:
Setosa Versicolor Virginica
Setosa 0 89.86419 179.38471
Versicolor 89.86419 0 17.20107
Virginica 179.38471 17.20107 0
This distances above were obtained with proc 'CANDISC' of SAS, please,
see Output 21.1.2: Iris Data: Squared Mahalanobis Distances from
http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/stat/chap21/sect19.htm
From this distance my intention is to make a cluster analysis as below, using
the package 'mclust':
#
# --- Begin R script ---
#
# For units compatibility of 'iris' from R dataset and 'iris'
data used in
# the SAS example:
Measures = iris[,1:4]*10
Species = iris[,5]
irisSAS = data.frame(Measures, Species)
n = 3
Mah = c( 0,
89.86419, 0,
179.38471, 17.20107, 0)
# My Question is: how to obtain 'Mah' with R from 'irisSAS'
data?
D = matrix(0, n, n)
nam = c('Set', 'Ver', 'Vir')
rownames(D) = nam
colnames(D) = nam
k = 0
for (i in 1:n) {
for (j in 1:i) {
k = k+1
D[i,j] = Mah[k]
D[j,i] = Mah[k]
}
}
D=sqrt(D) #D2 -> D
library(mclust)
dendroS = hclust(as.dist(D), method='single')
dendroC = hclust(as.dist(D), method='complete')
win.graph(w = 3.5, h = 6)
split.screen(c(2, 1))
screen(1)
plot(dendroS, main='Single', sub='', xlab='',
ylab='', col='blue')
screen(2)
plot(dendroC, main='Complete', sub='', xlab='',
col='red')
#
# --- End R script ---
#
I always need of this type of analysis and I'm not founding how to make it
in
the CRAN documentation (Archives, packages: mclust, cluster, fpc and mva).
Regards,
--
Jose Claudio Faria
Brasil/Bahia/UESC/DCET
Estatistica Experimental/Prof. Adjunto
mails:
joseclaudio.faria at terra.com.br
Jose Claudio Faria
2005-Jul-11 00:47 UTC
[R] Help: Mahalanobis distances between 'Species' from iris
Dear Mulholland,
I'm sorry, I was not clearly and objective sufficiently.
Below you can see what I'm was trying to do:
# D2Mah by JCFaria and Gabor Grothiendieck: 10/7/05 20:46:41
D2Mah = function(y, x) {
stopifnot(is.data.frame(y), !missing(x))
stopifnot(dim(y)[1] != dim(x)[1])
y = as.matrix(y)
x = as.factor(x)
man = manova(y ~ x)
E = summary(man)$SS[2] #Matrix E
S = as.matrix(E$Residuals)/man$df.residual
InvS = solve(S)
mds = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))
f = function(a,b) mapply(function(a,b)
mahalanobis(mds[a,], mds[b,], InvS, TRUE), a, b)
seq. = seq(length = nrow(mds))
names(seq.) = levels(x)
D2 = outer(seq., seq., f)
}
#
# test
#
D2M = D2Mah(iris[,1:4], iris[,5])
print(D2M)
dend = hclust(as.dist(sqrt(D2M)), method='complete')
plot(dend)
So, Thanks for the reply.
Best,
JCFaria
Mulholland, Tom wrote:> Why don't you use mahalanobis in the stats package. The function
"Returns the Mahalanobis distance of all rows in 'x' and the vector
mu='center' with respect to Sigma='cov'. This is (for vector
'x') defined as
>
> D^2 = (x - mu)' Sigma^{-1} (x - mu)
>
> I don't have any idea what this does but it appears to be talking about
the same topic. If it is not suitable package fpc has related functions and
adehabitat has a function for "Habitat Suitability Mapping with Mahalanobis
Distances"
>
> Tom
>
>
>>-----Original Message-----
>>From: r-help-bounces at stat.math.ethz.ch
>>[mailto:r-help-bounces at stat.math.ethz.ch]On Behalf Of Jose
>>Claudio Faria
>>Sent: Thursday, 7 July 2005 5:29 AM
>>To: r-help at stat.math.ethz.ch
>>Subject: [R] Help: Mahalanobis distances between 'Species' from
iris
>>
>>
>>Dear R list,
>>
>>I'm trying to calculate Mahalanobis distances for 'Species'
>>of 'iris' data
>>as obtained below:
>>
>>Squared Distance to Species From Species:
>>
>> Setosa Versicolor Virginica
>>Setosa 0 89.86419 179.38471
>>Versicolor 89.86419 0 17.20107
>>Virginica 179.38471 17.20107 0
>>
>>This distances above were obtained with proc 'CANDISC' of SAS,
please,
>>see Output 21.1.2: Iris Data: Squared Mahalanobis Distances from
>>http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/stat/
>>chap21/sect19.htm
>>
>> From this distance my intention is to make a cluster
>>analysis as below, using
>>the package 'mclust':
>>
>>#
>># --- Begin R script ---
>>#
>># For units compatibility of 'iris' from R dataset and
'iris'
>>data used in
>># the SAS example:
>>Measures = iris[,1:4]*10
>>Species = iris[,5]
>>irisSAS = data.frame(Measures, Species)
>>
>>n = 3
>>Mah = c( 0,
>> 89.86419, 0,
>> 179.38471, 17.20107, 0)
>>
>># My Question is: how to obtain 'Mah' with R from
'irisSAS' data?
>>
>>D = matrix(0, n, n)
>>
>>nam = c('Set', 'Ver', 'Vir')
>>rownames(D) = nam
>>colnames(D) = nam
>>
>>k = 0
>>for (i in 1:n) {
>> for (j in 1:i) {
>> k = k+1
>> D[i,j] = Mah[k]
>> D[j,i] = Mah[k]
>> }
>>}
>>
>>D=sqrt(D) #D2 -> D
>>
>>library(mclust)
>>dendroS = hclust(as.dist(D), method='single')
>>dendroC = hclust(as.dist(D), method='complete')
>>
>>win.graph(w = 3.5, h = 6)
>>split.screen(c(2, 1))
>>screen(1)
>>plot(dendroS, main='Single', sub='', xlab='',
ylab='', col='blue')
>>
>>screen(2)
>>plot(dendroC, main='Complete', sub='', xlab='',
col='red')
>>#
>># --- End R script ---
>>#
>>
>>I always need of this type of analysis and I'm not founding
>>how to make it in
>>the CRAN documentation (Archives, packages: mclust, cluster,
>>fpc and mva).
>>
>>Regards,
>>--
>>Jose Claudio Faria
>>Brasil/Bahia/UESC/DCET
>>Estatistica Experimental/Prof. Adjunto
>>mails:
>> joseclaudio.faria at terra.com.br
>>
>>______________________________________________
>>R-help at stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide!
>>http://www.R-project.org/posting-guide.html
>>