jgaspard
2009-Feb-20 15:06 UTC
[R] cluster analysis: mean values for each variable and cluster
Hi all!
I'm new to R and don't know much about it. Because it is free, I managed to
learn it a little bit.
Here is my problem: I did a cluster analysis on 30 observations and 16
variables (monde, figaro, liberation, etc.). Here is the .txt data file:
"monde","figaro","liberation","yespeople","nopeople","bxl","europe","ue","union_eur","other","yesmeto","nometo","yesfonc","nofonc","yestone","notone"
1,0,0,0,1,0,0,0,1,0,0,1,1,0,1,0
1,0,0,0,1,0,0,0,1,0,0,1,1,0,1,0
1,0,0,0,1,0,0,0,1,0,1,0,1,0,1,0
0,1,0,0,1,0,0,0,1,0,0,1,1,0,0,1
1,0,0,0,1,0,0,0,1,0,0,1,1,0,0,1
1,0,0,0,1,0,0,0,0,1,0,1,1,0,1,0
1,0,0,0,1,0,0,0,0,1,0,1,1,0,1,0
1,0,0,0,1,0,0,0,1,0,0,1,0,1,1,0
0,1,0,0,1,0,0,0,1,0,0,1,0,1,1,0
0,1,0,0,1,0,0,0,0,1,0,1,0,1,1,0
1,0,0,0,1,0,1,0,0,0,0,1,0,1,0,1
0,1,0,0,1,0,0,1,0,0,0,1,1,0,1,0
0,0,1,0,1,0,0,1,0,0,0,1,0,1,1,0
1,0,0,0,1,0,0,1,0,0,0,1,0,1,1,0
0,1,0,0,1,0,0,0,1,0,0,1,1,0,1,0
0,0,1,0,1,0,0,1,0,0,0,1,0,1,1,0
0,1,0,1,0,0,1,0,0,0,0,1,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,0,1
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
1,0,0,0,1,1,0,0,0,0,1,0,1,0,1,0
These are the steps I took:
data=read.table("/data.csv", header=T, sep=",")
data
dist=dist(data,method="euclidean")
dist
cluster=hclust(dist,method="ward")
cluster
plot(cluster)
rect.hclust(cluster, k=4, border="red")
I extracted 4 clusters from the data. My question is: is it possible to
produce a summary of the mean value of each variable in each of the 4
clusters?
Thanks a lot in advance,
Jeoffrey
Uwe Ligges
2009-Feb-20 16:19 UTC
[R] cluster analysis: mean values for each variable and cluster
jgaspard wrote:
> I extracted 4 clusters from the data. My question is: is it possible to
> produce a summary of the mean value of each variable in each of the 4
> clusters?

Well, I think this is not what you want. You probably want to use Manhattan distance (rather than Euclidean) for 0/1 data, and you want to know the number of 1s and the total number of observations in each cluster.

Anyway, to answer your question, assign the result of the last step to an object and work with that, for example:

x <- rect.hclust(cluster, k=4, border="red")
sapply(x, function(i) colMeans(data[i,]))

Uwe Ligges
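A minimal sketch of the Manhattan-distance suggestion, counting the 1s and the cluster sizes. The `toy` data frame and its values are made up for illustration and are not the headline data from the original post. `cutree()` is used here instead of `rect.hclust()` so that no plot needs to be open, and `method = "ward.D"` assumes R 3.1.0 or later, where the old "ward" method carries that name.

```r
# Toy 0/1 data standing in for the headline data (hypothetical values).
toy <- data.frame(
  monde  = c(1, 1, 1, 0, 0, 0, 0, 0),
  figaro = c(0, 0, 0, 1, 1, 1, 1, 1),
  bxl    = c(0, 0, 0, 0, 1, 1, 1, 1),
  europe = c(0, 1, 0, 0, 0, 0, 0, 0)
)

# On 0/1 data, Manhattan distance is the number of variables
# on which two observations differ.
d  <- dist(toy, method = "manhattan")
hc <- hclust(d, method = "ward.D")  # assumption: R >= 3.1.0 naming

# Cluster membership for k = 2 clusters.
grp <- cutree(hc, k = 2)

# Number of 1s per variable in each cluster (rows = clusters),
# and the number of observations in each cluster.
ones  <- rowsum(toy, group = grp)
sizes <- table(grp)

print(ones)
print(sizes)
```

Dividing the counts in `ones` by the matching cluster size gives the per-cluster means asked about in the original question, which for 0/1 variables are simply proportions.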
Marcelino de la Cruz
2009-Feb-20 16:21 UTC
[R] cluster analysis: mean values for each variable and cluster
Try this:
c4 <- cutree(cluster, k=4)
by(data, c4, colMeans)  # colMeans, since mean() no longer accepts a whole data frame in current R
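As a hedged sketch of the same `cutree()` idea, `aggregate()` can return the per-cluster means as one table instead of a list. The `toy` data frame and its values are hypothetical stand-ins for the real data, and `method = "ward.D"` assumes R 3.1.0 or later.

```r
# Toy 0/1 data (made-up values, not the data from the original post).
toy <- data.frame(
  monde  = c(1, 1, 1, 0, 0, 0, 0, 0),
  figaro = c(0, 0, 0, 1, 1, 1, 1, 1),
  bxl    = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# Hierarchical clustering as in the original steps.
hc <- hclust(dist(toy, method = "euclidean"), method = "ward.D")

# One cluster label per observation.
c2 <- cutree(hc, k = 2)

# Mean of every variable within each cluster, one row per cluster.
means <- aggregate(toy, by = list(cluster = c2), FUN = mean)
print(means)
```

The result is an ordinary data frame with a `cluster` column followed by one column of means per variable, which is convenient for writing out or plotting.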
HTH
Marcelino
________________________________
Marcelino de la Cruz Rot
Departamento de Biología Vegetal
E.U.T.I. Agrícola
Universidad Politécnica de Madrid
28040-Madrid
Tel.: 91 336 54 35
Fax: 91 336 56 56
marcelino.delacruz at upm.es