jgaspard
2009-Feb-20 15:06 UTC
[R] cluster analysis: mean values for each variable and cluster
Hi all!

I'm new to R and don't know much about it. Because it is free, I have managed to learn a little of it.

Here is my problem: I did a cluster analysis on 30 observations and 16 binary (0/1) variables (monde, figaro, liberation, etc.). Here is the comma-separated data file:

"monde","figaro","liberation","yespeople","nopeople","bxl","europe","ue","union_eur","other","yesmeto","nometo","yesfonc","nofonc","yestone","notone"
1,0,0,0,1,0,0,0,1,0,0,1,1,0,1,0
1,0,0,0,1,0,0,0,1,0,0,1,1,0,1,0
1,0,0,0,1,0,0,0,1,0,1,0,1,0,1,0
0,1,0,0,1,0,0,0,1,0,0,1,1,0,0,1
1,0,0,0,1,0,0,0,1,0,0,1,1,0,0,1
1,0,0,0,1,0,0,0,0,1,0,1,1,0,1,0
1,0,0,0,1,0,0,0,0,1,0,1,1,0,1,0
1,0,0,0,1,0,0,0,1,0,0,1,0,1,1,0
0,1,0,0,1,0,0,0,1,0,0,1,0,1,1,0
0,1,0,0,1,0,0,0,0,1,0,1,0,1,1,0
1,0,0,0,1,0,1,0,0,0,0,1,0,1,0,1
0,1,0,0,1,0,0,1,0,0,0,1,1,0,1,0
0,0,1,0,1,0,0,1,0,0,0,1,0,1,1,0
1,0,0,0,1,0,0,1,0,0,0,1,0,1,1,0
0,1,0,0,1,0,0,0,1,0,0,1,1,0,1,0
0,0,1,0,1,0,0,1,0,0,0,1,0,1,1,0
0,1,0,1,0,0,1,0,0,0,0,1,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,0,1
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
1,0,0,0,1,1,0,0,0,0,1,0,1,0,1,0

These are the steps I followed:

data <- read.table("/data.csv", header = TRUE, sep = ",")
data
d <- dist(data, method = "euclidean")
d
cluster <- hclust(d, method = "ward.D")   # spelled "ward" before R 3.1.0
cluster
plot(cluster)
rect.hclust(cluster, k = 4, border = "red")

I extracted 4 clusters from the data. My question is: is it possible to produce a summary of the mean value of each variable for each of the 4 clusters?

Thanks a lot in advance,

Jeoffrey

--
View this message in context: http://www.nabble.com/cluster-analysis%3A-mean-values-for-each-variable-and-cluster-tp22120427p22120427.html
Sent from the R help mailing list archive at Nabble.com.
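For anyone who wants to rerun the question without the file, here is a minimal sketch that reads the data posted above inline with read.csv(text = ...) instead of from "/data.csv". The name `csv` is only illustrative, and "ward.D" is the current name of the method the poster originally called "ward".

```r
# Minimal sketch: the posted data read inline instead of from "/data.csv".
csv <- '"monde","figaro","liberation","yespeople","nopeople","bxl","europe","ue","union_eur","other","yesmeto","nometo","yesfonc","nofonc","yestone","notone"
1,0,0,0,1,0,0,0,1,0,0,1,1,0,1,0
1,0,0,0,1,0,0,0,1,0,0,1,1,0,1,0
1,0,0,0,1,0,0,0,1,0,1,0,1,0,1,0
0,1,0,0,1,0,0,0,1,0,0,1,1,0,0,1
1,0,0,0,1,0,0,0,1,0,0,1,1,0,0,1
1,0,0,0,1,0,0,0,0,1,0,1,1,0,1,0
1,0,0,0,1,0,0,0,0,1,0,1,1,0,1,0
1,0,0,0,1,0,0,0,1,0,0,1,0,1,1,0
0,1,0,0,1,0,0,0,1,0,0,1,0,1,1,0
0,1,0,0,1,0,0,0,0,1,0,1,0,1,1,0
1,0,0,0,1,0,1,0,0,0,0,1,0,1,0,1
0,1,0,0,1,0,0,1,0,0,0,1,1,0,1,0
0,0,1,0,1,0,0,1,0,0,0,1,0,1,1,0
1,0,0,0,1,0,0,1,0,0,0,1,0,1,1,0
0,1,0,0,1,0,0,0,1,0,0,1,1,0,1,0
0,0,1,0,1,0,0,1,0,0,0,1,0,1,1,0
0,1,0,1,0,0,1,0,0,0,0,1,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,0,1
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
1,0,0,0,1,1,0,0,0,0,1,0,1,0,1,0'
data <- read.csv(text = csv)

d <- dist(data, method = "euclidean")        # Euclidean distances, as in the post
cluster <- hclust(d, method = "ward.D")      # "ward" was renamed "ward.D" in R 3.1.0
plot(cluster)                                # dendrogram (Rplots.pdf when run non-interactively)
rect.hclust(cluster, k = 4, border = "red")  # outline the 4 clusters on the plot
```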
Uwe Ligges
2009-Feb-20 16:19 UTC
[R] cluster analysis: mean values for each variable and cluster
jgaspard wrote:
> [...]
> I extracted 4 clusters from the data. My question is: is it possible to
> produce a summary of the mean value of each variable for each of the 4
> clusters?

Well, I think this is not what you want.
Probably you want to use the Manhattan distance (rather than the Euclidean distance) for 0/1 data, and you want to know the number of 1s and the total number of observations in each cluster.

Anyway, to answer your question, assign the result of rect.hclust() at the end, for example:

x <- rect.hclust(cluster, k = 4, border = "red")
sapply(x, function(i) colMeans(data[i, , drop = FALSE]))

This gives a matrix with one row per variable and one column per cluster.

Uwe Ligges
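Uwe's alternative (Manhattan distance on the 0/1 data, then counting the 1s and the observations in each cluster) can be sketched as follows. The matrix `x` is hypothetical toy data, not the poster's; average linkage is an assumption, chosen because Ward's criterion is derived for Euclidean distances; and cutree() with rowsum() is used instead of rect.hclust() so that no plot is needed.

```r
# Hypothetical toy 0/1 data (not the poster's): 8 observations, 4 variables.
x <- matrix(c(1, 0, 1, 0,
              1, 0, 1, 1,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 1, 0, 1,
              0, 1, 1, 1,
              1, 1, 0, 0,
              1, 1, 0, 0),
            ncol = 4, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c", "d")))

# For 0/1 rows, the Manhattan distance is the number of variables that differ.
d <- dist(x, method = "manhattan")

# Average linkage is an assumption here (Ward is meant for Euclidean distances).
hc <- hclust(d, method = "average")
grp <- cutree(hc, k = 3)

ones  <- rowsum(x, grp)          # number of 1s per variable, one row per cluster
total <- as.vector(table(grp))   # number of observations in each cluster
props <- ones / total            # row i divided by total[i]: proportion of 1s
props
```

On 0/1 data the proportions in `props` are exactly the cluster means that the colMeans() answer produces, just laid out with clusters as rows.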
Marcelino de la Cruz
2009-Feb-20 16:21 UTC
[R] cluster analysis: mean values for each variable and cluster
Try this:

c4 <- cutree(cluster, k = 4)
by(data, c4, colMeans)   # by(data, c4, mean) no longer works: mean() of a data frame is defunct since R 3.0.0

HTH

Marcelino

________________________________

Marcelino de la Cruz Rot
Departamento de Biología Vegetal
E.U.T.I. Agrícola
Universidad Politécnica de Madrid
28040-Madrid
Tel.: 91 336 54 35
Fax: 91 336 56 56
marcelino.delacruz at upm.es
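The cutree() approach can also give all cluster means in a single table via aggregate(). A minimal sketch on hypothetical toy data: the three columns borrow the poster's variable names, but the values are made up, and k = 2 is used only because the toy data are small.

```r
# Hypothetical toy 0/1 data standing in for the poster's data frame.
data <- data.frame(monde  = c(1, 1, 0, 0, 0, 1),
                   figaro = c(0, 0, 1, 1, 1, 0),
                   bxl    = c(0, 0, 1, 1, 1, 1))

cluster <- hclust(dist(data), method = "ward.D")
cl <- cutree(cluster, k = 2)   # k = 2 only because the toy data are tiny

# One row per cluster, one column per variable: the cluster means.
means <- aggregate(data, by = list(cluster = cl), FUN = mean)
means

# Cluster sizes, useful when reading the means as proportions of 1s.
table(cl)
```

Unlike by(), which prints one block per cluster, aggregate() returns a data frame that is easy to export or compare. Note that cutree() appears to number clusters in the order in which their members first appear in the data, so its numbering may differ from the left-to-right order of the rectangles drawn by rect.hclust().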