thr3ads.net - R help - [R] cluster analysis: mean values for each variable and cluster [Feb 2009]

jgaspard

2009-Feb-20 15:06 UTC

[R] cluster analysis: mean values for each variable and cluster

Hi all!

I'm new to R and don't know much about it. Because it is free, I managed to
learn it a little bit.

Here is my problem: I did a cluster analysis on 30 observations and 16
variables (monde, figaro, liberation, etc.). Here is the .txt data file:

"monde","figaro","liberation","yespeople","nopeople","bxl","europe","ue","union_eur","other","yesmeto","nometo","yesfonc","nofonc","yestone","notone"
1,0,0,0,1,0,0,0,1,0,0,1,1,0,1,0
1,0,0,0,1,0,0,0,1,0,0,1,1,0,1,0
1,0,0,0,1,0,0,0,1,0,1,0,1,0,1,0
0,1,0,0,1,0,0,0,1,0,0,1,1,0,0,1
1,0,0,0,1,0,0,0,1,0,0,1,1,0,0,1
1,0,0,0,1,0,0,0,0,1,0,1,1,0,1,0
1,0,0,0,1,0,0,0,0,1,0,1,1,0,1,0
1,0,0,0,1,0,0,0,1,0,0,1,0,1,1,0
0,1,0,0,1,0,0,0,1,0,0,1,0,1,1,0
0,1,0,0,1,0,0,0,0,1,0,1,0,1,1,0
1,0,0,0,1,0,1,0,0,0,0,1,0,1,0,1
0,1,0,0,1,0,0,1,0,0,0,1,1,0,1,0
0,0,1,0,1,0,0,1,0,0,0,1,0,1,1,0
1,0,0,0,1,0,0,1,0,0,0,1,0,1,1,0
0,1,0,0,1,0,0,0,1,0,0,1,1,0,1,0
0,0,1,0,1,0,0,1,0,0,0,1,0,1,1,0
0,1,0,1,0,0,1,0,0,0,0,1,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
0,1,0,0,1,1,0,0,0,0,1,0,0,1,0,1
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
1,0,0,0,1,1,0,0,0,0,1,0,1,0,1,0


These are the steps I took:

data=read.table("/data.csv", header=TRUE, sep=",")  # stored as `data`, the name used in the steps below
data
dist=dist(data,method="euclidean")
dist
cluster=hclust(dist,method="ward")
cluster
plot(cluster)
rect.hclust(cluster, k=4, border="red")

I extracted 4 clusters from the data. My question is: is it possible to
produce a summary of the mean value of each variable in each of the 4
clusters?

Thanks a lot in advance,

Jeoffrey





Uwe Ligges

2009-Feb-20 16:19 UTC


jgaspard wrote:
> I extracted 4 clusters from the data. My question is: is it possible to
> produce a summary of the mean value of each variable in each of the 4
> clusters?

Well, I think this is not what you want.
Probably you want to use Manhattan distance (rather than Euclidean) for 0/1
data, and you want to know the number of 1s and the total number in each
cluster.
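
As a rough illustration of that suggestion, here is a minimal sketch on a small made-up 0/1 data set. The object names (toy, hc, groups, ones, sizes) and the choice of average linkage are assumptions for the example, not part of the original post. On binary data the Manhattan distance between two rows is simply the number of variables on which they differ.

```r
# Small made-up 0/1 data set (hypothetical; stands in for the headline data)
toy <- data.frame(
  a = c(1, 1, 1, 0, 0, 0),
  b = c(0, 0, 0, 1, 1, 1),
  c = c(1, 1, 0, 0, 0, 1)
)

# Manhattan distance: for 0/1 data, the count of variables on which two rows differ
d <- dist(toy, method = "manhattan")

# Average linkage is an arbitrary choice for this sketch
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)

# Number of 1s per variable within each cluster, and the size of each cluster
ones  <- rowsum(toy, group = groups)
sizes <- table(groups)
```

Dividing each row of ones by the matching entry of sizes gives the per-cluster proportions, which for 0/1 data are the same as the cluster means.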

Anyway, to answer your question, assign the result of the last call, for
example:

x <- rect.hclust(cluster, k=4, border="red")
sapply(x, function(i) colMeans(data[i,]))
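
The two lines above work because rect.hclust() invisibly returns a list with one vector of row indices per cluster. The following sketch shows the shape of the result on a made-up data set; toy, hc, means and cluster_means are hypothetical names, and the linkage method is an arbitrary choice for the example.

```r
# Small made-up 0/1 data set (hypothetical; stands in for the headline data)
toy <- data.frame(
  a = c(1, 1, 1, 0, 0, 0),
  b = c(0, 0, 0, 1, 1, 1),
  c = c(1, 1, 0, 0, 0, 1)
)
hc <- hclust(dist(toy, method = "manhattan"), method = "average")

# rect.hclust() draws on an existing dendrogram, so plot first
plot(hc)
x <- rect.hclust(hc, k = 2, border = "red")  # list of row indices, one element per cluster

# Variables in rows, clusters in columns
means <- sapply(x, function(i) colMeans(toy[i, ]))

# Transposed: clusters in rows, variables in columns
cluster_means <- t(means)
```

The clusters in x follow the order of the rectangles in the plot, which need not match the numbering produced by cutree().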

Uwe Ligges



Marcelino de la Cruz

2009-Feb-20 16:21 UTC


Try this:

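  c4 <- cutree(cluster, k=4)   # cluster membership (1 to 4) for each observation
  by(data, c4, colMeans)       # colMeans, since mean() on a data frame returns NA in current R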
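
If a plain table is more convenient than the printed output of by(), one possible alternative is aggregate(), which returns a data frame with one row per cluster. This is only a sketch on made-up data; toy, hc, membership and cluster_means are hypothetical names, and the distance and linkage settings are arbitrary choices for the example.

```r
# Small made-up 0/1 data set (hypothetical; stands in for the headline data)
toy <- data.frame(
  a = c(1, 1, 1, 0, 0, 0),
  b = c(0, 0, 0, 1, 1, 1),
  c = c(1, 1, 0, 0, 0, 1)
)
hc <- hclust(dist(toy, method = "manhattan"), method = "average")

# Cluster membership for each row
membership <- cutree(hc, k = 2)

# One row per cluster, one column of means per variable
cluster_means <- aggregate(toy, by = list(cluster = membership), FUN = mean)
```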

HTH

Marcelino



________________________________

Marcelino de la Cruz Rot

Departamento de Biología Vegetal
E.U.T.I. Agrícola
Universidad Politécnica de Madrid
28040-Madrid
Tel.: 91 336 54 35
Fax: 91 336 56 56
marcelino.delacruz at upm.es
