Bob Green
2012-Nov-18 11:00 UTC
[R] Examining how cases are similar by cluster, in cluster analysis
Hello,

I used the following code to perform a cluster analysis on a data frame consisting of 12 variables (coded as 1,0) and 63 cases.

FS1 <- read.csv("D://Arsontest2.csv", header=TRUE, row.names=1)
str(FS1)
dmat <- dist(FS1, method="binary")
cl.test <- hclust(dmat, "ave")
plot(cl.test, hang = -1)

Each case has an id, and the dendrogram identifies the cases that constitute each cluster. What I am seeking advice on is how to examine the variables on which the cases are similar, within each cluster.

sort(hcli8 <- cutree(cl.test, k=8)) shows that cluster 2 comprises the following cases:

1641 2295 2594 2654 2799 3213 3510 3513 2958 3294
   2    2    2    2    2    2    2    2    2    2

This code provides means for the variables by cluster. For cluster 2, it appears the cases should have no clear motive and be depressed:

round(sapply(x, function(i) colMeans(FS1[i,])),2)

          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
depressed 0.00 0.33 0.00  0.0    0  0.6 0.00 0.08
unclear   0.33 1.00 1.00  1.0    0  0.0 0.07 0.12

I can examine this manually, variable by variable, to see how the cases in cluster 2 are similar on each variable, but I am looking for a quicker and more efficient way to do this.

Bob
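The sapply() call in the post uses an object x that is never defined there. A minimal sketch of one plausible definition follows, assuming x was a list of row positions grouped by cluster label. The simulated FS1 and the seed are stand-ins (the real Arsontest2.csv is not available), so the numbers will not match the output quoted in the post.

```r
# Stand-in for the real data: 63 cases, 12 binary variables (hypothetical values).
set.seed(1)
FS1 <- data.frame(matrix(sample(c(0, 1), 12 * 63, replace = TRUE), nrow = 63))

# Same clustering steps as in the post: binary distance, average linkage, 8 groups.
cl.test <- hclust(dist(FS1, method = "binary"), method = "average")
hcli8 <- cutree(cl.test, k = 8)

# Assumed definition of x: row positions of FS1, split by cluster label.
x <- split(seq_len(nrow(FS1)), hcli8)

# Variables in rows, clusters in columns, the same layout as the quoted output.
# drop = FALSE keeps single-case clusters as one-row data frames.
means <- round(sapply(x, function(i) colMeans(FS1[i, , drop = FALSE])), 2)
print(means)
```

Because the variables are coded 0/1, each cell is the proportion of cases in that cluster that have the attribute.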
David L Carlson
2012-Nov-18 20:44 UTC
[R] Examining how cases are similar by cluster, in cluster analysis
If you just want a summary of the mean for each variable in each cluster, this will get you there. The data here are simulated, so the numbers in your output may differ from the table shown.

> set.seed(42)
> FS1 <- data.frame(matrix(sample(c(0, 1), 12*63, replace=TRUE), nrow=63,
+     ncol=12))
> dmat <- dist(FS1, method="binary")
> cl.test <- hclust(dmat, method="average")
> plot(cl.test, hang=-1)
> hcli8 <- cutree(cl.test, k=8)
> tbl <- aggregate(FS1, by=list(Group=hcli8), mean)
> print(tbl, digits=4)
  Group     X1     X2     X3     X4     X5     X6     X7     X8     X9
1     1 0.5122 0.6829 0.6829 0.6341 0.5854 0.5854 0.6829 0.6341 0.5366
2     2 0.0000 0.0000 0.0000 1.0000 0.6667 0.6667 0.0000 0.6667 0.0000
3     3 0.9286 0.1429 0.1429 0.1429 0.2857 0.5714 0.7857 0.3571 0.8571
4     4 1.0000 1.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
5     5 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000
6     6 1.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 1.0000 0.0000
7     7 1.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
8     8 0.0000 1.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000
     X10    X11   X12
1 0.4146 0.4634 0.561
2 0.6667 0.0000 0.000
3 0.8571 0.6429 0.500
4 1.0000 0.0000 0.000
5 0.0000 1.0000 0.000
6 0.0000 0.0000 1.000
7 0.0000 0.0000 0.000
8 0.0000 0.0000 0.000

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
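Building on the aggregate() summary in the reply, the following is a rough sketch of one way to pull out, for each cluster, the variables on which every case agrees, and to view the raw rows of a single cluster side by side. The names prop, shared and cluster2 are illustrative and do not come from the thread, and the data are again simulated stand-ins.

```r
# Simulated stand-in data, built the same way as in the reply (hypothetical values).
set.seed(42)
FS1 <- data.frame(matrix(sample(c(0, 1), 12 * 63, replace = TRUE),
                         nrow = 63, ncol = 12))
hcli8 <- cutree(hclust(dist(FS1, method = "binary"), method = "average"), k = 8)

# Per-cluster proportion of cases coded 1 on each variable (clusters in rows).
prop <- as.matrix(aggregate(FS1, by = list(Group = hcli8), mean)[, -1])
rownames(prop) <- sort(unique(hcli8))

# Variables on which all cases in a cluster agree: proportion exactly 1 or 0.
shared <- lapply(rownames(prop), function(g) {
  p <- prop[g, ]
  list(all_present = names(p)[p == 1], all_absent = names(p)[p == 0])
})
names(shared) <- rownames(prop)

# Raw 0/1 rows for one cluster, to compare its cases directly.
cluster2 <- FS1[hcli8 == 2, , drop = FALSE]
print(shared[["2"]])
print(cluster2)
```

Single-case clusters agree on every variable trivially, so the result is most informative for the larger clusters. A looser cutoff (for example, proportions of at least 0.8 or at most 0.2) could be substituted for the exact 0/1 test if near-agreement is of interest.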