Naxerova, Kamila
2013-Aug-06 17:59 UTC
[R] How to retrieve pairwise distances between clusters after cutting the tree?
Dear all, what would be the best way of retrieving distances between individual clusters after cutting my tree of interest? $height from the hclust object will give me the distance between clusters at each agglomeration step, but let's say I have six observations A, B, C, D, E, F. The clustering proceeds 1) {A,B}, 2) {C,D}, 3) {E,F}, 4) {C,D,E,F}, 5) {A,B,C,D,E,F}, but now I want to know the distance between {A,B} and {E,F}, which is not directly recorded in $height.

I could find the distance by locating cluster members in the original distance matrix, but is there a more direct way that I might not be aware of? Something along the lines of calc.pairwise.dist(cutree(hclust(dist),k=3))?

Many thanks in advance.
Kamila
David Carlson
2013-Aug-06 20:54 UTC
[R] How to retrieve pairwise distances between clusters after cutting the tree?
Assuming you define "distance between clusters" as the distance between the cluster centroids, and you have the original data, you can use aggregate() on the original data with the output from cutree() as the grouping variable to create a new data.frame of cluster centers (means). Then run that through dist(), dropping the first column (the Group.1 label that aggregate() adds) so that the cluster number does not enter the distances. Something like

    set.seed(42)
    x <- matrix(runif(250), 25, 10)
    groups <- cutree(hclust(dist(x)), k=3)
    centers <- aggregate(x, by=list(groups), mean)
    dist(centers[, -1])

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352
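
If "distance between clusters" is instead meant in the linkage sense (the quantity hclust itself merges on), the approach mentioned in the question, locating cluster members in the original distance matrix, can be sketched as below. This is only a sketch: cluster_dist() is a hypothetical helper written for this illustration, not a function from base R or any package, and it assumes the linkage can be expressed as a summary (max, min, mean) of the between-cluster block of the distance matrix.

```r
# Hypothetical helper (not part of base R or any package): distance between
# every pair of clusters, read off the original distance matrix.
# 'linkage' summarises all member-to-member distances between two clusters:
#   max = complete linkage, min = single linkage, mean = average linkage.
cluster_dist <- function(d, groups, linkage = max) {
  m <- as.matrix(d)                  # full symmetric matrix of pairwise distances
  ids <- sort(unique(groups))
  out <- matrix(0, length(ids), length(ids), dimnames = list(ids, ids))
  for (i in seq_along(ids)) {
    for (j in seq_along(ids)) {
      if (i != j) {
        # distances between members of cluster i and members of cluster j
        block <- m[groups == ids[i], groups == ids[j], drop = FALSE]
        out[i, j] <- linkage(block)
      }
    }
  }
  as.dist(out)
}

set.seed(42)
x <- matrix(runif(250), 25, 10)
d <- dist(x)
hc <- hclust(d)                      # default method is "complete"
groups <- cutree(hc, k = 3)

complete <- cluster_dist(d, groups, max)
single   <- cluster_dist(d, groups, min)
average  <- cluster_dist(d, groups, mean)
complete
```

With the default complete linkage, the smallest entry of `complete` should match the height at which hclust merges the next pair of clusters, which is one way to check that the lookup agrees with the tree.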