james.foadi at diamond.ac.uk
2009-Dec-10 13:26 UTC
[R] question about centroid-linkage (cluster analysis)
Dear R community,

I would be grateful if somebody could shed light on the following. I have created a set of 6 points to check how centroid agglomeration works in cluster analysis:

> Y <- data.frame(x=c(-1,1,1,-1,10,12),y=c(1,1,-1,-1,0,0))

It is quite intuitive that the last clusters to be joined will be {1,2,3,4} and {5,6}. Now, the centroid of the first cluster has coordinates (0,0), while the centroid of the second cluster has coordinates (11,0). Therefore, the distance between these two clusters should be 11. But:

> Y.dist <- dist(Y)
> Y.hc_c <- hclust(Y.dist,method="centroid")
> Y.hc_c$merge
     [,1] [,2]
[1,]   -1   -2
[2,]   -3    1
[3,]   -4    2
[4,]   -5   -6
[5,]    3    4
> Y.hc_c$height
[1] 2.000000 1.914214 1.517428 2.000000 9.692575

So, from this it would appear that the distance between the last two clusters is 9.692575! How can that be?

J

Dr James Foadi PhD
Membrane Protein Laboratory (MPL)
Diamond Light Source Ltd
Diamond House
Harewell Science and Innovation Campus
Chilton, Didcot
Oxfordshire OX11 0DE
Email : james.foadi at diamond.ac.uk
Alt Email: j.foadi at imperial.ac.uk
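The expected value of 11 quoted above can be checked directly from the coordinates, without going through hclust. The following is only a minimal sketch; the names c1, c2 and d are illustrative and not part of the original session.

```r
# Same six points as in the post.
Y <- data.frame(x = c(-1, 1, 1, -1, 10, 12), y = c(1, 1, -1, -1, 0, 0))

# Centroids of the two clusters expected to merge last.
c1 <- colMeans(Y[1:4, ])  # cluster {1,2,3,4}
c2 <- colMeans(Y[5:6, ])  # cluster {5,6}

# Euclidean distance between the two centroids.
d <- sqrt(sum((c1 - c2)^2))
d
```

This confirms the hand calculation: the geometric distance between the two centroids is 11, so the 9.692575 reported in the height component has to come from somewhere else.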
james.foadi at diamond.ac.uk
2009-Dec-11 14:57 UTC
[R] question about centroid-linkage (cluster analysis) (2)
Dear R community,

just in case some of you haven't noticed my previous email: I realize that "hclust" relies on a Fortran routine, but I hoped some of you might know exactly how "Y.hc_c$height" is computed and could thus explain the anomaly I found.

Thank you.

J
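One possible explanation, offered as a sketch rather than a statement about the Fortran code: centroid linkage is commonly implemented through the Lance-Williams update, which only reproduces true centroid-to-centroid distances when it is fed squared Euclidean distances. The hclust help page likewise uses squared distances in its centroid example. The helper lw_centroid below is a hypothetical name introduced purely for illustration. It reproduces the second reported height (1.914214) from plain distances, and shows that squaring first recovers the geometric distance.

```r
# Same six points as in the post.
Y <- data.frame(x = c(-1, 1, 1, -1, 10, 12), y = c(1, 1, -1, -1, 0, 0))

# Lance-Williams update for centroid linkage (hypothetical helper name):
# dissimilarity between cluster k and the merger of clusters i and j.
lw_centroid <- function(d_ki, d_kj, d_ij, n_i, n_j) {
  n <- n_i + n_j
  (n_i / n) * d_ki + (n_j / n) * d_kj - (n_i * n_j / n^2) * d_ij
}

# Second merge in the post: point 3 versus the merged cluster {1,2}.
D <- as.matrix(dist(Y))
h_plain <- lw_centroid(D[3, 1], D[3, 2], D[1, 2], 1, 1)           # plain distances
h_sq <- sqrt(lw_centroid(D[3, 1]^2, D[3, 2]^2, D[1, 2]^2, 1, 1))  # squared, then rooted

# Geometric distance from point 3 to the centroid of {1,2}, for comparison.
h_true <- sqrt(sum((unlist(Y[3, ]) - colMeans(Y[1:2, ]))^2))

# Clustering on squared distances; the square root of the largest height
# is the distance between the last two centroids.
hc_sq <- hclust(dist(Y)^2, method = "centroid")
final_dist <- sqrt(max(hc_sq$height))
```

Under these assumptions, h_plain matches the 1.914214 in the reported heights, h_sq agrees with h_true, and final_dist comes out as the expected 11. Note that the merge order on squared distances may differ from the one shown in the post, since several of the early candidate merges are tied.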