Jonas Dehairs
2011-Jul-27 13:59 UTC
[R] Inversions in hierarchical clustering where they shouldn't be
Hi,

I'm using heatmap.2 to cluster my data, using the centroid method for clustering and the maximum method for calculating the distance matrix:

library("gplots")
library("RColorBrewer")
test <- matrix(c(0.96, 0.07, 0.97, 0.98,
                 0.50, 0.28, 0.29, 0.77,
                 0.08, 0.96, 0.51, 0.51,
                 0.14, 0.19, 0.41, 0.51),
               ncol=4, byrow=TRUE)
colnames(test) <- c("Exp1","Exp2","Exp3","Exp4")
rownames(test) <- c("Gene1","Gene2","Gene3", "Gene4")
test <- as.table(test)
mat = data.matrix(test)
heatmap.2(mat,
          dendrogram="row",
          Rowv=TRUE,
          Colv=FALSE,
          distfun = function(x) dist(x, method = 'maximum'),
          hclustfun = function(x) hclust(x, method = 'centroid'),
          xlab = NULL,
          ylab = NULL,
          key=TRUE,
          keysize=1,
          trace="none",
          density.info=c("none"),
          margins=c(6, 12),
          col=bluered
)

This gives a heatmap with inversions in the cluster tree, which is inherent to the centroid method. A solution to avoid inversions is to use the Euclidean or the city-block distance, and indeed, if you change maximum to euclidean in the above example, the inversions are gone (for reference, see chapter 4.1.1 at <http://bonsai.hgc.jp/%7Emdehoon/software/cluster/manual/Hierarchical.html>).

Now for my problem: when I use my actual data instead of this example table, the inversions are still there after I change to euclidean. The R code is exactly the same as in this example; only the data is different. When I use Cluster 3.0 and Java TreeView with the Euclidean distance and the centroid method, there are no inversions in my data, as expected. So why does R give inversions? The theory and the other software say it shouldn't.
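To make "inversion" concrete without reading it off the plot, here is a minimal sketch in base R. An inversion shows up as a merge height in the `hclust` result that is lower than an earlier one, so checking whether `$height` is sorted is enough. The helper name `has_inversion` and the three-point toy distances are only illustrative. The last variant uses squared Euclidean distances because the examples on the `hclust` help page pair the centroid method with `dist(...)^2`; whether that accounts for the difference from Cluster 3.0 is only a guess on my part, and the centroid method can produce inversions in either case.

```r
# TRUE if some merge happens at a lower height than an earlier merge.
# (has_inversion is a made-up helper name, not part of any package.)
has_inversion <- function(hc) is.unsorted(hc$height)

# Toy case: three objects with pairwise distances 1, 1.1 and 1.1.
d <- as.dist(matrix(c(0,   1,   1.1,
                      1,   0,   1.1,
                      1.1, 1.1, 0), nrow = 3, byrow = TRUE))
hc_centroid <- hclust(d, method = "centroid")  # second merge is lower than the first
hc_single   <- hclust(d, method = "single")    # single linkage is always monotone

print(hc_centroid$height)
print(has_inversion(hc_centroid))
print(has_inversion(hc_single))

# The same check on the 4 x 4 example matrix, for three distance choices.
mat <- matrix(c(0.96, 0.07, 0.97, 0.98,
                0.50, 0.28, 0.29, 0.77,
                0.08, 0.96, 0.51, 0.51,
                0.14, 0.19, 0.41, 0.51),
              ncol = 4, byrow = TRUE,
              dimnames = list(paste0("Gene", 1:4), paste0("Exp", 1:4)))

variants <- list(
  maximum           = dist(mat, method = "maximum"),
  euclidean         = dist(mat, method = "euclidean"),
  euclidean_squared = dist(mat, method = "euclidean")^2
)
for (name in names(variants)) {
  hc <- hclust(variants[[name]], method = "centroid")
  cat(name, ": inversion =", has_inversion(hc), "\n")
}
```

The same function can be applied to the real data by replacing `mat`; to try the squared variant inside heatmap.2, the `distfun` argument would become `function(x) dist(x, method = 'euclidean')^2`.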
Here is an example where changing maximum to euclidean does not fix the inversions (as opposed to the above example, where it did):

library("gplots")
library("RColorBrewer")
test <- matrix(c(0.96, 0.07, 0.97, 0.98, 0.99, 0.50, 0.28, 0.29, 0.77, 0.78, 0.08, 0.96, 0.51, 0.51, 0.55, 0.14, 0.19, 0.41, 0.51, 0.40, 0.97, 0.98, 0.99, 0.50, 0.28), ncol=6, byrow=TRUE)
colnames(test) <- c("Exp1","Exp2","Exp3","Exp4","Exp5","Exp6")
rownames(test) <- c("Gene1","Gene2","Gene3", "Gene4")
test <- as.table(test)
mat = data.matrix(test)
heatmap.2(mat,
          dendrogram="row",
          Rowv=TRUE,
          Colv=FALSE,
          distfun = function(x) dist(x, method = 'maximum'),
          hclustfun = function(x) hclust(x, method = 'centroid'),
          xlab = NULL,
          ylab = NULL,
          key=TRUE,
          keysize=1,
          trace="none",
          density.info=c("none"),
          margins=c(6, 12),
          col=bluered
)

Do you have any idea what could be the cause of this discrepancy?

Kind regards