Jonas Dehairs
2011-Jul-27 13:59 UTC
[R] Inversions in hierarchical clustering where they shouldn't be
Hi,
I'm using heatmap.2 to cluster my data, using the centroid method for
clustering and the maximum method for calculating the distance matrix:
library("gplots")
library("RColorBrewer")
test <- matrix(c(0.96, 0.07, 0.97, 0.98,
                 0.50, 0.28, 0.29, 0.77,
                 0.08, 0.96, 0.51, 0.51,
                 0.14, 0.19, 0.41, 0.51),
               ncol=4, byrow=TRUE)
colnames(test) <- c("Exp1","Exp2","Exp3","Exp4")
rownames(test) <- c("Gene1","Gene2","Gene3","Gene4")
test <- as.table(test)
mat <- data.matrix(test)
heatmap.2(mat, dendrogram="row", Rowv=TRUE, Colv=FALSE,
          distfun = function(x) dist(x, method = "maximum"),
          hclustfun = function(x) hclust(x, method = "centroid"),
          xlab = NULL, ylab = NULL, key=TRUE,
          keysize=1, trace="none", density.info="none",
          margins=c(6, 12), col=bluered)
This gives a heatmap with inversions in the cluster tree, which are inherent to
the centroid method. One way to avoid inversions is to use the Euclidean or
the city-block distance, and indeed if you change maximum to euclidean in the
example above the inversions are gone (for reference, see chapter 4.1.1 in this
link: <http://bonsai.hgc.jp/%7Emdehoon/software/cluster/manual/Hierarchical.html>).
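As a minimal sketch of how the inversions could be checked without reading them off the plot: an inversion means a later merge happens at a lower height than an earlier one, so the merge heights returned by hclust are no longer non-decreasing. The helper name has_inversion is hypothetical (it is not part of gplots or stats), and the matrix is the 4 x 4 example from above.

```r
# Example data from the post (4 genes x 4 experiments)
test <- matrix(c(0.96, 0.07, 0.97, 0.98,
                 0.50, 0.28, 0.29, 0.77,
                 0.08, 0.96, 0.51, 0.51,
                 0.14, 0.19, 0.41, 0.51),
               ncol = 4, byrow = TRUE)
rownames(test) <- c("Gene1", "Gene2", "Gene3", "Gene4")

# Hypothetical helper: TRUE if any merge height is lower than the one before it
has_inversion <- function(hc) is.unsorted(hc$height)

# Same clustering as in the heatmap.2 call, once per distance measure
hc_max <- hclust(dist(test, method = "maximum"), method = "centroid")
hc_euc <- hclust(dist(test, method = "euclidean"), method = "centroid")

# Print the merge heights and the inversion check for each distance
print(hc_max$height); print(has_inversion(hc_max))
print(hc_euc$height); print(has_inversion(hc_euc))
```

Running the same check on the real data would show at which merge step the heights start to decrease.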
Now for my problem: when I use my actual data instead of this example table,
the inversions are still there after I change to euclidean. The R code is exactly
the same as in this example; only the data is different. When I use Cluster 3.0
and Java TreeView with the Euclidean distance and the centroid method, there are
no inversions in my data, as expected. So why does R give inversions? The theory
and the other software say it shouldn't.
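One thing that may be worth checking, offered as a sketch rather than a confirmed explanation: hclust only sees the dissimilarities it is handed, and the example in ?hclust applies the centroid method to squared Euclidean distances (dist(USArrests)^2). The code below compares plain and squared Euclidean input on the 4 x 4 example matrix. Whether this accounts for the difference from Cluster 3.0 is an assumption, and the variable names are only illustrative.

```r
# Example data from the post (4 genes x 4 experiments)
test <- matrix(c(0.96, 0.07, 0.97, 0.98,
                 0.50, 0.28, 0.29, 0.77,
                 0.08, 0.96, 0.51, 0.51,
                 0.14, 0.19, 0.41, 0.51),
               ncol = 4, byrow = TRUE)
rownames(test) <- c("Gene1", "Gene2", "Gene3", "Gene4")

d_plain   <- dist(test, method = "euclidean")
d_squared <- d_plain^2   # arithmetic keeps the "dist" class

hc_plain   <- hclust(d_plain,   method = "centroid")
hc_squared <- hclust(d_squared, method = "centroid")

# Heights for the squared input are on the squared-distance scale
print(hc_plain$height);   print(is.unsorted(hc_plain$height))
print(hc_squared$height); print(is.unsorted(hc_squared$height))
```

If the squared distances turn out to behave differently, the same idea could be passed to heatmap.2 as distfun = function(x) dist(x, method = "euclidean")^2.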
Here is an example where changing maximum to euclidean does not fix the inversions
(as opposed to the example above, where it did):
library("gplots")
library("RColorBrewer")
# Note: 25 values with ncol=6 do not fill a 4-row matrix evenly.
test <- matrix(c(0.96, 0.07, 0.97, 0.98, 0.99, 0.50, 0.28, 0.29, 0.77, 0.78,
                 0.08, 0.96, 0.51, 0.51, 0.55, 0.14, 0.19, 0.41, 0.51, 0.40,
                 0.97, 0.98, 0.99, 0.50, 0.28),
               ncol=6, byrow=TRUE)
colnames(test) <- c("Exp1","Exp2","Exp3","Exp4","Exp5","Exp6")
rownames(test) <- c("Gene1","Gene2","Gene3","Gene4")
test <- as.table(test)
mat <- data.matrix(test)
heatmap.2(mat, dendrogram="row", Rowv=TRUE, Colv=FALSE,
          distfun = function(x) dist(x, method = "maximum"),
          hclustfun = function(x) hclust(x, method = "centroid"),
          xlab = NULL, ylab = NULL, key=TRUE,
          keysize=1, trace="none", density.info="none",
          margins=c(6, 12), col=bluered)
Do you have any idea what could be the cause of this discrepancy?
Kind regards
