Milan Bouchet-Valat
2012-Aug-12 13:37 UTC
[R] Different cluster orderings from cutree() and cut.dendrogram()
Hi!

I just discovered that cutree() and cut.dendrogram() do not assign the same cluster numbers when called on the same tree. More specifically, cutree() numbers clusters by order of appearance in the data, while cut.dendrogram() sorts clusters by height (see example below). I guess this is for historical reasons?

I ran into this difference when I wanted to get a vector of cluster memberships after running a hierarchical clustering. One solution would be to avoid mixing methods for dendrogram and hclust objects. But I don't know an easy/clean way of getting the same information as cutree() provides using only dendrogram methods. Help is more than welcome!

I'd like to suggest that a word about this discrepancy be added to ?cut.dendrogram and/or ?cutree. An example showing how to get cluster memberships using only dendrogram methods could also be useful.

Example based on ?hclust:

> hc <- hclust(dist(USArrests))
> table(cutree(hc, h=100))

 1  2  3  4 
14 14 20  2 
> cut(as.dendrogram(hc), 100)$lower
[[1]]
'dendrogram' with 2 branches and 2 members total, at height 38.52791 

[[2]]
'dendrogram' with 2 branches and 14 members total, at height 64.99362 

[[3]]
'dendrogram' with 2 branches and 14 members total, at height 68.76227 

[[4]]
'dendrogram' with 2 branches and 20 members total, at height 87.32634 

(Note how the cluster sizes appear in a different order in the two outputs.)

Regards
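
For what it's worth, here is a minimal sketch of one way to build a cutree()-style membership vector using only dendrogram methods. It assumes that labels() applied to each sub-dendrogram in cut(...)$lower returns the leaf labels of that cluster, and that the observations have unique row names. The variable names (dend, branches, memb, renum) are made up for illustration; this is not an official recipe from the documentation.

```r
hc   <- hclust(dist(USArrests))
dend <- as.dendrogram(hc)

# cut() on a dendrogram returns a list; $lower holds one sub-dendrogram per cluster
branches <- cut(dend, h = 100)$lower

# Leaf labels of each cluster, in the order cut.dendrogram() lists the clusters
members <- lapply(branches, labels)

# Named membership vector, numbered in cut.dendrogram() order
memb <- rep(seq_along(members), sapply(members, length))
names(memb) <- unlist(members)

# Put observations back in the order of the original data, as cutree() does
memb <- memb[rownames(USArrests)]

# Renumber clusters by order of first appearance in the data
# (assumption: this is the convention cutree() follows, as described above)
renum <- match(memb, unique(memb))
names(renum) <- names(memb)

# Cross-tabulate against cutree(): each row and column should have one non-zero cell
ct <- cutree(hc, h = 100)
print(table(dendrogram = memb, cutree = ct))

# Check whether the renumbered vector agrees with cutree()
print(all(renum == ct))
```

The cross-tabulation is the safer check: it only verifies that both approaches produce the same partition, whatever the numbering. The renumbering step relies on the order-of-appearance convention, so treat it as an assumption to verify on your own data.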