Hi,

this is more a (presumed) bug report than a question, because I can solve my personal statistical problem by working with hclust instead of agnes.

I have done a complete linkage clustering on a dist object dm with 30 objects with agnes (R 1.8.0 on RedHat) and I want to obtain the partition that results from a cut at height=0.4. I run

> cl1a <- agnes(dm, method="complete")
> cutree(cl1a,h=0.4)
 [1]  1  2  3  4  5  6  3  7  3  8  9 10  3 11 12 13 14 15  3 16 17  3 18 19 20
[26] 21  3 22 18 23

But that's not true; the correct partition is the one obtained from hclust

> clx <- hclust(dm)
> cutree(clx,h=0.4)
 [1]  1  2  1  2  3  4  1  2  1  3  4  5  1  4  6  7  8  4  1  5  2  1  9  2  2
[26] 10  1  9  9 11

as can be seen from the dendrogram plots of hclust *and* agnes. (Note that the dendrograms of hclust and agnes are not identical due to the handling of ties in the distances, but the difference between the agnes and hclust dendrograms at h=0.4 concerns only two points.)

Specifying k instead of h in cutree for agnes seems to work properly, but that's not what I need in the general case.

I tried to reproduce this with a toy example, but it worked (too) well:

> d
     [,1] [,2] [,3]
[1,]    0    1    2
[2,]    1    0    3
[3,]    2    3    0
> ad <- agnes(as.dist(d),method="complete")
> cutree(ad,h=1.5)
[1] 1 1 2
> ah <- hclust(as.dist(d))
> cutree(ah,h=1.5)
[1] 1 1 2

I can send anyone who would like to reproduce the problem (Martin?) the original distance matrix dm (dm is a dist object) as ASCII or as an R object.

Best,
Christian

Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
[diverted from R-help to R-devel; please follow up on R-devel!]

>>>>> "ChrisH" == Christian Hennig <fm3a004@math.uni-hamburg.de>
>>>>>     on Thu, 11 Dec 2003 15:13:52 +0100 (MET) writes:

    ChrisH> Hi, this is more a (presumed) bug report than a
    ChrisH> question, because I can solve my personal statistical
    ChrisH> problem by working with hclust instead of agnes.

    ChrisH> I have done a complete linkage clustering on a dist
    ChrisH> object dm with 30 objects with agnes (R 1.8.0 on
    ChrisH> RedHat) and I want to obtain the partition that
    ChrisH> results from a cut at height=0.4.
    ChrisH> I run

    >> cl1a <- agnes(dm, method="complete"); cutree(cl1a,h=0.4)
    ChrisH>  [1]  1  2  3  4  5  6  3  7  3  8  9 10  3 11 12 13 14 15  3 16 17  3 18 19 20
    ChrisH> [26] 21  3 22 18 23

    ChrisH> But that's not true; the correct partition is the one
    ChrisH> obtained from hclust

    >> clx <- hclust(dm); cutree(clx,h=0.4)
    ChrisH>  [1]  1  2  1  2  3  4  1  2  1  3  4  5  1  4  6  7  8  4  1  5  2  1  9  2  2
    ChrisH> [26] 10  1  9  9 11

    ChrisH> as can be seen from the dendrogram plots of hclust
    ChrisH> *and* agnes. (Note that the dendrograms of hclust
    ChrisH> and agnes are not identical due to the handling of
    ChrisH> ties in the distances, but the difference between
    ChrisH> the agnes and hclust dendrograms at h=0.4 concerns
    ChrisH> only two points.) Specifying k instead of h in
    ChrisH> cutree for agnes seems to work properly, but that's
    ChrisH> not what I need in the general case.

If I look up the help pages for cutree, agnes and agnes.object,
nothing says that you can expect cutree to work with agnes objects
directly. On the contrary, ?cutree says about its first argument

    tree: a tree as produced by 'hclust'. 'cutree()' only expects a
          list with components 'merge', 'height', and 'labels', of
          appropriate content each.

and ?agnes.object mentions the as.hclust() function that's needed to
produce an "hclust"-like object from the result of agnes() {or diana()}.
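A quick way to check the as.hclust() workaround, sketched on made-up data rather than on Christian's dm: 30 random points in the plane with an arbitrary seed, so that ties in the distances are practically impossible and complete linkage should give the same tree in agnes() and hclust(). The cut height is placed halfway between two consecutive merge heights (instead of 0.4, which is specific to dm), and same_partition() is a hypothetical helper that compares partitions independently of how the clusters happen to be numbered. I would expect the last line to print TRUE.

```r
library(cluster)  # agnes() and the as.hclust() method for its "twins" result

set.seed(1)                        # arbitrary seed; the data are illustrative only
x  <- matrix(rnorm(60), ncol = 2)  # 30 random points, so distance ties are unlikely
dm <- dist(x)

ag <- agnes(dm, method = "complete")
hc <- hclust(dm, method = "complete")

## Cut halfway between two consecutive merge heights, so that tiny numerical
## differences between the two implementations cannot change the result.
hs <- sort(hc$height)
h  <- (hs[20] + hs[21]) / 2

p_agnes  <- cutree(as.hclust(ag), h = h)  # convert to "hclust" first, then cut
p_hclust <- cutree(hc, h = h)

## Hypothetical helper: two labelings describe the same partition iff every
## label of one is paired with exactly one label of the other.
same_partition <- function(a, b)
  length(unique(a)) == length(unique(b)) &&
    length(unique(paste(a, b))) == length(unique(a))

same_partition(p_agnes, p_hclust)
```

Comparing partitions rather than the raw label vectors keeps the check meaningful even if the two functions were to number the clusters differently.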
Summarizing,

1) You need  cutree(as.hclust(cl1a), h=0.4)

2) cutree() shouldn't silently return a wrong result for agnes (or
   diana) objects. Rather, it should return the proper thing or give
   an error.

Here I elaborate a bit on "2)", which is not entirely trivial, hence the diversion to R-devel.

The best approach would be to make cutree() a generic function with the `obvious' "hclust" and "twins" methods and a "default" method which just uses something like NextMethod( as.hclust() ..).
However, this breaks back-compatibility: cutree() may no longer work on user-constructed objects that are just list()s as described for `tree' above.
We could alleviate this problem by trying to make as.hclust.default() much smarter, but I would rather not do that and instead let other people write their own as.hclust.* methods for their own constructed objects.

Does this seem viable? If I don't hear protest, I'll eventually try to do this (in R-devel).

Martin Maechler <maechler@stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><
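The proposed generic could look roughly like the following sketch. It is only an illustration of the idea, not what was (or will be) committed: the name cutree2 is hypothetical and chosen so that it does not mask stats::cutree. The "twins" method relies on the as.hclust() method from the cluster package, which, as far as I understand, puts the agnes/diana heights into the increasing merge order that cutree() expects. The default method simply coerces with as.hclust(), so a plain list() with 'merge' and 'height' components now gives an error, which is exactly the back-compatibility cost mentioned above.

```r
library(cluster)  # agnes(), diana() and the as.hclust() method for "twins"

## Hypothetical generic; named cutree2 so it does not mask stats::cutree().
cutree2 <- function(tree, k = NULL, h = NULL) UseMethod("cutree2")

## "hclust" objects: delegate unchanged to stats::cutree().
cutree2.hclust <- function(tree, k = NULL, h = NULL)
  stats::cutree(tree, k = k, h = h)

## agnes() and diana() results both inherit from "twins":
## convert to "hclust" first, then cut.
cutree2.twins <- function(tree, k = NULL, h = NULL)
  stats::cutree(as.hclust(tree), k = k, h = h)

## Anything else: coerce with as.hclust(). For a plain list this stops
## with an error instead of silently returning a wrong partition.
cutree2.default <- function(tree, k = NULL, h = NULL)
  stats::cutree(as.hclust(tree), k = k, h = h)

## Usage with Christian's toy example:
d  <- matrix(c(0, 1, 2,
               1, 0, 3,
               2, 3, 0), nrow = 3)
ad <- agnes(as.dist(d), method = "complete")
cutree2(ad, h = 1.5)
```

Users who construct their own tree-like lists would then need to give them class "hclust" or write an as.hclust.* method for their own class, which is the trade-off Martin describes.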