Leszek Nowina
2019-Apr-13 15:36 UTC
[R] Why is it not possible to cut a tree returned by Agnes or Diana by height?
> asdf = data.frame(x=c(1,2,3), y=c(4,5,6), z=c(7,8,9))
> cutree(agnes(asdf), h=100)
Error in cutree(agnes(asdf), h = 100) :
  the 'height' component of 'tree' is not sorted (increasingly)
> cutree(diana(asdf), h=100)
Error in cutree(diana(asdf), h = 100) :
  the 'height' component of 'tree' is not sorted (increasingly)

I'm not sure I understand why this is the case.

This is what I want: cluster stuff by the //distances//, **not** by how many clusters I want to have.

If two things are further from each other than X, they should go to different clusters. Otherwise, the same cluster.

Is what I'm asking for unreasonable? I imagine that if I were to implement Agnes or Diana manually, it would go like this: stop joining clusters if the smallest distance between any pair of clusters is larger than X (Agnes), or stop dividing clusters if the largest cluster has a diameter of X (Diana). But since both methods always join/divide to the very end, I thought using cutree with a height parameter would give me what I need. It won't.

Am I missing something?
Bert Gunter
2019-Apr-14 23:30 UTC
[R] Why is it not possible to cut a tree returned by Agnes or Diana by height?
Inline.

Bert Gunter

On Sun, Apr 14, 2019 at 4:12 PM Leszek Nowina <leekoinan at gmail.com> wrote:
> If two things are further from each other than X, they should go to
> different clusters. Otherwise, the same cluster.
>
> Is it unreasonable what I'm asking for?

Yes.

X and Y are at a distance 2. Y and Z are at a distance 2. X and Z are at a distance 4. Your idea cannot be consistently applied if 3 is the cutoff for clustering: X and Z would have to go in different clusters but both be in the same cluster as Y.

Maybe you need to spend some time with the literature before trying to cook up your own notions.

Cheers,
Bert
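The counterexample in this reply can be checked numerically. The sketch below is not from the thread: it assumes three hypothetical points placed on a line at 0, 2 and 4 (which reproduces the distances 2, 2 and 4) and uses base R's dist(), hclust() and cutree(). It shows that "within distance 3" is not a transitive relation, and that neither single nor complete linkage cut at height 3 can satisfy the pairwise rule for every pair.

```r
# Hypothetical coordinates chosen so that d(X,Y) = 2, d(Y,Z) = 2, d(X,Z) = 4.
pts <- matrix(c(0, 2, 4), ncol = 1,
              dimnames = list(c("X", "Y", "Z"), "pos"))
d <- dist(pts)
cutoff <- 3

# Pairwise relation: TRUE where the distance is within the cutoff.
close_enough <- as.matrix(d) <= cutoff
close_enough["X", "Y"]  # X and Y are within the cutoff
close_enough["Y", "Z"]  # Y and Z are within the cutoff
close_enough["X", "Z"]  # X and Z are not, so the relation is not transitive

# Single linkage cut at the cutoff puts X and Z together although d(X,Z) = 4.
single_cut <- cutree(hclust(d, method = "single"), h = cutoff)

# Complete linkage cut at the cutoff separates Y from a neighbour at distance 2.
complete_cut <- cutree(hclust(d, method = "complete"), h = cutoff)

print(single_cut)
print(complete_cut)
```

Of the two halves of the rule, only one can be enforced at a time: cutting a complete-linkage tree at height h keeps every pair inside a cluster within distance h of each other, but it may still separate pairs that are closer than h.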
William Dunlap
2019-Apr-14 23:47 UTC
[R] Why is it not possible to cut a tree returned by Agnes or Diana by height?
I think cutree() only works on things inheriting from class 'hclust', and agnes et al. do not produce such things. There are as.hclust methods for the output of agnes, so you might try

  cutree(as.hclust(agnes(...)), h)

instead of

  cutree(agnes(...), h)

Bill Dunlap
TIBCO Software
wdunlap tibco.com
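A minimal sketch of this suggestion, applied to the data frame from the original post. It assumes the cluster package is installed (it provides agnes(), diana() and the as.hclust() method for their results); the cut heights 1 and 100 are arbitrary values picked for illustration.

```r
library(cluster)  # provides agnes(), diana() and as.hclust() for their results

asdf <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c(7, 8, 9))

# Convert the agnes/diana results to class "hclust" before cutting by height.
ag <- as.hclust(agnes(asdf))
di <- as.hclust(diana(asdf))

# A cut height above every merge height leaves all rows in one cluster.
agnes_high <- cutree(ag, h = 100)
diana_high <- cutree(di, h = 100)

# A cut height below every merge height leaves each row in its own cluster.
agnes_low <- cutree(ag, h = 1)
diana_low <- cutree(di, h = 1)

print(agnes_high)
print(agnes_low)
```

The rows of asdf are about 1.73 apart (neighbours) and 3.46 apart (first and last), so any height between those extremes is where the choice of method and linkage starts to matter.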
Leszek Nowina
2019-Apr-15 13:10 UTC
[R] Why is it not possible to cut a tree returned by Agnes or Diana by height?
Either way, it seems to me that cutree(tree, h=height) could easily be implemented as

  cutree(tree, k=sum(tree$height>height)+1)

Why isn't it? Or is this not really the same, despite how it seems to me?
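The thread does not answer this question, but the proposed identity can at least be tried out. The sketch below is an illustration only: cut_by_count() is a hypothetical helper written for this check, not a function from any package, and the tree is first converted with as.hclust() so that its heights are sorted. It compares the two calls at a few arbitrary heights on the data from the original post; agreement here does not show that they agree for every tree.

```r
library(cluster)  # provides agnes() and as.hclust() for its result

asdf <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c(7, 8, 9))
tree <- as.hclust(agnes(asdf))  # heights are sorted after the conversion

# Hypothetical helper: cut by counting how many merges lie above height h.
cut_by_count <- function(tree, h) {
  cutree(tree, k = sum(tree$height > h) + 1)
}

# Compare the built-in cut by height with the count-based version.
heights <- c(1, 2, 100)
agree <- sapply(heights, function(h) {
  all(cutree(tree, h = h) == cut_by_count(tree, h))
})
print(data.frame(h = heights, same_partition = agree))
```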