Christopher R. Dolanc
2012-Oct-11 22:50 UTC
[R] extracting groups from hclust() for a very large matrix
Hello,

I'm having trouble figuring out how to see the resulting groups (clusters) from my hclust() output. I have a very large matrix of 4371 plots and 29 species, so simply looking at the graph is impossible. There must be a way to 'print' the results to a table that shows which plots were in what group, correct?

I've attached the matrix I'm working with (the whole thing, since the point is its large size). I've been able to run the following code to get the groups I need:

> VTM.Dist <- vegdist(VTM.Matrix)
> VTM.HClust <- hclust(VTM.Dist, method="ward")
> plot(VTM.HClust, hang=-1)

It takes a while, but it does run. Then I can extract 8 groups, a number I'd like to experiment with, but which is about how many I want:

> rect.hclust(VTM.HClust, 8)
> VTM.8groups <- cutree(VTM.HClust, 8)

But instead of listing the plots by name, it only tells me *how many* plots are in each of the eight groups:

> table(VTM.8groups)
VTM.8groups
   1    2    3    4    5    6    7    8
 137  173  239  356  709  585  908 1264

The vegemite() function also doesn't work for this reason - I have way too many plots, so they number in the thousands, which vegemite doesn't like.

> vegemite(VTM.Matrix, VTM.HClust)
Error in vegemite(VTM.Matrix, VTM.HClust) :
  Cowardly refusing to use longer than 1 char symbols:
Use scale

Does anybody know how I can get a simple list of plots in each category? I would think this would be something like a summary command. Perhaps a different clustering method?

Thanks,
Chris Dolanc

--
Christopher R. Dolanc
Post-doctoral Researcher
University of Montana and UC-Davis
Milan Bouchet-Valat
2012-Oct-12 08:06 UTC
[R] extracting groups from hclust() for a very large matrix
On Thursday, 11 October 2012 at 15:50 -0700, Christopher R. Dolanc wrote:

> Hello,
>
> I'm having trouble figuring out how to see resulting groups (clusters)
> from my hclust() output. I have a very large matrix of 4371 plots and 29
> species, so simply looking at the graph is impossible. There must be a
> way to 'print' the results to a table that shows which plots were in
> what group, correct?
>
> I've attached the matrix I'm working with (the whole thing since the
> point is its large size).

I can't see it (probably removed by the server). Anyway, you should be able to reproduce the same thing with a small reproducible example: I don't see anything related to a large matrix below, apart maybe from the vegemite() error.

> I've been able to run the following code to
> get the groups I need:
>
> > VTM.Dist<- vegdist(VTM.Matrix)
> > VTM.HClust<- hclust(VTM.Dist, method="ward")
> > plot(VTM.HClust, hang=-1)
>
> It takes a while, but it does run. Then, I can extract 8 groups, which
> I'd like to experiment with, but is about how many I'd like:
>
> rect.hclust(VTM.HClust, 8)
> > VTM.8groups<- cutree(VTM.HClust, 8)
>
> But, instead of listing the plots by name, it only tells me *how many*
> plots are in the eight groups:
>
> > table(VTM.8groups)
> VTM.8groups
>    1    2    3    4    5    6    7    8
>  137  173  239  356  709  585  908 1264

Just remove the call to table(). That function is made precisely to tell you how many times each value (here, each group) is present. If you want the list of plots and their groups, it's here:

VTM.8groups

> The vegemite() function also doesn't work for this reason - I have way
> too many plots so they number in the thousands, which vegemite doesn't like.
>
> > vegemite(VTM.Matrix, VTM.HClust)
> Error in vegemite(VTM.Matrix, VTM.HClust) :
>   Cowardly refusing to use longer than 1 char symbols:
> Use scale
>
> Does anybody know how I can get a simple list of plots in each category?
> I would think this would be something like a summary command. Perhaps a
> different clustering method?
>
> Thanks,
> Chris Dolanc
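To make the advice above concrete, here is a minimal sketch of going from the cutree() result to a list of plots per group and to a two-column table. It is not run on the original data: `toy`, `members` and `membership` are made-up names, the toy matrix is random, and dist() with the default linkage stands in for vegdist() and the Ward method. It also assumes the plot names are stored as row names; if they live in a column instead, names(groups) will be NULL and that column would have to be used in its place.

```r
# Toy stand-in for VTM.Matrix: 12 plots x 4 species, plot names as row names.
set.seed(1)
toy <- matrix(runif(48), nrow = 12,
              dimnames = list(paste0("plot", 1:12), paste0("sp", 1:4)))

hc <- hclust(dist(toy))        # dist() is used here in place of vegdist()
groups <- cutree(hc, k = 3)    # named integer vector: one cluster id per plot

# Plot names listed per cluster.
members <- split(names(groups), groups)
print(members)

# The same information as a two-column table, e.g. for write.csv().
membership <- data.frame(plot = names(groups), group = unname(groups))
print(head(membership))
```

The table() call only summarises `groups`; the vector itself already holds the full plot-to-group assignment, and split() just regroups it by cluster.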
Milan Bouchet-Valat
2012-Oct-12 19:05 UTC
[R] extracting groups from hclust() for a very large matrix
On Friday, 12 October 2012 at 11:33 -0700, Christopher R. Dolanc wrote:

> That command gives me the same result. Do you see that R is not listing
> the plot numbers? Just all the numbers between 1 and 137, 138 and 310,
> etc. It's like it has reordered the dendrogram, so that everything
> occurs chronologically.
>
> Instead, I would expect something like this:
>
> [1]
> 3, 15, 48, 134, 136, 213, 299, .....
>
> [2]
> 44, 67, 177, .....

Yeah, but that's a problem with your data or your dist function, not with hclust() and cutree(). As always, it's good to try to find the minimal example that reproduces the problem. Start from the examples provided by ?cutree:

hc <- hclust(dist(USArrests))
cutree(hc, k=2)
    Alabama      Alaska     Arizona    Arkansas  California
          1           1           1           2           1
   Colorado Connecticut    Delaware     Florida     Georgia
          2           2           1           1           2
etc.

Here you see the cluster numbers are not in sequence, and my command shows the groups correctly:

split(rownames(USArrests), cutree(hc, 2))
$`1`
[1] "Alabama"    "Alaska"     "Arizona"    "California"
etc.

$`2`
[1] "Arkansas"    "Colorado"    "Connecticut" "Georgia"
[5] "Hawaii"      "Idaho"       "Indiana"     "Iowa"
etc.

So either your data is already ordered, or you have a problem with your distance function. One guess: you have included the "Plot" column in the call to vegdist(). I don't know this function, but it seems to work like dist(), which means passing the plot id is plain wrong. Indeed, if I use

VTM.Dist<-vegdist(VTM.Matrix[,-1])
VTM.HClust<- hclust(VTM.Dist, method="ward")
VTM.8groups<- cutree(VTM.HClust, 8)

the result is not ordered as before.

Lesson: try with simple, standard data when complex data sets don't work, and compare results.

My two cents
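The effect of leaving an id column in the distance computation can be reproduced on made-up data. The sketch below is only an illustration under stated assumptions: `toy`, `with_id` and `without_id` are hypothetical names, the species values are random numbers between 0 and 1, and dist() with the default linkage is used instead of vegdist() and Ward, so it shows the mechanism rather than the exact behaviour of the original data set.

```r
# Hypothetical toy data: first column is a running plot id,
# the other columns are random "species" values between 0 and 1.
set.seed(42)
n <- 40
toy <- data.frame(Plot = 1:n, sp1 = runif(n), sp2 = runif(n), sp3 = runif(n))

# Wrong: the id column takes part in the distance and dominates it.
with_id <- cutree(hclust(dist(toy)), k = 4)

# Right: drop the id column and keep the ids as row labels instead.
x <- as.matrix(toy[, -1])
rownames(x) <- toy$Plot
without_id <- cutree(hclust(dist(x)), k = 4)

# With the id included, cluster numbers never decrease along the rows,
# so every group is one contiguous block of plot numbers.
print(all(diff(with_id) >= 0))
# Without the id, the groups are interleaved across the rows.
print(all(diff(without_id) >= 0))
print(split(names(without_id), without_id))
```

When the id is included, two plots are "close" mainly because their numbers are close, which is why the groups come out as runs such as 1 to 137 and 138 to 310. Dropping the column lets the species values alone drive the clustering.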