Mao Jianfeng
2011-Jan-13 15:36 UTC
[R] how to calculate the consistency of different clusterings
Dear R-listers,

I am clustering tens of individuals on thousands of traits, and I already know the true assignment of each individual. I want to classify the individuals using randomly resampled subsets of the traits: for example, 100 randomly chosen traits, repeated 100 times, then 200 traits 100 times, then 300 traits 100 times, and so on. For each subset of traits, I cluster the same individuals.

In the end, I want to get the consistency (as a percentage) of each of these clusterings (for example "cluster.1", "cluster.2" and "cluster.3" in the dummy data) with the assignment that is already known ("populations" in the dummy data). How can this be implemented, ideally in R?

# dummy data
clus.data <- data.frame(individual = paste("ind", 1:12, sep = ""),
    populations = c(rep("popA", 5), rep("popB", 7)),
    cluster.1 = c(rep(1, 5), rep(2, 7)),
    cluster.2 = c(rep(2, 4), rep(1, 8)),
    cluster.3 = c(rep(4, 7), rep(5, 5)))

clus.data

Thanks.

--
Jian-Feng, Mao
the Institute of Botany,
Chinese Academy of Botany,
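One possible reading of "consistency (in percentage)" is the share of individuals that fall in a cluster whose majority population is their own. The sketch below applies that reading to the dummy data from the post. The helper name `percent_agreement` is hypothetical and not from the thread, and the measure itself is an assumption about what the poster wants.

```r
# Dummy data from the post ("cluster.3" needs an "=" to parse).
clus.data <- data.frame(
  individual  = paste("ind", 1:12, sep = ""),
  populations = c(rep("popA", 5), rep("popB", 7)),
  cluster.1   = c(rep(1, 5), rep(2, 7)),
  cluster.2   = c(rep(2, 4), rep(1, 8)),
  cluster.3   = c(rep(4, 7), rep(5, 5))
)

# Hypothetical helper: label each cluster with its most common known
# population, then report the percentage of individuals that match.
percent_agreement <- function(known, cluster) {
  tab <- table(known, cluster)   # rows: known groups, columns: clusters
  100 * sum(apply(tab, 2, max)) / length(known)
}

agreement <- sapply(
  clus.data[c("cluster.1", "cluster.2", "cluster.3")],
  percent_agreement,
  known = clus.data$populations
)
round(agreement, 1)
# cluster.1 cluster.2 cluster.3
#     100.0      91.7      83.3
```

Because the cluster numbers are arbitrary, the helper ignores them and works from the cross-tabulation only. Note that this measure rises automatically as the number of clusters grows, so it is most meaningful when every clustering is cut to the same number of groups.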
Michael Bedward
2011-Jan-17 00:57 UTC
[R] how to calculate the consistency of different clusterings
Hello,

I've been waiting to see if anyone else would answer this.

I've previously used random reallocation of objects to groups (clusters) as a Monte Carlo test of the informativeness of groups, as described here:
http://lastresortsoftware.blogspot.com/2010/09/monte-carlo-testing-of-classification.html

However, in your case it sounds like you want to investigate the influence of particular attributes (traits), or groups of attributes, on the classification. Is that correct? If so, I can probably help with some R code, but I'd need to know the clustering method you are using (e.g. hclust).

Michael
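The thread never settles on a clustering method, so the following is only a rough sketch of the trait-resampling idea from the original question. It assumes `hclust` with average linkage on Euclidean distances, cut into two groups. The trait matrix is simulated, and all sizes, effect sizes, and the helper names `pair_agreement` and `one_run` are made up for illustration. Agreement is measured as the Rand index in percent, which does not depend on how clusters are numbered.

```r
set.seed(1)  # reproducible simulated data; every value below is made up

n_ind    <- 12
n_traits <- 1000
known    <- rep(c("popA", "popB"), times = c(5, 7))

# Simulated trait matrix (individuals in rows). popB is shifted by 1 in the
# first 200 traits so that there is some structure to recover.
traits <- matrix(rnorm(n_ind * n_traits), nrow = n_ind)
traits[known == "popB", 1:200] <- traits[known == "popB", 1:200] + 1

# Rand index as a percentage: share of pairs of individuals that the two
# partitions treat the same way (together in both, or apart in both).
pair_agreement <- function(known, cluster) {
  same_known   <- outer(known, known, "==")
  same_cluster <- outer(cluster, cluster, "==")
  pairs <- upper.tri(same_known)
  100 * mean(same_known[pairs] == same_cluster[pairs])
}

# Cluster the individuals on a random subset of traits and score the result.
one_run <- function(n_sub, k = 2) {
  cols <- sample(ncol(traits), n_sub)
  tree <- hclust(dist(traits[, cols]), method = "average")
  pair_agreement(known, cutree(tree, k = k))
}

subset_sizes <- c(100, 200, 300)
n_rep <- 100
res <- sapply(subset_sizes, function(s) replicate(n_rep, one_run(s)))
colnames(res) <- subset_sizes

colMeans(res)  # mean agreement (%) for each subset size
```

Each column of `res` holds the 100 agreement values for one subset size, so `boxplot(res)` would show how the spread changes as more traits are sampled. With real data, `traits` would be replaced by the actual individuals-by-traits matrix and `known` by the known populations.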