Hi R helpers,

I was wondering if anybody knows whether it is possible to generate bootstrap values for a cluster analysis in R. What I am trying to do is obtain some confidence in the clusters formed, by resampling the data set. A similar type of analysis is used in molecular taxonomy, where the confidence value of each cluster is placed at the corresponding node of the dendrogram. Any ideas on how to do this in R will be appreciated.

Thanks

Héctor L. Ayala-del-Río, Ph.D.
Center for Microbial Ecology &
Center for Genomic and Evolutionary Studies
on Microbial Life at Low Temperatures
Michigan State University
545 Plant & Soil Sciences Building
East Lansing, MI 48824-1325
Phone: 517-353-9021
Fax: 517-353-2917
You could try function fanny() in package cluster. This function does fuzzy clustering and returns, for each object and each cluster, a membership coefficient that shows how fuzzy or crisp the clustering was.

It's hard to imagine bootstrapping a confidence interval around a categorical value such as a cluster. Perhaps someone else can explain this.

Regards,

Andrew C. Ward
CAPE Centre
Department of Chemical Engineering
The University of Queensland
Brisbane Qld 4072 Australia
andreww at cheque.uq.edu.au
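As a rough illustration of what "fuzzy or crisp" means here: fanny() also summarises the membership coefficients with Dunn's partition coefficient, F(k) = (sum of all squared membership coefficients) / n, which ranges from 1/k (every object spread evenly over the k clusters) to 1 (a crisp partition), together with the normalised form (F(k) - 1/k) / (1 - 1/k), which runs from 0 to 1. The sketch below is in Python rather than R, and the function name `dunn_partition_coefficient` is hypothetical; it only recomputes these two numbers from a membership matrix and does not reproduce the fanny() algorithm itself.

```python
def dunn_partition_coefficient(membership):
    """Dunn's partition coefficient for a fuzzy clustering (hypothetical helper).

    membership: one row per object, one column per cluster; each row is
    assumed to sum to 1, as fuzzy membership coefficients do.
    Returns (F, normalized F).
    """
    n = len(membership)       # number of objects
    k = len(membership[0])    # number of clusters
    # F(k): sum of all squared memberships, divided by the number of objects.
    f = sum(u * u for row in membership for u in row) / n
    # Rescale so a completely fuzzy partition gives 0 and a crisp one gives 1.
    normalized = (f - 1.0 / k) / (1.0 - 1.0 / k)
    return f, normalized


if __name__ == "__main__":
    crisp = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # every object fully in one cluster
    even = [[0.5, 0.5], [0.5, 0.5]]               # every object split 50/50
    print(dunn_partition_coefficient(crisp))      # prints (1.0, 1.0)
    print(dunn_partition_coefficient(even))       # prints (0.5, 0.0)
```

Values near the top of the range mean the fuzzy solution is close to a hard partition; values near the bottom mean the objects are not clearly assigned to any one cluster.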
> It's hard to imagine bootstrapping a confidence interval around a
> categorical value such as a cluster. Perhaps someone else can explain
> this.

Bootstrap support indices are not confidence intervals. The process is, I think, (in a nutshell):

Assume that there are N objects to be clustered, based on the similarity of C variables measured on each of the N objects.

1. Create a bootstrap dataset by resampling the C variables with replacement, keeping all N objects (i.e., draw C columns, with replacement, from the N x C data matrix).
2. Run the clustering algorithm on the bootstrap dataset to cluster the N objects.
3. Repeat steps 1 and 2 a large number of times.
4. Construct a majority-rule consensus tree from all the bootstrapped cluster analyses.
5. Calculate the bootstrap support index for each cluster in the consensus tree as the percentage of times that cluster was recovered in the set of bootstrapped cluster analyses.

See Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39: 783-791.

No, I don't know how to do this in R, but I agree that it would be useful!

Simon.

Simon Blomberg
Depression & Anxiety Consumer Research Unit
Centre for Mental Health Research
Australian National University
http://www.anu.edu.au/cmhr/
Simon.Blomberg at anu.edu.au
+61 (2) 6125 3379
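To make the five steps in Simon's outline concrete, here is a minimal sketch in Python (not R, so it cannot be pasted into an R session as-is). Simon's outline leaves the clustering method open; this sketch assumes Euclidean distance and average-linkage (UPGMA) clustering, and the function names and toy data are hypothetical. The majority-rule consensus of step 4 is taken as the set of clusters recovered in more than half of the replicates, and the support of step 5 is that recovery percentage. Singletons and the cluster containing all N objects are left out, since every tree contains them.

```python
import random
from collections import Counter
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two objects (rows of the data matrix)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def upgma_clusters(rows):
    """Average-linkage (UPGMA) agglomerative clustering of the N rows.

    Returns the non-trivial clusters formed by the merges, as frozensets of
    row indices (singletons and the full set of N objects are omitted).
    """
    n = len(rows)
    d = [[euclidean(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    active = [frozenset([i]) for i in range(n)]
    found = set()
    while len(active) > 1:
        # Pick the pair of clusters with the smallest mean pairwise distance.
        best = None
        for a, b in combinations(range(len(active)), 2):
            A, B = active[a], active[b]
            avg = sum(d[i][j] for i in A for j in B) / (len(A) * len(B))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        merged = active[a] | active[b]
        active = [c for k, c in enumerate(active) if k not in (a, b)]
        active.append(merged)
        if len(merged) < n:
            found.add(merged)
    return found


def bootstrap_support(rows, n_boot=100, seed=0):
    """Majority-rule bootstrap support (in percent) for clusters of the N objects."""
    rng = random.Random(seed)
    n_vars = len(rows[0])
    counts = Counter()
    for _ in range(n_boot):
        # Step 1: draw C variables (columns) with replacement, keep all N objects.
        cols = [rng.randrange(n_vars) for _ in range(n_vars)]
        resampled = [[row[c] for c in cols] for row in rows]
        # Steps 2-3: recluster each replicate and tally the clusters it contains.
        counts.update(upgma_clusters(resampled))
    # Steps 4-5: keep clusters found in more than half of the replicates.
    return {cl: 100.0 * k / n_boot
            for cl, k in counts.items() if k > n_boot / 2}


if __name__ == "__main__":
    # Hypothetical toy data: 6 objects x 5 variables, two well-separated groups.
    rows = [
        [0.0, 0.1, 0.2, 0.0, 0.1],
        [0.1, 0.0, 0.1, 0.2, 0.0],
        [0.2, 0.1, 0.0, 0.1, 0.2],
        [10.0, 10.1, 10.2, 10.0, 10.1],
        [10.1, 10.0, 10.1, 10.2, 10.0],
        [10.2, 10.1, 10.0, 10.1, 10.2],
    ]
    for cluster, pct in sorted(bootstrap_support(rows).items(), key=lambda kv: -kv[1]):
        print(sorted(cluster), pct)
```

Because any two clusters that each appear in more than half of the replicates must occur together in at least one tree, the returned clusters are always nested or disjoint, so they can be drawn as a single tree with the percentages at its nodes, as in the molecular-taxonomy dendrograms Héctor describes.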