Hello,

I'm playing around with cluster analysis, and am looking for methods to select the number of clusters. I am aware of methods based on a 'pseudo F' or a 'pseudo T^2'. Are there packages in R that will generate these statistics, and/or other statistics to aid in cluster number selection?

Thanks,

John.
-- 
==========================================================================
Dr. John Janmaat                       Tel: 902-585-1461
Department of Economics                Fax: 902-585-1070
Acadia University                      Email: jjanmaat at acadiau.ca
Wolfville, Nova Scotia, Canada.        Web: ace.acadiau.ca/~jjanmaat/

Have you checked the amap package? It was updated just recently and, if I am not mistaken, it includes a method that indicates the best number of groups k for your data.

Best wishes,
P. Olsson

2006/2/5, John Janmaat <john.janmaat@acadiau.ca>:
> [...]

Hi,

as said before, some statistics for estimating the number of clusters are available in the cluster.stats function of package fpc. These are distance-based, not "pseudo F" or "pseudo T^2". They are documented in Gordon (1999), Classification (see ?cluster.stats for more references).

cluster.stats also reports the average silhouette width of Kaufman and Rousseeuw (1990) (exact reference in ?plot.agnes), which is also part of the output of some functions in package cluster (pam, agnes, ...?).

An alternative way to estimate the number of clusters is to use the BIC together with a (normal) mixture model; see package mclust.

Best,
Christian

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche

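As a rough illustration of the average silhouette width approach mentioned above, here is a minimal sketch. It assumes the cluster package is installed; the simulated data, the seed, and the candidate range k = 2..6 are illustrative assumptions and do not come from this thread. It runs pam() for each k and picks the k with the largest average silhouette width.

```r
library(cluster)  # provides pam()

set.seed(1)
# Hypothetical toy data: three well-separated Gaussian groups in 2-D
centers <- rbind(c(0, 0), c(10, 0), c(0, 10))
x <- do.call(rbind, lapply(1:3, function(g) {
  cbind(rnorm(50, centers[g, 1]), rnorm(50, centers[g, 2]))
}))

ks <- 2:6
# Average silhouette width of the pam() partition for each candidate k
avg_sil <- sapply(ks, function(k) pam(x, k)$silinfo$avg.width)
names(avg_sil) <- ks

# The k with the largest average silhouette width is the suggested choice
best_k <- ks[which.max(avg_sil)]
print(round(avg_sil, 3))
print(best_k)
```

On real data the maximum is often less clear-cut than on groups this well separated, so it is worth looking at the whole curve rather than only at which.max().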
Dear John,
You can play around with the cluster.stats function in package fpc, e.g. you
can try:
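library(fpc)      # cluster.stats()
library(cluster)  # clara() and the xclara data set
data(xclara)
dM <- dist(xclara)
ks <- 2:7
wb <- numeric(length(ks))
for (j in seq_along(ks)) {
  # clara() clusters the raw data; cluster.stats() takes the distance matrix
  wb[j] <- cluster.stats(d = dM, clustering = clara(xclara, ks[j])$clustering,
                         silhouette = FALSE)$wb.ratio
}
# within/between distance ratio against the number of clusters
plot(ks, wb, type = "b", xlab = "number of clusters", ylab = "wb.ratio")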
(This takes a few minutes to run.)
The resulting plot indicates that 3 clusters are "optimal" for this data.
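For the 'pseudo F' statistic asked about in the original post, a minimal base-R sketch is possible without any extra package. This is not the fpc approach shown above: it assumes k-means clustering via stats::kmeans() and uses the Calinski-Harabasz form of the pseudo F, (B / (k - 1)) / (W / (n - k)), where B and W are the between- and within-cluster sums of squares. The helper name pseudo_f, the simulated data, and the range of k are hypothetical choices made for illustration. The 'pseudo T^2' statistic is not covered here.

```r
set.seed(1)
# Hypothetical toy data: three well-separated Gaussian groups in 2-D
centers <- rbind(c(0, 0), c(10, 0), c(0, 10))
x <- do.call(rbind, lapply(1:3, function(g) {
  cbind(rnorm(50, centers[g, 1]), rnorm(50, centers[g, 2]))
}))

# Pseudo F (Calinski-Harabasz): between-cluster mean square
# divided by within-cluster mean square for a k-means partition
pseudo_f <- function(x, k) {
  fit <- kmeans(x, centers = k, nstart = 20)  # several starts to avoid poor local optima
  n <- nrow(x)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

ks <- 2:6
f_values <- sapply(ks, function(k) pseudo_f(x, k))
names(f_values) <- ks

# Larger values suggest a better separated partition
best_k <- ks[which.max(f_values)]
print(round(f_values, 1))
print(best_k)
```

As with the ratio plotted above, the statistic is a heuristic: a clear peak is informative, while a flat or monotone curve suggests the data do not strongly favour any particular number of clusters.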
Best,
Matthias