Skanda Kallur; MEngg
2003-May-07 03:20 UTC
[R] k-means, hybrid clustering or similar implementations on R
Hi,

I would like to know whether anyone knows of an extended implementation of k-means in R that finds an appropriate number of clusters for given k-dimensional data.

Also, I am working on clustering for forecasting. If anyone is interested or has knowledge of the implementation details, please mail me; I would appreciate it.

Regards
Skanda Kallur

"Cogito, ergo sum" (I think, therefore I am) - René Descartes
Christian Hennig
2003-May-07 08:29 UTC
Hi,

On Wed, 7 May 2003, Skanda Kallur; MEngg wrote:

> Hi,
>
> I would like to know if someone knows an extended implementation of k-means in R to find appropriate number of clusters for a given k-dimensional data.

You may use pam in library(cluster). Choose the number of clusters by maximizing pam(x, k)$silinfo$avg.width (the average silhouette width) over k, the number of clusters. Note that this does not work with k=1.

pam does not do exactly the same as k-means. By default it uses Euclidean distances, not their squares (so it behaves more like "k-median"), and all cluster centers are actual data points (medoids). If you want to "emulate" k-means, you can provide x as a distance matrix of squared Euclidean distances (which is often worse than the default, e.g. in the presence of outliers).

An alternative is EMclust in library(mclust), which chooses the optimal number of clusters by the Bayesian Information Criterion (BIC). Set the parameter emModelNames="EII" to get the mixture-model analogue of k-means (but do this only if you are sure that you want something k-means-like and not a more flexible model).

In general, the number-of-clusters problem is difficult, because it depends not only on the data but also on your concept of a "cluster". The BIC has somewhat better theoretical support than pam's average silhouette width, but the problem is far from being solved.

Christian

--
Christian Hennig
Seminar fuer Statistik, ETH-Zentrum (LEO), CH-8092 Zuerich (currently)
and Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at stat.math.ethz.ch, http://stat.ethz.ch/~hennig/
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
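The pam-based recipe in the reply can be sketched in R roughly as follows. This is a minimal illustration, assuming the cluster package (shipped as a recommended package with standard R installations); the simulated three-group data, the candidate range 2:8 for k, and the variable names are illustrative assumptions, not part of the original post. The last lines show the "k-means-like" variant, where squared Euclidean distances are passed to pam as a dissimilarity object.

```r
library(cluster)

set.seed(1)
# Hypothetical example data: three well-separated 2-d Gaussian groups (50 points each)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2),
           matrix(rnorm(100, mean = 10), ncol = 2))

# Silhouette widths are undefined for a single cluster, so start at k = 2
ks <- 2:8
avg_width <- sapply(ks, function(k) pam(x, k)$silinfo$avg.width)

# Pick the k with the largest average silhouette width
best_k <- ks[which.max(avg_width)]
print(data.frame(k = ks, avg.width = round(avg_width, 3)))
cat("k with largest average silhouette width:", best_k, "\n")

# k-means-like variant: squared Euclidean distances as a "dist" object,
# so pam treats the input as dissimilarities rather than raw data
d2 <- as.dist(as.matrix(dist(x))^2)
avg_width_sq <- sapply(ks, function(k) pam(d2, k)$silinfo$avg.width)
```

As the reply notes, the average silhouette width is only one heuristic; with less clearly separated data the maximum can be flat or ambiguous, so it is worth inspecting the whole table rather than only best_k.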