Hi everyone,

I have a question about clustering. I've managed to run a clustering analysis of a large data set using CLARA, but now I want to find the right number of clusters.

The clara.object gives some information, like the ratio between maximal and minimal dissimilarity, which indicates (maybe if lower than 1?) whether a cluster is well separated from the others. I've also read something about silhouette and about cluster.stats, but I can't work out how to use them to find the right number of clusters.

I've tried a suggestion from the mailing list, but when using dist

d1 <- dist(mydata$sst)

it says "specified vector size is too big".

Is there any method to find the right number of clusters when using clara? Maybe it is something I've already tried and I'm just missing a small, simple trick.

Thanks in advance

--
El ponent la mou, el llevant la plou
Usuari Linux registrat: 363952
Fotos: http://picasaweb.google.es/pacomet
Christian Hennig
2008-Sep-30 12:52 UTC
[R] CLARA and determining the right number of clusters
Hi there,

generally, finding the right number of clusters is a difficult problem and depends heavily on the cluster concept needed for the particular application. No outcome of any automatic method should be taken for granted.

Having said that, I guess that something like the example given in

> ?pam.object

(replacing pam by clara) should work with clara, too.

Regards,
Christian

On Tue, 30 Sep 2008, pacomet wrote:
> Is there any method to find the right number of clusters when using clara?

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
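
The suggestion above (take the ?pam.object example and replace pam by clara) could look roughly like the sketch below. It is only an illustration under stated assumptions: the data are synthetic stand-ins (three groups in two dimensions), and the names x, k_max, asw and k_best are made up for the example. It relies on the silinfo component of the clara object, which holds silhouette information for the best sample only, so no full dist() matrix of the whole data set is needed.

```r
library(cluster)  # provides clara()

# Synthetic stand-in for the real data (assumption): 3 groups, 200 rows each.
set.seed(42)
x <- rbind(
  matrix(rnorm(400, mean = 0),  ncol = 2),
  matrix(rnorm(400, mean = 6),  ncol = 2),
  matrix(rnorm(400, mean = 12), ncol = 2)
)

k_max <- 10
# Average silhouette width for each k; k = 1 is not defined, so it stays NA.
asw <- rep(NA_real_, k_max)

for (k in 2:k_max) {
  fit <- clara(x, k, samples = 50)
  # Silhouette information of the best sample drawn by clara.
  asw[k] <- fit$silinfo$avg.width
}

# which.max() discards the NA at k = 1 and returns the index of the largest value.
k_best <- which.max(asw)
cat("silhouette-optimal number of clusters:", k_best, "\n")
```

Because the silhouette widths come from a sample, it may be worth increasing the samples and sampsize arguments of clara() and checking that the chosen k is stable. As noted above, the result should be treated as one indication among others, not as the definitive number of clusters.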