Hi,

I am using kmeans to cluster a dataset.
I test this example:

> data<-matrix(scan("data100.txt"),100,37,byrow=T)
(my dataset is 100 rows and 37 columns--clustering rows)

> c1<-kmeans(data,3,20)
> c1
$cluster
 [1] 1 1 1 1 1 1 1 3 3 3 1 3 1 3 3 1 1 1 1 3 1 3 3 1 1 1 3 3 1 1 3 1 1 1 1 3 3
[38] 3 1 1 1 3 1 1 1 1 3 3 3 1 1 1 1 1 1 3 1 3 1 1 3 1 1 1 1 3 1 1 1 1 1 1 3 1
[75] 1 3 1 3 1 1 1 1 3 1 1 1 1 1 3 1 1 3 1 1 3 3 1 2 1 1

$withinss
[1] 1037.5987    0.0000  666.9701

$size
[1] 68  1 31

> c4<-kmeans(data,3,20)
$withinss
[1]   0.0000 865.7628 851.1214

$size
[1]  1 54 45

Does any one tell me why the results are very different with the same
dataset and parameters when I run some times this command
'kmeans(data,3,20)'???

Thank you for your help in advance!

ping

On Mon, 14 Apr 2003, pingzhao wrote:

> Hi,
>
> I am using kmeans to cluster a dataset.
> I test this example:
>
> > data<-matrix(scan("data100.txt"),100,37,byrow=T)
> (my dataset is 100 rows and 37 columns--clustering rows)
>
> > c1<-kmeans(data,3,20)
> > c1
> $cluster
>  [1] 1 1 1 1 1 1 1 3 3 3 1 3 1 3 3 1 1 1 1 3 1 3 3 1 1 1 3 3 1 1 3 1 1 1 1 3 3
> [38] 3 1 1 1 3 1 1 1 1 3 3 3 1 1 1 1 1 1 3 1 3 1 1 3 1 1 1 1 3 1 1 1 1 1 1 3 1
> [75] 1 3 1 3 1 1 1 1 3 1 1 1 1 1 3 1 1 3 1 1 3 3 1 2 1 1
>
> $withinss
> [1] 1037.5987    0.0000  666.9701
>
> $size
> [1] 68  1 31
>
> > c4<-kmeans(data,3,20)
> $withinss
> [1]   0.0000 865.7628 851.1214
>
> $size
> [1]  1 54 45
>
> Does any one tell me why the results are very different with the same
> dataset and parameters when I run some times this command
> 'kmeans(data,3,20)'???

The help page could tell you:

 centers: Either the number of clusters or a set of initial cluster
          centers. If the first, a random set of rows in `x' are
          chosen as the initial centers.

At the very least, the labellings of the clusters are arbitrary, but
K-means usually has multiple local minima.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
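
To illustrate the point about random initial centers, here is a minimal sketch. It assumes a synthetic 100 x 37 matrix `x` standing in for the poster's data100.txt (which is not available), and the seed values are arbitrary. Fixing the random seed before each call should make the choice of starting rows, and hence the fit, repeatable; a different seed may give a different local minimum or the same partition under permuted labels.

```r
# Synthetic stand-in for the poster's data (assumption: NOT the real data100.txt)
set.seed(1)
x <- matrix(rnorm(100 * 37), nrow = 100, ncol = 37)

# Same seed before each call: the same rows are drawn as initial centers,
# so the two fits are identical.
set.seed(42)
fit_a <- kmeans(x, centers = 3, iter.max = 20)
set.seed(42)
fit_b <- kmeans(x, centers = 3, iter.max = 20)
identical(fit_a$cluster, fit_b$cluster)

# Different seed: different starting rows, so the fit may differ.
set.seed(7)
fit_c <- kmeans(x, centers = 3, iter.max = 20)

# Cross-tabulate the two labellings. If both runs found the same partition
# under different labels, each row has exactly one non-zero cell.
table(fit_a$cluster, fit_c$cluster)

# Compare the objective (total within-cluster sum of squares) of the two runs.
c(sum(fit_a$withinss), sum(fit_c$withinss))
```

Comparing the cross-tabulation and the total within-cluster sum of squares, rather than the raw label vectors, avoids mistaking a mere relabelling for a genuinely different clustering.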

Hi,

It is expected! If you read:
  ?kmeans
For the "centers" argument:
  "...a random set of rows in `x' are chosen as the initial centers."

In other words, the starting values are different on each run. In fact,
one should run kmeans() several times to avoid settling on a poor local
minimum. If you run it, say, 20 times and get the same result 15 times,
then you can "probably" be confident in using that solution.

On Mon, 14 Apr 2003, pingzhao wrote:

> Does any one tell me why the results are very different with the same
> dataset and parameters when I run some times this command
> 'kmeans(data,3,20)'???
>
> Thank you for your help in advance!
>
> ping
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>

-- 
Cheers,

Kevin

------------------------------------------------------------------------------
/* Time is the greatest teacher, unfortunately it kills its students */

--
Ko-Kang Kevin Wang
Master of Science (MSc) Student
SLC Tutor and Lab Demonstrator
Department of Statistics
University of Auckland
New Zealand
Homepage: http://www.stat.auckland.ac.nz/~kwan022
Ph: 373-7599
    x88475 (City)
    x88480 (Tamaki)
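
The "run it several times" advice can be sketched roughly as follows. This is only an illustration: `x` is a synthetic stand-in for the poster's 100 x 37 data, and `n_runs`, `tot` and `n_best` are names made up for this example. Runs are compared by total within-cluster sum of squares rather than by labels, since the labels are arbitrary.

```r
# Synthetic stand-in for the poster's data (assumption: NOT the real data100.txt)
set.seed(1)
x <- matrix(rnorm(100 * 37), nrow = 100, ncol = 37)

# Run kmeans() several times, each from a fresh random set of starting rows.
n_runs <- 20
fits <- lapply(seq_len(n_runs), function(i) kmeans(x, centers = 3, iter.max = 20))

# Objective value of each run: total within-cluster sum of squares.
tot <- sapply(fits, function(f) sum(f$withinss))

# Keep the run with the smallest objective.
best <- fits[[which.min(tot)]]

# Count how many runs reached (numerically) the same objective as the best one.
n_best <- sum(abs(tot - min(tot)) < 1e-8 * min(tot))
n_best

best$size
```

In current versions of R, kmeans() also accepts an `nstart` argument, e.g. `kmeans(x, 3, iter.max = 20, nstart = 20)`, which tries that many random starts and returns the best fit, so the explicit loop is only needed if you want to see how often each solution turns up.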