Hi,

I am using kmeans to cluster a dataset.
I test this example:

> data<-matrix(scan("data100.txt"),100,37,byrow=T)
(my dataset is 100 rows and 37 columns--clustering rows)

> c1<-kmeans(data,3,20)
> c1
$cluster
 [1] 1 1 1 1 1 1 1 3 3 3 1 3 1 3 3 1 1 1 1 3 1 3 3 1 1 1 3 3 1 1 3 1 1 1 1 3 3
[38] 3 1 1 1 3 1 1 1 1 3 3 3 1 1 1 1 1 1 3 1 3 1 1 3 1 1 1 1 3 1 1 1 1 1 1 3 1
[75] 1 3 1 3 1 1 1 1 3 1 1 1 1 1 3 1 1 3 1 1 3 3 1 2 1 1

$withinss
[1] 1037.5987    0.0000  666.9701

$size
[1] 68  1 31

> c4<-kmeans(data,3,20)
$withinss
[1]   0.0000 865.7628 851.1214

$size
[1]  1 54 45

Does any one tell me why the results are very different with the same
dataset and parameters when I run some times this command
'kmeans(data,3,20)'???

Thank you for your help in advance!

ping
On Mon, 14 Apr 2003, pingzhao wrote:
> Hi,
>
> I am using kmeans to cluster a dataset.
> I test this example:
>
> > data<-matrix(scan("data100.txt"),100,37,byrow=T)
> (my dataset is 100 rows and 37 columns--clustering rows)
>
> > c1<-kmeans(data,3,20)
> > c1
> $cluster
>  [1] 1 1 1 1 1 1 1 3 3 3 1 3 1 3 3 1 1 1 1 3 1 3 3 1 1 1 3 3 1 1 3 1 1 1 1 3 3
> [38] 3 1 1 1 3 1 1 1 1 3 3 3 1 1 1 1 1 1 3 1 3 1 1 3 1 1 1 1 3 1 1 1 1 1 1 3 1
> [75] 1 3 1 3 1 1 1 1 3 1 1 1 1 1 3 1 1 3 1 1 3 3 1 2 1 1
>
> $withinss
> [1] 1037.5987    0.0000  666.9701
>
> $size
> [1] 68  1 31
>
> > c4<-kmeans(data,3,20)
> $withinss
> [1]   0.0000 865.7628 851.1214
>
> $size
> [1]  1 54 45
>
> Does any one tell me why the results are very different with the same
> dataset and parameters when I run some times this command
> 'kmeans(data,3,20)'???

The help page could tell you:

 centers: Either the number of clusters or a set of initial cluster
          centers. If the first, a random set of rows in `x' are
          chosen as the initial centers.

At the very least, the labellings of the clusters are arbitrary, but
K-means usually has multiple local minima.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Hi,
It is expected! If you read:
?kmeans
For the "centers" argument:
"...a random set of rows in `x' are chosen
as the initial centers. "
In other words, the starting values are different on each call. In fact one
should run kmeans() several times to avoid settling on a poor local minimum.
If you run it, say, 20 times and get the same result 15 times, then
you can "probably" be confident in using that solution.
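
A minimal sketch of the advice above, not a definitive recipe. It uses synthetic data as a stand-in for the 100 x 37 matrix (the real data100.txt is not available here), and the names x, runs, tot, best and fit are made up for illustration. Fixing the seed makes the random choice of initial centers repeatable; several seeded runs can then be compared by their total within-cluster sum of squares. The nstart argument is an assumption about your R version: it exists in current versions of kmeans(), but may be missing from older ones.

```r
# Synthetic stand-in for the poster's data: 100 rows, 37 columns,
# built from three shifted groups (an assumption, not the real data).
set.seed(1)
x <- rbind(matrix(rnorm(40 * 37, mean = 0), nrow = 40),
           matrix(rnorm(30 * 37, mean = 1), nrow = 30),
           matrix(rnorm(30 * 37, mean = 2), nrow = 30))

# Same seed => same random initial centers => same clustering.
set.seed(42); a <- kmeans(x, centers = 3, iter.max = 20)
set.seed(42); b <- kmeans(x, centers = 3, iter.max = 20)
identical(a$cluster, b$cluster)

# Run kmeans 20 times from different random starts and record the
# total within-cluster sum of squares of each run.
runs <- lapply(1:20, function(i) {
  set.seed(i)
  kmeans(x, centers = 3, iter.max = 20)
})
tot <- sapply(runs, function(r) sum(r$withinss))

# How often each (rounded) objective value was reached.
table(round(tot, 4))

# Keep the run with the smallest objective value.
best <- runs[[which.min(tot)]]

# Shortcut in current versions of R: nstart tries several random
# starts internally and returns the best one.
set.seed(42)
fit <- kmeans(x, centers = 3, iter.max = 20, nstart = 20)
```

Since the cluster labels are arbitrary, it is safer to compare runs by sum(withinss) or by the sorted $size vector than by the raw $cluster labels.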
On Mon, 14 Apr 2003, pingzhao wrote:
> Does any one tell me why the results are very different with the same
> dataset and parameters when I run some times this command
> 'kmeans(data,3,20)'???
>
> Thank you for your help in advance!
>
> ping
--
Cheers,
Kevin
------------------------------------------------------------------------------
/* Time is the greatest teacher, unfortunately it kills its students */
--
Ko-Kang Kevin Wang
Master of Science (MSc) Student
SLC Tutor and Lab Demonstrator
Department of Statistics
University of Auckland
New Zealand
Homepage: http://www.stat.auckland.ac.nz/~kwan022
Ph: 373-7599
x88475 (City)
x88480 (Tamaki)