Dear helpers,

I was working with kmeans from package mva and found some strange situations. When I run the kmeans algorithm several times with the same dataset, I get the same partition. I simulated a small example with 6 observations and ran kmeans, giving the initial centers and allowing just one iteration. I expected the algorithm simply to allocate the observations to the nearest center, but I think this is not the result I get...

Here are the simulated data:

> dados<-matrix(c(-1,0,2,2.5,7,9,0,3,0,6,1,4),6,2)
> dados
     [,1] [,2]
[1,] -1.0    0
[2,]  0.0    3
[3,]  2.0    0
[4,]  2.5    6
[5,]  7.0    1
[6,]  9.0    4
> plot(dados)
> dados<-matrix(c(-1,0,2,2.5,7,9,0,5,0,6,1,4),6,2)
> plot(dados)
> A<-kmeans(dados,dados[c(3,4),],1)
> A
$cluster
[1] 1 1 1 1 2 2

$centers
   [,1] [,2]
1 0.875 2.75
2 8.000 2.50

$withinss
[1] 38.9375  6.5000

$size
[1] 4 2

Any hints?

Thanks a lot,
Luis Silva
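For reference, the allocation the poster expected (each observation sent to the nearer of the two starting centres, then each centre recomputed once as the mean of its group) can be sketched in R as below. This is a hand-written illustration, not what kmeans does internally; the helper `nearest_centre` is hypothetical, and the data are the second version of `dados` from the post.

```r
# Data from the post (second version, with dados[2, 2] = 5)
dados <- matrix(c(-1, 0, 2, 2.5, 7, 9, 0, 5, 0, 6, 1, 4), 6, 2)
init <- dados[c(3, 4), ]  # starting centres passed to kmeans in the post

# Hypothetical helper: index of the nearest centre (squared Euclidean
# distance) for every row of x
nearest_centre <- function(x, centres) {
  d2 <- sapply(seq_len(nrow(centres)), function(j)
    rowSums(sweep(x, 2, centres[j, ])^2))
  apply(d2, 1, which.min)
}

cl <- nearest_centre(dados, init)                 # one plain allocation step
ctr <- rowsum(dados, cl) / as.vector(table(cl))   # centre = mean of each group
cl
ctr
```

With these starting centres the plain allocation gives 1 2 1 2 1 2, not the 1 1 1 1 2 2 that kmeans reported.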
On Tue, 3 Jun 2003, Luis Miguel Almeida da Silva wrote:

> I was working with kmeans from package mva and found some strange
> situations. When I run several times the kmeans algorithm with the same
> dataset I get the same partition.

Why does that surprise you?

> I simulated a little example with 6
> observations and run kmeans giving the centers and making just one
> iteration. I expected that the algorithm just allocated the observations
> to the nearest center but think this is not the result that I get...

That's not what the documentation says it does:

     The data given by `x' is clustered by the k-means algorithm. When
     this terminates, all cluster centres are at the mean of their
     Voronoi sets (the set of data points which are nearest to the
     cluster centre).

which is true in your example. It has run one iteration of re-allocation, as you can see by reading the source code or the reference.

[...]

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
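The documented stopping condition can be checked with the numbers printed in the question: allocating each observation to the nearer of the two reported centres reproduces the reported clustering, and each reported centre is the mean of its group. The sketch below uses a hypothetical `nearest_centre` helper (defined here so the block runs on its own); the centres and within-cluster sums of squares it compares against are copied from the post's output.

```r
# Data from the post (second version) and the centres kmeans reported
dados <- matrix(c(-1, 0, 2, 2.5, 7, 9, 0, 5, 0, 6, 1, 4), 6, 2)
reported <- matrix(c(0.875, 8.000, 2.75, 2.50), 2, 2)

# Hypothetical helper: nearest centre (squared Euclidean distance) per row
nearest_centre <- function(x, centres) {
  d2 <- sapply(seq_len(nrow(centres)), function(j)
    rowSums(sweep(x, 2, centres[j, ])^2))
  apply(d2, 1, which.min)
}

# Voronoi sets of the reported centres
cl <- nearest_centre(dados, reported)

# Mean of each Voronoi set, and within-cluster sum of squares per cluster
means <- rowsum(dados, cl) / as.vector(table(cl))
wss <- sapply(1:2, function(j)
  sum(sweep(dados[cl == j, , drop = FALSE], 2, reported[j, ])^2))

cl     # same as A$cluster in the post
means  # equal to the reported centres
wss    # equal to A$withinss in the post
```

For comparison, the nearest-centre partition from the starting centres (1 2 1 2 1 2) has a total within-cluster sum of squares of 78.5, against 38.9375 + 6.5 = 45.4375 for the reported solution, consistent with the re-allocation having moved observations to reduce it.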