thr3ads.net - R help - [R] kmeans (again) [Jun 2003]

Luis Torgo

2003-Jun-05 18:04 UTC

[R] kmeans (again)

Regarding a previous question concerning the kmeans function I've tried the 
same example and I also get a strange result (at least according to what is 
said in the help of the function kmeans). Apparently, the function is 
disregarding the initial cluster centers one gives it. According to the help 
of the function:

 centers: Either the number of clusters or a set of initial cluster
          centers...

Now a small dataset:
> data<-matrix(c(-1,0,2,2.5,7,9,0,3,0,6,1,4),6,2)

If I use rows 3 and 4 as cluster centers and a single iteration of kmeans I 
get:
> kmeans(data,data[c(3,4),],1)
$cluster
[1] 1 1 1 1 2 2

$centers
   [,1] [,2]
1 0.875 2.25
2 8.000 2.50

$withinss
[1] 32.9375  6.5000

$size
[1] 4 2

If I now use rows 1 and 6 as cluster centers I get exactly the same solution 
after the first iteration:
> kmeans(data,data[c(1,6),],1)
$cluster
[1] 1 1 1 1 2 2

$centers
   [,1] [,2]
1 0.875 2.25
2 8.000 2.50

$withinss
[1] 32.9375  6.5000

$size
[1] 4 2

So, apparently the function is disregarding the initial cluster centers 
information. This is even "confirmed" by the fact that if I use the function 
without cluster centers, simply given the number of clusters, I get the same 
solution:
> kmeans(data,2,1)
$cluster
[1] 2 2 2 2 1 1

$centers
   [,1] [,2]
1 8.000 2.50
2 0.875 2.25

$withinss
[1]  6.5000 32.9375

$size
[1] 2 4
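
A quick check, not part of the original post: the three outputs above describe one and the same partition, with only the cluster labels swapped in the last call. The sketch below recomputes the centers and within-cluster sums of squares directly from the reported cluster vector, using base R only. The names `cl`, `centers` and `withinss` are chosen here for illustration.

```r
# Data set from the post
data <- matrix(c(-1, 0, 2, 2.5, 7, 9, 0, 3, 0, 6, 1, 4), 6, 2)

# Cluster vector as reported by the first two kmeans calls
cl <- c(1, 1, 1, 1, 2, 2)

# Cluster means: column sums per cluster divided by cluster sizes
centers <- rowsum(data, cl) / as.vector(table(cl))

# Within-cluster sum of squared deviations from each cluster mean
withinss <- sapply(split(seq_len(nrow(data)), cl), function(i) {
  sum(scale(data[i, , drop = FALSE], scale = FALSE)^2)
})

print(centers)   # rows: (0.875, 2.25) and (8, 2.5)
print(withinss)  # 32.9375 and 6.5
```

These values match the `$centers` and `$withinss` printed above, so identical output from the three calls only says that they ended at the same partition.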



-- 
Luis Torgo
    FEP/LIACC, University of Porto   Phone : (+351) 22 607 88 30
    Machine Learning Group           Fax   : (+351) 22 600 36 54
    R. Campo Alegre, 823             email : ltorgo at liacc.up.pt
    4150 PORTO   -  PORTUGAL         WWW   : http://www.liacc.up.pt/~ltorgo

Liaw, Andy

2003-Jun-06 02:19 UTC

[R] kmeans (again)

Just because you get the same answer from different starting points doesn't
mean the algorithm isn't using the starting points you specified.

I tried:
> set.seed(1)
> x <- matrix(rnorm(12), 6, 2)
> kmeans(x, x[c(1,6),], 1)
$cluster
[1] 2 1 2 1 1 2

$centers
        [,1]      [,2]
1  0.7028106 0.6482392
2 -0.7608503 0.4843512

$withinss
[1] 2.86861843 0.04450923

$size
[1] 3 3
> kmeans(x, 2, 1)
$cluster
[1] 2 1 2 1 1 2

$centers
        [,1]      [,2]
1  0.7028106 0.6482392
2 -0.7608503 0.4843512

$withinss
[1] 2.86861843 0.04450923

$size
[1] 3 3
> kmeans(x, x[c(3,4),], 1)
$cluster
[1] 1 1 1 2 1 1

$centers
        [,1]       [,2]
1 -0.3538799  0.7406319
2  1.5952808 -0.3053884

$withinss
[1] 2.089050 0.000000

$size
[1] 5 1

which shows that the result *can* depend on the starting values.
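
The same dependence can be seen on Luis's own data without relying on kmeans internals. Below is a minimal sketch of a single plain nearest-center assignment step (the first half of a Lloyd-type iteration). `assign_step` is a hypothetical helper written for this illustration, not part of R. Note that kmeans in current R defaults to the Hartigan-Wong algorithm, which does more than this assignment step, so it need not return these labels.

```r
# Data set from Luis's post
data <- matrix(c(-1, 0, 2, 2.5, 7, 9, 0, 3, 0, 6, 1, 4), 6, 2)

# Hypothetical helper: assign every row of x to its nearest center
assign_step <- function(x, centers) {
  # Squared Euclidean distance from each row of x to each center;
  # the result has one column per center
  d2 <- sapply(seq_len(nrow(centers)), function(k) {
    rowSums(sweep(x, 2, centers[k, ])^2)
  })
  # Index of the closest center for each row
  unname(apply(d2, 1, which.min))
}

a34 <- assign_step(data, data[c(3, 4), ])  # start from rows 3 and 4
a16 <- assign_step(data, data[c(1, 6), ])  # start from rows 1 and 6

print(a34)  # 1 1 1 2 1 2
print(a16)  # 1 1 1 2 2 2
```

The two starts disagree on the fifth point, so the starting centers do matter to a plain assignment step. That kmeans reports 1 1 1 1 2 2 from both starts is consistent with the algorithm moving on to a better partition, not with the starting centers being ignored.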

Andy