thr3ads.net - R help - [R] k- means cluster analysis [Feb 2003]

Ngayee J Law

2003-Feb-13 02:13 UTC

[R] k- means cluster analysis

Hi all,

I am trying to run the k-means cluster analysis using the function kmeans
in the package cluster.
The data are:
x = c(-0.26, -0.23, -0.05, -0.20,  0.30, -0.84, -0.10, -0.12,  0.10, -0.31,
      -0.19,  0.18, -0.26, -0.23, -0.37, -0.23)
I got two different solutions when I ran this function a few times:
kmeans(x, centers=2)

The first solution gives the following:
$cluster
 [1] 2 2 1 2 1 2 2 2 1 2 2 1 2 2 2 2
$centers
        [,1]
1  0.1325000
2 -0.2783333
$withinss
[1] 0.0646750 0.4033667
$size
[1]  4 12

The second solution gives the following:
$cluster
 [1] 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1
$centers
        [,1]
1 -0.1313333
2 -0.8400000
$withinss
[1] 0.5035733 0.0000000
$size
[1] 15  1

I don't understand why this is happening, or how to choose between the
two solutions. Also, how can I ensure a consistent solution across
runs? Thanks a lot!

- Jacqueline

Sundar Dorai-Raj

2003-Feb-13 03:46 UTC

[R] k- means cluster analysis

Ngayee J Law wrote:
> Hi all,
> 
> I am trying to run the k-means cluster analysis using the function kmeans
> in the package cluster.
> The data are:
> x = c(-0.26, -0.23, -0.05, -0.20,  0.30, -0.84, -0.10, -0.12,  0.10, -0.31,
>       -0.19,  0.18, -0.26, -0.23, -0.37, -0.23)
> I've got two different solutions when I ran this function over a few times:
> kmeans(x, centers=2)
> 
> The first solution gives the following:
> $cluster
>  [1] 2 2 1 2 1 2 2 2 1 2 2 1 2 2 2 2
> $centers
>         [,1]
> 1  0.1325000
> 2 -0.2783333
> $withinss
> [1] 0.0646750 0.4033667
> $size
> [1]  4 12
> 
> The second solution gives the following:
> $cluster
>  [1] 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1
> $centers
>         [,1]
> 1 -0.1313333
> 2 -0.8400000
> $withinss
> [1] 0.5035733 0.0000000
> $size
> [1] 15  1
> 
> I don't understand why this is happening, and how do I choose between the
> two solutions. Also, how can I ensure
> consistent solution over times? Thanks a lot!
> 
> - Jacqueline
> 
From the help page for `kmeans':

  centers: Either the number of clusters or a set of initial cluster
           centers. If the first, a random set of rows in `x' are chosen
           as the initial centers.
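
As an illustration of that random start (a sketch, not part of the help page): seeding R's random number generator before each call should make the same rows get picked as starting centers, so repeated runs agree. The seed value 1 and the names fit_a and fit_b are arbitrary choices made here.

```r
x <- c(-0.26, -0.23, -0.05, -0.20, 0.30, -0.84, -0.10, -0.12, 0.10, -0.31,
       -0.19, 0.18, -0.26, -0.23, -0.37, -0.23)

# Assumption: the seed value (1) is arbitrary; any fixed value would do.
# Resetting the seed before each call repeats the same random draw
# of starting centers.
set.seed(1)
fit_a <- kmeans(x, centers = 2)

set.seed(1)
fit_b <- kmeans(x, centers = 2)

# Same seed, same starting rows, same cluster assignment.
identical(fit_a$cluster, fit_b$cluster)
```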

If you want the same results try supplying an initial center, as in:

kmeans(x, centers = c(0.1, -0.2))
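
A minimal sketch of that suggestion, using the data from the original post: with explicit starting centers there is no random draw, so two calls should return the same partition. Summing withinss is added here only as one common way to compare candidate solutions (a smaller total means tighter clusters); it is not something the help page quoted above prescribes. The names fit1 and fit2 are arbitrary.

```r
x <- c(-0.26, -0.23, -0.05, -0.20, 0.30, -0.84, -0.10, -0.12, 0.10, -0.31,
       -0.19, 0.18, -0.26, -0.23, -0.37, -0.23)

# Explicit starting centers: one near the positive values,
# one near the bulk of the negative values.
fit1 <- kmeans(x, centers = c(0.1, -0.2))
fit2 <- kmeans(x, centers = c(0.1, -0.2))

# No randomness is involved, so both runs give the same assignment.
identical(fit1$cluster, fit2$cluster)

# Total within-cluster sum of squares for this solution.
sum(fit1$withinss)
```

Applied to the output posted above, the first solution totals 0.0646750 + 0.4033667 = 0.4680417 and the second totals 0.5035733 + 0 = 0.5035733, so by this measure the first solution is the tighter one.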

However, choosing bad starting values, such as two identical centers, could cause kmeans to fail, as in:

kmeans(x, centers = c(0, 0))
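
To try this without stopping a script, the call can be wrapped in tryCatch. This is only a sketch: what exactly happens may depend on the R version, so the code just reports the outcome rather than assuming one. The name outcome is arbitrary.

```r
x <- c(-0.26, -0.23, -0.05, -0.20, 0.30, -0.84, -0.10, -0.12, 0.10, -0.31,
       -0.19, 0.18, -0.26, -0.23, -0.37, -0.23)

# Run kmeans with two identical starting centers and record what happens.
# If kmeans signals an error, the handler turns it into a message string;
# otherwise the block returns a note that the call went through.
outcome <- tryCatch(
  {
    kmeans(x, centers = c(0, 0))
    "kmeans returned a result"
  },
  error = function(e) paste("kmeans failed:", conditionMessage(e))
)

print(outcome)
```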

Regards,
Sundar
