I'm doing kmeans partitioning on a small (n=26) dataset that has 5
variables. I noticed that if I repeatedly run the same command, the
cluster centers change and the cluster membership changes.
I am using RW1022 under Windows NT and Windows 2000:
>kmeans(pottery[,1:5], 4, 20)
[...snip]
$size
[1] 7 3 9 7
[...snip]
$size
[1] 7 10 4 5
[...snip]
$size
[1] 6 10 5 5
This command yields a different answer every time I run it. Sometimes the answer is
different only in the order of withinss (and the ordering of the numbers of
cases assigned to each group). Other times there are completely different
centers, withinss and completely different cluster configurations. This
variability doesn't happen in either S-Plus 2000 or S-Plus 6.0 (Beta 2).
I can see from the help that the R kmeans() function chooses a random set
of rows as cluster centers if the initial centers aren't specified, while
S-Plus uses hclust() and cutree() to determine the initial clusters.
Is there any way to "make" kmeans results persist under repeated uses of
the same command?
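For concreteness, here is a rough sketch of the two approaches I can think of, offered as a guess rather than a known solution. It assumes that fixing the random number seed with set.seed() controls which rows kmeans() picks as starting centers, and that kmeans() accepts a matrix of starting centers as its second argument in place of the number of clusters. The matrix x is made-up stand-in data (26 rows, 5 variables), not the pottery data, and the hclust()/cutree() step only imitates what I understand S-Plus to do.

```r
# Stand-in data (hypothetical): 26 rows, 5 variables, used in place of
# pottery[, 1:5]. Four loosely separated groups of 7, 6, 7 and 6 rows.
set.seed(1)
x <- matrix(rnorm(26 * 5), nrow = 26, ncol = 5) +
  rep(c(0, 10, 20, 30), times = c(7, 6, 7, 6))

# Approach 1: reset the random number seed immediately before each call,
# so the same rows are drawn as starting centers every time.
set.seed(42)
fit1 <- kmeans(x, 4, 20)
set.seed(42)
fit2 <- kmeans(x, 4, 20)
identical(fit1$cluster, fit2$cluster)

# Approach 2: avoid the random start altogether by passing explicit
# starting centers, here the column means of the 4 groups obtained by
# cutting a hierarchical clustering of the same data.
groups  <- cutree(hclust(dist(x)), k = 4)
centers <- apply(x, 2, function(v) tapply(v, groups, mean))  # 4 x 5 matrix
fit3 <- kmeans(x, centers, 20)
fit4 <- kmeans(x, centers, 20)
identical(fit3$cluster, fit4$cluster)
```

If the assumptions hold, both identical() calls should report that the repeated fits agree. The second approach does not depend on the state of the random number generator at all, which seems closer to the S-Plus behaviour.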
Thanks,
====================
Dr. Marc R. Feldesman
Professor and Chairman
Anthropology Department
Portland State University
1721 SW Broadway
Portland, Oregon 97201
email: feldesmanm at pdx.edu
phone: 503-725-3081
fax: 503-725-3905
http://web.pdx.edu/~h1mf
PGP Key Available On Request
=====================
"Anyway, no drug, not even alcohol, causes the fundamental ills of society.
If we're looking for the source of our troubles, we shouldn't test people
for drugs, we should test them for stupidity, ignorance, greed and love of
power." P.J. O'Rourke
Powered by Optiplochoerus and Windows 2000 (scary isn't it?)