thr3ads.net - R help - [R] kmeans() compared to PROC FASTCLUS [Dec 2010]

Andrew Agrimson

2010-Dec-02 23:49 UTC

[R] kmeans() compared to PROC FASTCLUS

Hello all,

I've been comparing results from kmeans() in R to PROC FASTCLUS in SAS and
I'm getting drastically different results with a real life data set. Even
with a simulated data set with very well separated clusters, and starting
from the same seeds, the resulting cluster means are still different. I was
hoping to look at the source code of kmeans(), but it's in C and FORTRAN and
I'm not quite sure how to get at it. Has anybody looked into the differences
in the implementations or have any thoughts on the matter? Below is the code
I'm using in each case.


fit=kmeans(obs[,-1],centers,nstart=25)

proc fastclus data=std maxclusters=2 maxiter=100 outiter drift
     converge=0.01 outseed=centers out=cluster;
   var x y z;
run;

Thanks,
Andy


Georg Ruß

2010-Dec-03 16:15 UTC

[R] kmeans() compared to PROC FASTCLUS

On 02/12/10 17:49:37, Andrew Agrimson wrote:
> I've been comparing results from kmeans() in R to PROC FASTCLUS in SAS
> and I'm getting drastically different results with a real life data set.
> [...] Has anybody looked into the differences in the implementations or
> have any thoughts on the matter?

Hi Andrew,

as per the website below, it looks as if PROC FASTCLUS is implementing a
certain flavor of k-Means:

technion.ac.il/docs/sas/stat/chap27/sect2.htm

As per the manpage ?kmeans, the R implementation of k-Means has the option
to set one of the algorithms explicitly:

algorithm = c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen")

I don't know whether you've tried that, but you may start by setting these
algorithm variants explicitly and see what the outcome is.
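
As a rough sketch of that suggestion (not the original poster's data or
code): the snippet below runs kmeans() once per algorithm from the same
fixed starting centers, so the resulting cluster means can be set side by
side with the SAS OUTSEED= output. The simulated matrix x, the starting
rows in init, and the names fits and centers_by_alg are all made up for
illustration; iter.max = 100 is only chosen to mirror maxiter=100 in the
SAS call.

```r
# Hypothetical data: two well separated clusters in three dimensions.
set.seed(1)
x <- rbind(matrix(rnorm(150, mean = 0),  ncol = 3),
           matrix(rnorm(150, mean = 10), ncol = 3))
colnames(x) <- c("x", "y", "z")

# Fixed starting centers (one row from each simulated cluster), so every
# run begins from the same seeds. Assumption: the same rows would also be
# handed to SAS as the initial seeds.
init <- x[c(1, 51), ]

# One fit per algorithm variant listed in ?kmeans.
algorithms <- c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen")
fits <- lapply(algorithms, function(a)
  kmeans(x, centers = init, iter.max = 100, algorithm = a))
names(fits) <- algorithms

# Final cluster means per algorithm, for comparison with the SAS result.
centers_by_alg <- lapply(fits, function(f) f$centers)
print(centers_by_alg)
```

Note that when centers is given as a matrix, as far as I can tell from
?kmeans, nstart only applies to randomly chosen starts, so it should make
no difference here.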

Regards,
Georg.
-- 
Research Assistant
Otto-von-Guericke-Universität Magdeburg
research at georgruss.de
research.georgruss.de
