Hello all,

I've been comparing results from kmeans() in R to PROC FASTCLUS in SAS, and I'm getting drastically different results with a real-life data set. Even with a simulated data set with very well separated clusters, and starting from the same seeds, the resulting cluster means are still different. I was hoping to look at the source code of kmeans(), but it's in C and FORTRAN and I'm not quite sure how to get at it.

Has anybody looked into the differences in the implementations, or have any thoughts on the matter? Below is the code I'm using in each case.

    fit = kmeans(obs[,-1], centers, nstart=25)

    proc fastclus data=std maxclusters=2 maxiter=100 outiter drift
                  converge=0.01 outseed=centers out=cluster;
      var x y z;
    run;

Thanks,
Andy
On 02/12/10 17:49:37, Andrew Agrimson wrote:
> I've been comparing results from kmeans() in R to PROC FASTCLUS in SAS
> and I'm getting drastically different results with a real life data set.
> [...] Has anybody looked into the differences in the implementations or
> have any thoughts on the matter?

Hi Andrew,

according to the page below, PROC FASTCLUS appears to implement a particular flavor of k-means:

technion.ac.il/docs/sas/stat/chap27/sect2.htm

According to the man page ?kmeans, the R implementation lets you choose the algorithm explicitly:

    algorithm = c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen")

I don't know whether you've tried that, but you could start by setting each of these variants explicitly and seeing what the outcome is.

Regards,
Georg.
-- 
Research Assistant
Otto-von-Guericke-Universität Magdeburg
research at georgruss.de
research.georgruss.de
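
A minimal sketch of the comparison suggested above, in case it helps. The simulated matrix x and the choice of starting rows are assumptions made purely for illustration (they stand in for obs[,-1] and the real seeds); only kmeans() and its documented arguments centers, iter.max and algorithm are used. Note that ?kmeans describes nstart as applying when centers is a number, so it is left out here, where centers is a matrix of fixed starting points.

```r
# Hypothetical, well-separated test data: two clusters in three dimensions.
set.seed(1)
x <- rbind(matrix(rnorm(150, mean = 0),  ncol = 3),
           matrix(rnorm(150, mean = 10), ncol = 3))
colnames(x) <- c("x", "y", "z")

# Fixed starting centers (one row taken from each cluster), so every
# algorithm variant starts from exactly the same seeds.
centers <- x[c(1, 51), ]

# Run kmeans() once per algorithm variant listed in ?kmeans.
algorithms <- c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen")
fits <- lapply(algorithms, function(a)
  kmeans(x, centers = centers, iter.max = 100, algorithm = a))
names(fits) <- algorithms

# Compare the final cluster means and the total within-cluster sum of squares.
lapply(fits, function(f) f$centers)
sapply(fits, function(f) f$tot.withinss)
```

If the variants agree with each other on data like this but still disagree with the SAS output, the difference is more likely to come from the starting seeds or the convergence settings than from the choice of algorithm, though that is only a guess without seeing the data.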