Hello all.

I wish to run k-means with "manhattan" distance. Since this is not supported by the function "kmeans", I turned to the "pam" function in the "fpc" package. Yet when I tried to run the algorithm from different starting points, I found that pam seems to ignore them and keeps starting from the same starting points (medoids).

My questions:
1) Is there a bug in the code, or in the way I am using it?
2) Is there a way to fix the code, or is there another function in some package that can run k-means with Manhattan distance (Manhattan distances are the sum of absolute differences)?

Here is some sample code:

  require(fpc)
  x <- rbind(cbind(rnorm(10,0,0.5), rnorm(10,0,0.5)),
             cbind(rnorm(15,5,0.5), rnorm(15,5,0.5)))
  pam(x, 2, medoids = c(1,16))

Output:

  Medoids:
       ID
  [1,]  3 -0.1406026 0.1131493
  [2,] 17  4.9564839 4.6480520
  ...

So the initial medoids were 3 and 17, not 1 and 16 as I asked.

Thanks,
Tal

--
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
www.talgalili.com
www.biostatistics.co.il
Christian Hennig
2008-Dec-17 11:25 UTC
[R] bug (?!) in "pam()" clustering from fpc package ?
Dear Tal,

pam is not in the fpc package but in the cluster package. Look at ?pam and ?pam.object to find out what it does. As far as I can see, the medoids in the output object are the final cluster medoids, not the initial ones, which presumably explains the observed behaviour.

Best regards,
Christian

--
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
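
A minimal sketch of the point above, for anyone who wants to check it on their own data. It loads cluster directly, uses pam's documented metric = "manhattan" option, and compares the reported medoids for two different starts. The seed, the second start c(2, 20), and the object names are assumptions added here for illustration, not part of the original post. The last call uses do.swap = FALSE to skip the swap phase, which should leave the supplied medoids unchanged.

```r
library(cluster)  # pam() is provided by cluster, not fpc

set.seed(1)  # arbitrary seed (an assumption), only to make the sketch reproducible
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))

# Same data, Manhattan metric, two different sets of initial medoids.
fit_a <- pam(x, 2, metric = "manhattan", medoids = c(1, 16))
fit_b <- pam(x, 2, metric = "manhattan", medoids = c(2, 20))

# id.med holds the indices of the FINAL medoids, after the swap phase.
print(fit_a$id.med)
print(fit_b$id.med)

# Skipping the swap phase returns the medoids that were supplied.
fit_init <- pam(x, 2, metric = "manhattan", medoids = c(1, 16), do.swap = FALSE)
print(fit_init$id.med)
```

If the two swapped fits report the same id.med, that is consistent with the initial medoids being used and the swap phase then converging to the same solution, which is likely on clusters as well separated as these.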