Tal Galili
2009-Mar-29 00:09 UTC
[R] [cluster package question] What is the "sum of the dissimilarities" in the pam command ?
Hello Martin Maechler and All,

A simple question (I hope): How can I compute the "sum of the dissimilarities" that appears in the pam command (from the cluster package)? Is it the "manhattan" distance (such as the one implemented by "dist")?

I am asking because I am running a clustering on a dataset. I found 7 medoids with the pam command, and from its output I have the medoid to which each observation belongs. But when I check it, I find that only about 90% of the observations have their minimum manhattan distance to the medoid that pam assigned them to.

If the manhattan distance is the one being used, I will create some toy data to see if I can reproduce this.

Thanks,
Tal
Martin Maechler
2009-Mar-30 09:31 UTC
[R] [cluster package question] What is the "sum of the dissimilarities" in the pam command ?
>>>>> "TG" == Tal Galili <tal.galili at gmail.com>
>>>>> on Sun, 29 Mar 2009 03:09:17 +0300 writes:

TG> Hello Martin Maechler and All,
TG> A simple question (I hope):
TG> How can I compute the "sum of the dissimilarities" that appears in the pam
TG> command (from the cluster package) ?
TG> Is it the "manhattan" distance (such as the one implemented by "dist") ?

Well, it first depends on whether 'x' in pam(x, k, diss, metric, ...) is *itself* a dissimilarity object or not. --> help(daisy) and help(dist)

If it is *not*, which I assume from your question, then the answer depends on the 'metric' argument of pam(). As you did not mention that, I assume you left 'metric' at its default, which is "euclidean", i.e., not "manhattan".

TG> I am asking since I am running clustering on a dataset. I found 7 medoids
TG> with the pam command, and from it I have the medoid to which each
TG> observation belongs to. But when I check it, I find only (about) 90% of
TG> observations has the minimum manhattan distance to the medoids that pam
TG> predicted.
TG> If this is the manhattan distance that is used, I will create some toy data
TG> to see if I can reproduce this.

Yes, specifying some reproducible toy data and specific R code is almost always useful and typically more productive when asking such questions by e-mail.

Regards,
Martin Maechler, ETH Zurich
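For readers who want to try the check discussed above, here is a minimal sketch on made-up toy data (the data, the seed, and the helper names such as `d_med` and `nearest_cluster` are assumptions for illustration, not part of the original thread). It fits pam() with 'metric' left at its default and then recomputes the distances to the medoids with dist() using the *same* metric. Rerunning the check with method = "manhattan" against a default (euclidean) fit is one plausible way to see a match rate below 100%.

```r
library(cluster)  # provides pam()

set.seed(1)
# Hypothetical toy data: three groups of 20 points in 2-D
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 4), ncol = 2),
           matrix(rnorm(40, mean = 8), ncol = 2))

# 'metric' is not given, so pam() uses its default, "euclidean"
fit <- pam(x, k = 3)

# Distances from every observation to every medoid, in the same metric
d     <- as.matrix(dist(x, method = "euclidean"))
d_med <- d[, fit$id.med, drop = FALSE]

# Cluster label carried by each medoid, then the label of the nearest medoid
med_cluster     <- fit$clustering[fit$id.med]
nearest_cluster <- med_cluster[apply(d_med, 1, which.min)]

# Share of observations assigned to their nearest medoid
mean(nearest_cluster == fit$clustering)

# Sum of the dissimilarities of the observations to their closest medoid
sum_diss <- sum(apply(d_med, 1, min))
sum_diss
```

If 'x' had been passed to pam() as a dissimilarity object instead of a data matrix, the same check would index that object directly rather than calling dist() again.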