Hi the devel list,

I am using k-means with a non-standard distance. As far as I can see, the function kmeans() can use four different algorithms, but not a user-defined distance.

In addition, kmeans() cannot deal with missing values, whereas there are several ways k-means could handle them; one is to use a distance that takes missing values into account, such as a distance with the Gower adjustment (which is what the regular dist() function in R uses).

So, would it be possible to adapt kmeans() so that the user can supply a 'distance to use' argument?

Christophe
I would not support an extension of kmeans to do this. I think it is best left as simple and fast as it is now.

I can think of three ways you might handle your problem:

1. Use, for example, pam() in the cluster package, which does a similar job to kmeans (not quite the same, of course) with a general distance measure.

2. If you are working with a non-standard metric and you really want to use the k-means algorithm, then perhaps one way to do so is to compute an approximate Euclidean coordinatisation of the points with a multidimensional scaling first, and then use kmeans (e.g. cmdscale, isoMDS, sammon, ...). I have no idea what the traps are with this approach, but it seems feasible.

3. If the algorithms are there and available as you say, write the code yourself and contribute it to the R project as a simple package. Everyone will benefit.

Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary): +61 7 3826 7304
Mobile: +61 4 8819 4402
Home Phone: +61 7 3286 7700
mailto:Bill.Venables at csiro.au
http://www.cmis.csiro.au/bill.venables/

-----Original Message-----
From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of cgenolin at u-paris10.fr
Sent: Tuesday, 13 May 2008 3:25 AM
To: r-devel at r-project.org
Subject: [Rd] k means

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
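For readers who want to try suggestions 1 and 2, here is a minimal sketch in R. The toy data, the seed and the object names (x, d, fit_pam, coords, fit_km) are assumptions made up for illustration; only dist(), cmdscale() and kmeans() from base R and pam() from the cluster package are real functions. It relies on dist() skipping missing coordinates pairwise and rescaling the sum, which is the behaviour the original question refers to.

```r
library(cluster)  # recommended package shipped with R; provides pam()

set.seed(1)
# Toy data (assumption): two well-separated groups of 20 points each
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 6), ncol = 2))
x[c(3, 25), 1] <- NA  # introduce a few missing values

# dist() drops NA coordinates pair by pair and scales the sum up accordingly
d <- dist(x)

# Suggestion 1: k-medoids directly on the dissimilarity object
fit_pam <- pam(d, k = 2)

# Suggestion 2: approximate Euclidean coordinates via classical MDS,
# then ordinary kmeans on those coordinates
coords <- cmdscale(d, k = 2)
fit_km <- kmeans(coords, centers = 2, nstart = 10)

# Cross-tabulate the two partitions
table(pam = fit_pam$clustering, kmeans = fit_km$cluster)
```

Note that the second route clusters the MDS coordinates rather than the original data, so the k-means centres live in the embedded space and are only as good as the embedding.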
>>>>> On Mon, 12 May 2008 19:24:55 +0200,
>>>>> cgenolin (c) wrote:

  > So is it possible to adapt kmeans to let the user give an argument
  > 'distance to use'?

As Bill Venables already pointed out, that does not make much sense, especially as there are already R functions for doing that. Package flexclust implements a k-means-type clustering algorithm where the user can provide arbitrary distance measures; have a look at

  http://www.stat.uni-muenchen.de/~leisch/papers/Leisch-2006.pdf

The code you need to write for using a new distance measure is minimal, and there are two examples in the paper describing in detail what needs to be done.

Hope this helps,
Fritz Leisch

-- 
-----------------------------------------------------------------------
Prof. Dr. Friedrich Leisch

Institut für Statistik                          Tel: (+49 89) 2180 3165
Ludwig-Maximilians-Universität                  Fax: (+49 89) 2180 5308
Ludwigstraße 33
D-80539 München                     http://www.statistik.lmu.de/~leisch
-----------------------------------------------------------------------
   Journal Computational Statistics --- http://www.springer.com/180
          Münchner R Kurse --- http://www.statistik.lmu.de/R
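As a rough illustration of how little code this takes, here is a sketch that plugs a user-defined distance into flexclust. It is not one of the two examples from the paper: the names distL1 and centMedian, the toy data and the seed are hypothetical, and the sketch assumes the flexclust package is installed from CRAN. The distance function receives the data matrix and the matrix of centroids and must return a matrix with one row per observation and one column per centroid.

```r
library(flexclust)  # CRAN package, not part of base R

# Hypothetical user-defined distance: Manhattan (L1) distance from every
# row of x to every row of centers
distL1 <- function(x, centers) {
  z <- matrix(0, nrow = nrow(x), ncol = nrow(centers))
  for (k in seq_len(nrow(centers))) {
    # subtract centroid k from each row, then sum absolute differences
    z[, k] <- rowSums(abs(sweep(x, 2, centers[k, ])))
  }
  z
}

# Matching centroid (assumption): the column-wise median, which minimises
# the summed L1 distance within a cluster
centMedian <- function(x) apply(x, 2, median)

set.seed(1)
# Toy data (assumption): two well-separated groups of 20 points each
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 6), ncol = 2))

fam <- kccaFamily(dist = distL1, cent = centMedian)
fit <- kcca(x, k = 2, family = fam)

table(clusters(fit))  # cluster sizes
```

Handling missing values would need a distance and a centroid function that both tolerate NA; that case is deliberately left out of this sketch.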