Hi,

I want to know which distance is used in the function kmeans, and whether we can change this distance. Indeed, in the function pam we can pass a distance matrix as an argument (with the line "pam <- pam(dist(matrixdata), k=7)"), but we can't do this in the function kmeans: we have to pass the data matrix directly ...

Thanks in advance,
Nicolas BOUGET
n.bouget wrote:
> Hi,
> I want to know which distance is used in the function kmeans,
> and whether we can change this distance.
> Indeed, in the function pam we can pass a distance matrix as
> an argument (with the line "pam <- pam(dist(matrixdata), k=7)"),
> but we can't do this in the function kmeans: we have to pass
> the data matrix directly ...
> Thanks in advance,
> Nicolas BOUGET

As the name says, kmeans() calculates *means* (centres) of clusters. It does not make any sense to do that on distances ...

Uwe Ligges

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> n.bouget wrote:
> > Hi,
> > I want to know which distance is used in the function kmeans,
> > and whether we can change this distance.
> > Indeed, in the function pam we can pass a distance matrix as
> > an argument (with the line "pam <- pam(dist(matrixdata), k=7)"),
> > but we can't do this in the function kmeans: we have to pass
> > the data matrix directly ...

Yes, but how can we choose the distance used to calculate the centres?

> > Thanks in advance,
> > Nicolas BOUGET
>
> As the name says, kmeans() calculates *means* (centres) of clusters.
> It does not make any sense to do that on distances ...
>
> Uwe Ligges
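To make the contrast concrete, here is a minimal sketch of the two interfaces (the data matrix is random stand-in data, not from the original post): pam() in the "cluster" package accepts a precomputed "dist" object built with whatever method you like, while kmeans() only accepts the raw data matrix and implicitly assigns points to centres by (squared) Euclidean distance.

```r
# pam() vs kmeans(): only the former accepts a dissimilarity object.
library(cluster)                             # for pam()
set.seed(42)
matrixdata <- matrix(rnorm(200), ncol = 5)   # 40 points in 5 dimensions

# pam() takes any dissimilarity, e.g. Manhattan distances:
fit.pam <- pam(dist(matrixdata, method = "manhattan"), k = 7)

# kmeans() takes the data matrix itself; the distance is not a parameter:
fit.km <- kmeans(matrixdata, centers = 7)
```

Here fit.pam$clustering and fit.km$cluster both give one cluster label per row of matrixdata; the difference is that only pam() let us pick the dissimilarity.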
>>>>> "n" == n bouget <n>
>>>>> on Fri, 28 May 2004 09:37:35 +0200 writes:

    n> Hi, I want to know which distance is used in the function
    n> kmeans and whether we can change this distance. Indeed, in
    n> the function pam we can pass a distance matrix as an argument
    n> (with the line "pam <- pam(dist(matrixdata), k=7)"), but we
    n> can't do it in the function kmeans; we have to pass the data
    n> matrix directly ... Thanks in advance, Nicolas BOUGET

It might be interesting to look at this from the pam() perspective: what exactly is pam() lacking that kmeans() does for you?

Christian, are you suggesting that pam() could do the job if

 1) there was a dist(., method = "a la kmeans"), and
 2) pam() allowed being started from a user-specified set of medoids
    instead of the "Kaufman-Rousseeuw-optimal" ones?

Regards,
Martin Maechler
n.bouget wrote:
> Hi,
> I want to know which distance is used in the function kmeans,
> and whether we can change this distance.
> Indeed, in the function pam we can pass a distance matrix as
> an argument (with the line "pam <- pam(dist(matrixdata), k=7)"),
> but we can't do this in the function kmeans: we have to pass
> the data matrix directly ...
> Thanks in advance,
> Nicolas BOUGET

One solution is to transform the data in such a way that the Euclidean distance between the transformed values represents some other distance between the original values. This works at least for the Mahalanobis distance, when one applies a multivariate technique to a PCA-transformed and re-scaled matrix, but I don't know whether transformations exist for other distance measures.

Thomas P.
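A minimal sketch of that idea for the Mahalanobis distance (random stand-in data; this is one reading of the suggestion above, not code from the thread, and it assumes a full-rank covariance matrix). Rotating the data with PCA and dividing each score column by its standard deviation yields a space in which Euclidean distances equal Mahalanobis distances in the original space, so kmeans on the transformed matrix is kmeans under the Mahalanobis distance:

```r
# kmeans under Mahalanobis distance via PCA transform + re-scaling
# (full-rank covariance assumed; data are random stand-ins).
set.seed(1)
mdata <- matrix(rnorm(200), ncol = 4)        # 50 points in 4 dimensions

p <- prcomp(mdata)                           # PCA (centres the data)
white <- scale(p$x, center = FALSE, scale = p$sdev)  # unit-variance scores

# Euclidean distance in 'white' equals Mahalanobis distance in 'mdata',
# e.g. between the first two rows:
d.white <- as.numeric(dist(white[1:2, ]))
d.maha  <- sqrt(as.numeric(mahalanobis(mdata[1, ] - mdata[2, ],
                                       center = rep(0, 4),
                                       cov = cov(mdata))))

fit <- kmeans(white, centers = 3)            # "Mahalanobis k-means"
```

The check works because cov(white) is the identity matrix, so squared Euclidean distances in the transformed space are exactly the squared Mahalanobis distances in the original one.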
I don't exactly understand what you do. Could you show me the program that you execute to do that?

> One solution is to transform the data in such a way that the Euclidean
> distance between the transformed values represents some other distance
> between the original values. This works at least for the Mahalanobis
> distance, when one applies a multivariate technique to a
> PCA-transformed and re-scaled matrix, but I don't know whether
> transformations exist for other distance measures.
>
> Thomas P.
My thread broke as I wrote this at home, and there were no new messages on this subject after I got home. I hope this still reaches interested parties.

There are several methods that find centroids (means) from distance data. Centroid clustering methods do so, and so does classic scaling, a.k.a. metric multidimensional scaling, a.k.a. principal co-ordinates analysis (in the R function cmdscale, the means are found in the C function dblcen.c in the R sources). Strictly, this centroid finding only works with Euclidean distances, but these methods willingly handle any other dissimilarities (or distances). Sometimes this results in anomalies, like upper levels being below lower levels in cluster diagrams, or negative eigenvalues in cmdscale. In principle, kmeans could do the same if she only wanted.

Is it correct to use non-Euclidean dissimilarities when Euclidean distances were assumed? In my field (ecology) we know that Euclidean distances are often poor and some other dissimilarities have better properties, and I think it is OK to break the rules (or `violate the assumptions'). Now, we don't know what kind of dissimilarities were used in the original post (I think I never saw this specified), so we don't know whether they can be euclidized directly using the ideas of Petzold or Simpson. They might be semimetric or other sinful dissimilarities, too. These would be bad in the sense Uwe Ligges wrote: you wouldn't get centres of Voronoi polygons in the original space, not even non-overlapping polygons. Still, they might work better than the original space (who wants to be in the original space when there are better spaces floating around?).

The following trick handles the problem by euclidizing the space implied by any dissimilarity measure (metric or semimetric).
Here mdata is your original (rectangular) data matrix, and dis is any dissimilarity data:

    tmp <- cmdscale(dis, k = nrow(mdata) - 1, eig = TRUE)
    keep <- tmp$eig[seq_len(ncol(tmp$points))] > 0.01
    eucspace <- tmp$points[, keep, drop = FALSE]

The condition removes axes with negative or almost-zero eigenvalues that you will get with semimetric dissimilarities. (An arbitrary dissimilarity among n points may need up to n - 1 positive axes, hence k = nrow(mdata) - 1; and since cmdscale returns eigenvalues for all axes while possibly dropping non-positive columns from tmp$points, the logical index must be cut down to the columns actually returned.) Then just call kmeans with eucspace as the argument.

If your dis is Euclidean, this is only a rotation, and kmeans of eucspace and of mdata should be equal. For other types of dis (even for a semimetric dissimilarity), this maps your dissimilarities onto a Euclidean space, which in effect is the same as performing kmeans with your original dissimilarity.

Cheers, jari oksanen
--
Jari Oksanen, Oulu, Finland
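A runnable version of the trick end-to-end (random stand-in data; the Manhattan dissimilarity is just an example choice, and cmdscale may warn about non-positive eigenvalues, which is expected). For a genuinely Euclidean dis, the embedding reproduces the original pairwise distances exactly, which is the "only a rotation" case mentioned above:

```r
# Euclidize an arbitrary dissimilarity with cmdscale(), then run kmeans().
set.seed(2)
mdata <- matrix(rnorm(120), ncol = 4)        # 30 points in 4 dimensions
dis <- dist(mdata, method = "manhattan")     # any dissimilarity

tmp <- cmdscale(dis, k = nrow(mdata) - 1, eig = TRUE)
keep <- tmp$eig[seq_len(ncol(tmp$points))] > 0.01  # drop ~zero/negative axes
eucspace <- tmp$points[, keep, drop = FALSE]

fit <- kmeans(eucspace, centers = 3)         # kmeans in the embedded space

# Sanity check for the Euclidean special case: the embedding preserves
# all pairwise distances (up to rotation/reflection).
tmp.e <- cmdscale(dist(mdata), k = nrow(mdata) - 1, eig = TRUE)
eu <- tmp.e$points[, tmp.e$eig[seq_len(ncol(tmp.e$points))] > 0.01,
                   drop = FALSE]
```

With the Euclidean input, eu spans the same 4 dimensions as mdata, so dist(eu) and dist(mdata) agree to numerical precision.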