thr3ads.net - R help - [R] k-means: should columns in dataset be in same scale? [Apr 2008]

Johan Jackson

2008-Apr-23 00:26 UTC

[R] k-means: should columns in dataset be in same scale?

Hi all,

Simple question re k-means. If I have a data set with columns that are on
different scales (say col1 has var=100 and col2 has var=2), will this make a
difference to the k-means algorithm? It seems as though it does. If so,
should we first standardize the columns of the dataset so that each column
is given equal weight?

JJ


Prof Brian Ripley

2008-Apr-23 05:46 UTC

k-means uses Euclidean distance, so scaling of the variables does matter.
Whether you want to standardize depends on the example (as it does in most 
multivariate analysis problems, e.g. PCA has the same issues).
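
A minimal sketch in R of the effect described above, on simulated data. Everything here is an assumption made for illustration and not part of the original question: the two true groups differ only in col2 (variance about 2), col1 is pure noise with variance about 100, and the helper agreement() is hypothetical. It compares kmeans() on the raw columns with kmeans() on columns standardized by scale().

```r
set.seed(1)
n <- 100
grp <- rep(1:2, each = n)            # true group labels (known only because we simulate)

# Hypothetical data: col1 is noise with var ~ 100,
# col2 carries the group structure with total var ~ 2.
x <- cbind(col1 = rnorm(2 * n, mean = 0, sd = 10),
           col2 = rnorm(2 * n, mean = c(-1.3, 1.3)[grp], sd = 0.5))

# k-means on the raw columns: Euclidean distance is dominated by col1.
km_raw <- kmeans(x, centers = 2, nstart = 25)

# k-means after centering each column and scaling it to unit variance.
x_std <- scale(x)
km_std <- kmeans(x_std, centers = 2, nstart = 25)

# Hypothetical helper: fraction of points whose cluster matches the true
# group, allowing for the arbitrary numbering of the two clusters.
agreement <- function(cl, truth) {
  tab <- table(cl, truth)
  max(sum(diag(tab)), sum(tab) - sum(diag(tab))) / sum(tab)
}

acc_raw <- agreement(km_raw$cluster, grp)
acc_std <- agreement(km_std$cluster, grp)
print(c(raw = acc_raw, standardized = acc_std))
```

In this particular setup standardizing helps because the high-variance column is noise. If the large variance of a column reflects real structure that you want to drive the clustering, standardizing could just as well hurt, which is why the choice depends on the example.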

On Tue, 22 Apr 2008, Johan Jackson wrote:
> Hi all,
>
> Simple question re k-means. If I have a data set with columns that are on
> different scales (say col1 has var=100 and col2 has var=2), will this make a
> difference to the k-means algorithm? It seems as though it does. If so,
> should we first standardize the columns of the dataset so that each column
> is given equal weight?
>
> JJ
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
