John,
> Hi, just a general question: when we do hierarchical clustering, should we
> compute the dissimilarity matrix based on the scaled or the non-scaled
> dataset? daisy() in the cluster package allows standardizing the variables
> before calculating the dissimilarity matrix;
I'd say that depends on your data.
- If your data consists of (physically) different kinds of things (and thus
  different orders of magnitude), then you should probably scale.
- On the other hand, I cluster spectra. There my variates are all in the
  same unit, and moreover I'd be afraid that scaling would blow up
  noise-only variates (i.e. the spectra do have regions of low or no
  intensity), so I usually don't scale.
- It also depends on your distance. E.g. the Mahalanobis distance should do
  the scaling by itself, if I think correctly at this time of the day...
  (a small sketch follows below).
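
To illustrate the Mahalanobis point, a minimal sketch (my own
illustration, with iris as stand-in data): Euclidean distances on
"whitened" data equal the pairwise Mahalanobis distances, i.e. the
distance brings its own scaling.

X  <- as.matrix(iris[, 1:4])   # stand-in data, any numeric matrix works
S  <- cov(X)                   # covariance of the variates
R  <- chol(S)                  # S == t(R) %*% R
Xw <- X %*% solve(R)           # whitening transform
d.mahal <- dist(Xw)            # == pairwise Mahalanobis distances
## dist(scale(X)) would only remove the per-variate scale;
## the Mahalanobis distance also removes the correlation.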
What I do frequently, though, is subtract something like the minimum
spectrum (in practice, I calculate the 5th percentile for each variate -
it's less noisy). You can also center, but I'm strongly for having a
physical meaning, and for my samples the minimum spectrum is better
interpretable (it represents the matrix composition).
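
In R, that could look like this (just a sketch; spc is a hypothetical
matrix with one spectrum per row):

min.spc <- apply(spc, 2, quantile, probs = 0.05)  # 5th percentile per variate
spc <- sweep(spc, 2, min.spc)                     # subtract it from each spectrum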
> but dist() doesn't have that option at all. I'd appreciate it if
> you could share your thoughts?
But you could call scale() and then dist(), e.g.:
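
A minimal sketch (mtcars is just stand-in data; note that scale() divides
by the standard deviation, while daisy(..., stand = TRUE) divides by the
mean absolute deviation, so the two give similar but not identical
results):

library(cluster)                     # for daisy()
x  <- as.matrix(mtcars)              # stand-in data
d1 <- dist(scale(x))                 # standardize, then Euclidean distances
d2 <- daisy(x, metric = "euclidean", stand = TRUE)
hc <- hclust(d1)                     # either dissimilarity feeds hclust()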
Claudia