I don't know how the hclust function is implemented, but in hierarchical
clustering generally the result can be ambiguous if several distances in the
dataset have identical values (or if identical between-cluster distances occur
when clusters are merged). What role the order of the data plays depends on
how these ambiguities are resolved. It may well be that, if at some point while
building the hierarchy there are two different ways to merge clusters at the
same distance value, the choice hclust makes is determined by the order of the
data.
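To illustrate the tie mechanism, here is a toy sketch. It is written in Python rather than R and is not hclust's actual code. It is a naive complete-linkage clustering that, when several merges share the same smallest distance, takes the first pair it meets while scanning the clusters in input order. That tie-breaking rule is an assumption chosen for illustration. With the points 0, 1, 2, the distances d(0,1) and d(1,2) are both 1, so the first merge is a tie. Reversing the input order then changes which two clusters you get.

```python
from itertools import combinations


def complete_linkage(points):
    """Toy complete-linkage agglomerative clustering of distinct 1-D points.

    Returns the merges in order as (cluster_a, cluster_b, height) tuples,
    where each cluster is a frozenset of point values. When several pairs
    tie for the smallest distance, the first pair met while scanning the
    current cluster list wins (a hypothetical rule for illustration).
    """
    clusters = [frozenset([p]) for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Complete linkage: cluster distance = largest pairwise distance.
            d = max(abs(a - b) for a in clusters[i] for b in clusters[j])
            # Strict '<' keeps the earliest pair when distances are tied.
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        # Drop the two merged clusters and append their union at the end.
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [clusters[i] | clusters[j]]
    return merges


def two_cluster_cut(points):
    """Partition into two clusters: the two sides of the final merge."""
    a, b, _ = complete_linkage(points)[-1]
    return sorted(sorted(c) for c in (a, b))


# Same data, different order. The first merge is a tie (d = 1 twice).
print(two_cluster_cut([0, 1, 2]))  # [[0, 1], [2]]
print(two_cluster_cut([2, 1, 0]))  # [[0], [1, 2]]

# No tie in the data: the order does not matter.
print(two_cluster_cut([0, 1, 5]))  # [[0, 1], [5]]
print(two_cluster_cut([5, 1, 0]))  # [[0, 1], [5]]
```

In both tied runs the merge heights are the same (1, then 2). Only the choice of pair joined at height 1 differs, and that choice changes the tree and any cut of it. Whether R's hclust resolves ties in this particular way is not something this sketch establishes.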
Hope this helps,
Christian
On Mon, 15 Nov 2010, rchowdhury wrote:
>
> Hello,
>
> I am using the hclust function to cluster some data. I have two separate
> files with the same data. The only difference is the order of the data in
> the file. For some reason, when I run the two files through the hclust
> function, I get two completely different results.
>
> Does anyone know why this is happening? Does the order of the data matter?
>
> Thanks,
> RC
> --
> View this message in context:
> http://r.789695.n4.nabble.com/hclust-does-order-of-data-matter-tp3043896p3043896.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche