thr3ads.net - R help - [R] cluster/distance large matrix (fwd) [Feb 2010]

Thomas Lumley

2010-Feb-11 15:13 UTC

[R] cluster/distance large matrix (fwd)

On Thu, 11 Feb 2010, Christian Hennig wrote:
>It is well known that hierarchical methods are problematic with too large
>dissimilarity matrices; even if you resolve the memory problem, the number of
>operations required is enormous.

There is at least one exception to this. Single-linkage hierarchical clustering
with a convex distance such as Euclidean distance is feasible for quite large
data sets using algorithms for the Euclidean minimum spanning tree. For tens to
hundreds of thousands of points (flow cytometry data) the algorithm in the
nnclust package is competitive in speed with model-based clustering (on a 32-bit
system).  It's slower than pam(), but it is deterministic.
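
To make the connection between single linkage and the minimum spanning tree concrete, here is a minimal sketch in Python (standard library only). It is not the nnclust algorithm: it uses plain Prim's algorithm, which still takes O(n^2) distance evaluations, but it never stores the n x n dissimilarity matrix, so memory stays O(n). The function names euclidean_mst and single_linkage_labels and the height parameter are illustrative assumptions, not part of any package. Cutting the single-linkage dendrogram at a given height gives the same groups as deleting every MST edge longer than that height.

```python
import math


def euclidean_mst(points):
    """Return MST edges as (length, i, j) using Prim's algorithm.

    Distances are computed on demand, so no n x n matrix is stored.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # shortest known distance from each point to the tree
    best_from = [-1] * n        # tree vertex that achieves best_dist
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]),
                key=best_dist.__getitem__)
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_dist[u], best_from[u], u))
        # Update distances to the tree for the remaining points.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def single_linkage_labels(points, height):
    """Cluster labels from cutting the single-linkage tree at `height`."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Keep only MST edges no longer than the cut height.
    for d, a, b in euclidean_mst(points):
        if d <= height:
            parent[find(a)] = find(b)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
labels = single_linkage_labels(points, height=2.0)
```

For really large data sets the quadratic loop above would have to be replaced by a spatial search structure; the sketch only shows why the full distance matrix is not needed.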

This doesn't apply to the original question, of course.

     -thomas
