On Thu, 11 Feb 2010, Christian Hennig wrote:
>It is well known that hierarchical methods are problematic with too large
>dissimilarity matrices; even if you resolve the memory problem, the number of
>operations required is enormous.
There is at least one exception to this. Single-linkage hierarchical clustering
with a convex distance such as Euclidean distance is feasible for quite large
data sets using algorithms for the Euclidean minimum spanning tree. For tens to
hundreds of thousands of points (flow cytometry data) the algorithm in the
nnclust package is competitive in speed with model-based clustering (on a 32-bit
system). It's slower than pam(), but it is deterministic.
This doesn't apply to the original question, of course.
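
To make the single-linkage / minimum spanning tree connection concrete, here
is a minimal sketch in Python. It is not the algorithm in nnclust; it is a
plain Prim's algorithm that computes distances on the fly, so it still takes
O(n^2) time but only O(n) memory, with no dissimilarity matrix. The function
names mst_edges and single_linkage_labels are made up for this illustration.
Cutting the k-1 longest edges of the tree gives the same k groups as cutting
the single-linkage dendrogram at k clusters.

```python
import math


def mst_edges(points):
    """Euclidean minimum spanning tree via Prim's algorithm.

    Distances are computed as needed, so memory use is O(n);
    running time is O(n^2). Returns (length, from, to) tuples.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # shortest known distance to the tree
    best_from = [-1] * n         # tree vertex achieving that distance
    edges = []
    current = 0
    in_tree[0] = True
    for _ in range(n - 1):
        nxt = -1
        for j in range(n):
            if in_tree[j]:
                continue
            d = math.dist(points[current], points[j])
            if d < best_dist[j]:
                best_dist[j] = d
                best_from[j] = current
            if nxt == -1 or best_dist[j] < best_dist[nxt]:
                nxt = j
        edges.append((best_dist[nxt], best_from[nxt], nxt))
        in_tree[nxt] = True
        current = nxt
    return edges


def single_linkage_labels(points, k):
    """Cluster labels from cutting the k-1 longest MST edges."""
    n = len(points)
    edges = sorted(mst_edges(points))
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Keeping the n-k shortest edges leaves exactly k components.
    for _, a, b in edges[: max(n - k, 0)]:
        parent[find(a)] = find(b)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = single_linkage_labels(points, 2)
```

For large data sets the inner loop would be replaced by a spatial index for
nearest-neighbour searches; that replacement is what makes the approach
feasible at the sizes mentioned above, and it is not shown here.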
-thomas