thr3ads.net - R help - [R] (no subject) [Mar 2006]


Linda Lei

2006-Mar-17 00:03 UTC

[R] (no subject)

Hi there,

 

I have noticed that some of the clustering methods in R are not
appropriate for large data sets. Below is the list I made of which
methods are appropriate for large data sets and which are not. Could you
please take a look and check whether it is right? I need this
information to decide which methods to choose.

 

Thank you!

 

P.S.:   List:

 

Appropriate for large data sets:

 

clara: k-medoids (pam applied to subsamples of the data)

 

mclust: fits mixtures of Gaussians using the EM algorithm

 

clue: implements ensemble methods for both hierarchical and partitioning
cluster methods.

 

cmeans: Fuzzy clustering

 

bclust: bagged clustering

 

hopach: a hybrid between hierarchical methods and PAM that builds a tree
by recursively partitioning a data set.

 

som: self-organizing maps
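
As a rough illustration of the first entry, here is a minimal sketch of
clara() from the cluster package on a data set of 20,000 rows. The data
are synthetic and the settings (k = 2, samples = 5, sampsize = 200) are
my own assumptions for the example, not recommended values.

```r
library(cluster)  # recommended package that ships with R

set.seed(1)
# Synthetic data: two well-separated Gaussian blobs, 20,000 rows in total
n <- 10000
x <- rbind(matrix(rnorm(2 * n, mean = 0), ncol = 2),
           matrix(rnorm(2 * n, mean = 6), ncol = 2))

# clara() runs pam() on several subsamples and keeps the best set of
# medoids, so it never builds the full n x n dissimilarity matrix
fit <- clara(x, k = 2, samples = 5, sampsize = 200)

table(fit$clustering)  # cluster sizes
fit$medoids            # one representative row per cluster
```

Only the subsamples are clustered with pam; all remaining rows are then
assigned to the nearest medoid, which is what keeps the cost manageable.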

 

 

Not appropriate for large data sets:

 

(a)    Hierarchical clustering: not appropriate for large data sets
because both execution time and storage space grow at least
quadratically with the number of observations.

 

(b)   pam: implements partitioning around medoids and can work with
arbitrary distances, but it needs all pairwise dissimilarities, so its
cost is quadratic as well.
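
To put a number on the quadratic storage in (a), here is a
back-of-the-envelope sketch. The helper name dist_bytes is hypothetical
(made up for this example); it assumes the distances are kept as the
lower triangle of the n x n matrix, one 8-byte double per pair, which is
how dist() stores them.

```r
# Hypothetical helper: bytes needed for the n * (n - 1) / 2 pairwise
# distances among n observations, at 8 bytes per double
dist_bytes <- function(n) 8 * n * (n - 1) / 2

n <- c(1e3, 1e4, 1e5)
data.frame(n = n, GB = dist_bytes(n) / 1e9)
```

For 100,000 observations this comes to roughly 40 GB for the distances
alone, before any clustering is done.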

 

 


