Hi there,

I have noticed that some of the clustering methods in R are not suitable for large data sets. Below is the list I put together of which methods are appropriate for large data sets and which are not. Could you please take a look and check whether it is right? I need this information to decide which methods to use.

Thank you!

P.S. The list:

Appropriate for large data sets:
  clara: k-medoids (PAM) applied to subsamples of the data
  mclust: fits mixtures of Gaussians using the EM algorithm
  clue: implements ensemble methods for both hierarchical and partitioning cluster methods
  cmeans: fuzzy clustering
  bclust: bagged clustering
  hopach: a hybrid between hierarchical methods and PAM; builds a tree by recursively partitioning the data set
  som: self-organizing maps

Not appropriate for large data sets:
  (a) Hierarchical clustering: execution time and storage space both grow quadratically with the number of observations.
  (b) pam: implements partitioning around medoids and can work with arbitrary distances.
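To make the quadratic storage point in (a) concrete, here is a rough sketch, not a benchmark. It assumes the dissimilarities are stored the way dist() stores them, as n * (n - 1) / 2 doubles of 8 bytes each. The helper dist_bytes and the simulated data are made up for illustration; clara() comes from the recommended 'cluster' package.

```r
# Approximate memory for a dist object: n * (n - 1) / 2 doubles, 8 bytes each.
# dist_bytes is a hypothetical helper, not part of any package.
dist_bytes <- function(n) n * (n - 1) / 2 * 8

# Size in GiB for 1,000, 10,000 and 100,000 observations.
dist_gib <- dist_bytes(c(1e3, 1e4, 1e5)) / 2^30
print(round(dist_gib, 3))

# clara() runs PAM on subsamples, so it never builds the full
# n x n dissimilarity matrix.
library(cluster)
set.seed(1)
# Simulated data (assumption): two Gaussian blobs, 20,000 rows in total.
x <- rbind(matrix(rnorm(20000, mean = 0), ncol = 2),
           matrix(rnorm(20000, mean = 5), ncol = 2))
fit <- clara(x, k = 2, samples = 20, sampsize = 200)
print(table(fit$clustering))
```

With these assumptions, the full dissimilarity object for 100,000 rows would need roughly 37 GiB, which is why hclust() and pam() become impractical there, while clara() only ever computes dissimilarities within each subsample of sampsize rows.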