I'm running a cluster analysis on a large number of observations (approx. 7,000) using both continuous and categorical variables. PAM is a theoretically appealing approach; however, I believe the number of observations makes its use untenable.

CLARA, which uses the PAM algorithm, seems like the algorithm to use. However, it requires a numeric data matrix or data frame, with rows corresponding to cases and columns to variables. Since a dissimilarity matrix is not legitimate input to CLARA, and since a data matrix with categorical variables is also inappropriate, it seems that CLARA may only be run on numeric data.

If that's true, I'm wondering what the benefit is of CLARA using the PAM algorithm (a generalization of k-means which, in part, addresses the inclusion of categorical variables). My guess is that I'm missing something; any insight would be appreciated.

Many thanks,
Joe Retzer