thr3ads.net - R help - [R] Cluster Package - Clara w/ categorical variables [Dec 2007]

Joseph Retzer

2007-Dec-18 03:52 UTC

[R] Cluster Package - Clara w/ categorical variables

I'm running a cluster analysis on many observations (approx. 7,000) with
both continuous and categorical variables. PAM is a theoretically appealing
approach; however, I believe the number of observations makes its use untenable.
CLARA, which uses the PAM algorithm, seems like the algorithm to use; however, it
requires a numeric data matrix or data frame with rows corresponding to cases
and columns to variables.

Since a dissimilarity matrix is not legitimate input (to CLARA), and since a data
matrix with categorical variables is also inappropriate, it seems that CLARA may
only be run on numeric data. If that's true, I'm wondering what the benefit
is of using the PAM algorithm (a generalization of K-means which, in part,
addresses inclusion of categorical variables).
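
To make the constraint concrete, here is a minimal sketch of the route that
pam() allows and clara() does not: build a dissimilarity object from mixed
data and cluster on that. The toy data frame and its column names are made up
for illustration, and the use of daisy() with Gower's coefficient is an
assumption about how the mixed variables would be handled, not something
settled above. The last line just counts the pairwise dissimilarities that the
same route would need at roughly 7,000 observations.

```r
library(cluster)

set.seed(1)
n <- 200  # toy size only; the real data has roughly 7,000 rows

# Hypothetical mixed-type data: two numeric columns and one factor
toy <- data.frame(
  income = rnorm(n, mean = 50, sd = 10),
  age    = runif(n, min = 18, max = 80),
  region = factor(sample(c("north", "south", "west"), n, replace = TRUE))
)

# Gower dissimilarities can combine numeric and categorical columns
d <- daisy(toy, metric = "gower")

# pam() accepts a dissimilarity object when diss = TRUE
fit <- pam(d, k = 3, diss = TRUE)
table(fit$clustering)

# Pairwise dissimilarities needed for n = 7000: n * (n - 1) / 2
7000 * (7000 - 1) / 2   # 24496500
```

The value of k = 3 is arbitrary here; the point is only that the
dissimilarity route is open to pam() while clara() expects the numeric data
itself.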

My guess is that I'm missing something; any insight would be appreciated.

Many thanks,
Joe Retzer


