vincent vicaire
2010-Jun-03 10:32 UTC
[R] rules for optimizing samples in CLARA (size and numbers) ?
Hi,

With a dataset of 9000 observations, I have noticed significant variability in the silhouette index when I change samples (default 5) and sampsize (default 40 + 2 * number of clusters) in CLARA.

Are there any rules, based on the number of clusters and observations, for setting the samples and sampsize parameters efficiently, so as to avoid under- and oversampling with CLARA on the one hand while keeping a reasonable running time on the other?

I did not find any rules of this type on the web (other than avoiding biased samples...).

Gratefully yours,
vincent