andrew mcsweeny
2006-Apr-15 22:25 UTC
[R] clustering genes / automatically determining # of clusters
Hi,

I'm clustering a microarray dataset with a large number of samples, and I would like your opinion on the best way to determine the optimal number of clusters automatically. Currently I am using the "cluster" package: I cluster with clara() and examine the average silhouette width at various numbers of clusters. Since the algorithms in "cluster" were developed decades ago, I'd like to know whether any newer packages offer a better way to determine the optimal number of clusters. Note that my dataset has a lot of missing values, coded as NA, so some software packages don't work with it.

Here is the code I've been using:

library(cluster)
kseq <- 2:10   # range of k to try (example values); silhouettes need k >= 2
avgsil <- c()
for (k in kseq) {
  clarares <- clara(data, k, rngR = TRUE)
  savg <- clarares$silinfo$avg.width   # average silhouette width for this k
  print(c(k, savg))
  avgsil[k] <- savg
}
k <- kseq
plot(k, avgsil[k])
lines(k, avgsil[k])

Sincerely,
Andrew McSweeny
grad student
Medical University of Ohio
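For readers following the thread, the loop above can be condensed into a function that also picks k automatically by taking the k with the largest average silhouette width. This is a minimal sketch, not a recommendation of a newer method: the helper name best_k_by_silhouette, the default range 2:10 and the simulated data are hypothetical. clara() accepts NAs, but note that its silinfo describes the best subsample clara() drew, not every observation.

```r
library(cluster)

# Hypothetical helper: choose k by maximising clara()'s average silhouette width.
# 'x' is a numeric matrix (NAs are allowed by clara); kseq must start at 2,
# because the silhouette width is undefined for a single cluster.
best_k_by_silhouette <- function(x, kseq = 2:10) {
  avgsil <- sapply(kseq, function(k) {
    # silinfo refers to the best subsample clara() drew, not all rows
    clara(x, k, rngR = TRUE)$silinfo$avg.width
  })
  names(avgsil) <- kseq
  list(k = kseq[which.max(avgsil)], avg.width = avgsil)
}

# Usage on simulated data: three well-separated groups, with some NAs
# placed in the first column only (so no row is entirely missing).
set.seed(1)
x <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
           matrix(rnorm(200, mean = 6), ncol = 2),
           matrix(rnorm(200, mean = 12), ncol = 2))
x[sample(nrow(x), 20), 1] <- NA
res <- best_k_by_silhouette(x, kseq = 2:6)
print(res$avg.width)
print(res$k)
```

Taking the maximum is the simplest possible rule; in practice it is worth plotting res$avg.width as in the original loop, since several k values can have nearly equal widths.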