Dear R Users,
I am doing some clustering and am wondering:
(1) Is it possible to find the optimum number of clusters for kmeans() using the
average silhouette width, just as I do for PAM here?
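library(cluster)  # provides pam()

# A is the data matrix being clustered.
# Average silhouette width for k = 2..20; asw[1] stays 0 (undefined for k = 1)
asw <- numeric(20)
for (k in 2:20)
  asw[k] <- pam(A, k)$silinfo$avg.width
k.best <- which.max(asw)
cat("silhouette-optimal number of clusters:", k.best, "\n")

# Plot average silhouette width against k and mark the best k in red
plot(1:20, asw, type = "h", main = "pam() clustering assessment",
     xlab = "k (# clusters)", ylab = "average silhouette width")
axis(1, k.best, paste("best", k.best, sep = "\n"), col = "red", col.axis = "red")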
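The same silhouette criterion can be applied to kmeans() by computing the silhouette from the cluster labels and a distance matrix. Below is a minimal sketch, assuming Euclidean distances (the metric kmeans() itself works with) and using silhouette() from the cluster package. The helper name kmeans_asw and the two-blob test data are hypothetical, for illustration only.

```r
library(cluster)  # provides silhouette()

# Hypothetical helper: average silhouette width of kmeans() fits for k = 2..k.max
kmeans_asw <- function(x, k.max = 10, nstart = 10) {
  d <- dist(x)            # Euclidean distances, matching the kmeans() criterion
  asw <- numeric(k.max)   # asw[1] stays 0: the silhouette is undefined for k = 1
  for (k in 2:k.max) {
    km <- kmeans(x, centers = k, nstart = nstart)  # several random starts per k
    asw[k] <- summary(silhouette(km$cluster, d))$avg.width
  }
  asw
}

set.seed(1)
# Illustrative data only: two well-separated Gaussian blobs in 2-D
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))
asw <- kmeans_asw(x, k.max = 8)
k.best <- which.max(asw)
cat("silhouette-optimal number of clusters:", k.best, "\n")
```

Because kmeans() depends on its random starting centres, nstart > 1 makes the silhouette curve less noisy from one run to the next.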
(2) My second question concerns pre-processing. I have mixed data (nominal,
numeric, categorical, etc.). Before clustering, I convert all the nominal
variables to binary (0/1) variables and normalise them.
Is there an elegant way of doing this?
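One possible approach, sketched under stated assumptions: model.matrix() from base R can expand every factor into one 0/1 column per level (passing contrasts = FALSE through contrasts.arg keeps all levels), and scale() then standardises the columns. As a swapped-in alternative that avoids dummy coding altogether, Gower's dissimilarity (daisy() in the cluster package) handles mixed variable types directly and can be passed to pam(). The data frame df and the helper dummy_code are hypothetical.

```r
library(cluster)  # provides daisy() and pam()

# Illustrative mixed data (hypothetical): one nominal and two numeric variables
df <- data.frame(colour = factor(c("red", "green", "blue", "red", "green")),
                 height = c(1.2, 3.4, 2.2, 5.0, 4.1),
                 weight = c(10, 30, 25, 50, 42))

# Hypothetical helper: full 0/1 indicator coding of every factor column.
# contrasts = FALSE keeps one column per level; "+ 0" drops the intercept.
dummy_code <- function(df) {
  facs <- Filter(is.factor, df)
  model.matrix(~ . + 0, data = df,
               contrasts.arg = lapply(facs, contrasts, contrasts = FALSE))
}

X <- dummy_code(df)   # one indicator column per colour level, plus height, weight
X.scaled <- scale(X)  # centre every column to mean 0 and scale it to sd 1

# Alternative without dummy coding: Gower dissimilarity on the mixed data frame
d <- daisy(df, metric = "gower")
fit <- pam(d, k = 2, diss = TRUE)
```

Whether to standardise the 0/1 columns is a judgement call: scaling them to sd 1 can give rare levels a large weight in the distance.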
(3) Is there a function to normalise data in R?
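Base R's scale() performs z-score standardisation (centre to mean 0, scale to sd 1) column by column. For rescaling to the [0, 1] range, a short helper is enough; minmax below is a hypothetical name, and the matrix m is illustrative data only.

```r
# Illustrative numeric matrix (hypothetical data)
m <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)

# Base R: z-score standardisation, each column centred to mean 0 and scaled to sd 1
z <- scale(m)

# Hypothetical helper: min-max normalisation of each column to [0, 1]
# (a constant column would give 0/0 = NaN, so drop or handle such columns first)
minmax <- function(x) {
  apply(x, 2, function(v) (v - min(v)) / (max(v) - min(v)))
}
r <- minmax(m)
```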
Thank you