Hello there.

I want to segment customers and am trying both k-means and HCA (hierarchical clustering) in R. Since I don't know how many clusters are appropriate for a given data set, for k-means I ran the code below and got a nice plot of number of clusters vs SSE. Looking at how the slope changes from one cluster count to the next gives me a guide to how many clusters there should be.

    library(cluster)  # for daisy()

    # x is my raw data
    dis <- as.matrix(daisy(x, metric = "gower", stand = TRUE))

    # one row per k = 2..10: k, mean within-cluster SS, then the cluster sizes
    result <- matrix(NA, 9, 12)
    for (i in 1:9) {
      j <- i + 1
      modelkmeans <- kmeans(dis, j)
      result[i, 1] <- j
      result[i, 2] <- mean(modelkmeans$withinss)
      for (k in 1:j) result[i, k + 2] <- modelkmeans$size[k]
    }

    # plot() on a matrix uses its first two columns (k vs mean within-cluster SS)
    plot(result, main = "Internal Index of K-means Clustering", sub = "",
         xlab = "Number of clusters", ylab = "Sum of Squared Error (SSE)",
         col = "blue")

Questions:

1. Am I doing the right thing in k-means? I am a novice in stats.
2. Do I do the same thing for HCA, but plot number of clusters vs AC instead of SSE?
3. After obtaining the two results, one from k-means and one from HCA, is there a way to compare which set of results is 'better'?
4. Are there any other methods for recommending the number of clusters?

Many thanks.
siangli
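P.S. In case it helps anyone reproduce the plot, here is a minimal self-contained sketch of the same idea. It rests on a few assumptions that are not part of my real setup: the built-in iris data stands in for my customer data, the seed and nstart = 10 are arbitrary choices, and it uses tot.withinss (the total within-cluster sum of squares) rather than the mean of withinss that my code above uses, since I suspect the total is closer to what "SSE" usually means. I am not sure which of the two is right.

```r
library(cluster)  # daisy()

set.seed(1)       # kmeans() picks random starting centres; seed is arbitrary

x <- iris         # stand-in data only (numeric columns plus one factor)
dis <- as.matrix(daisy(x, metric = "gower"))

# Total within-cluster sum of squares for k = 2..10.
# nstart = 10 reruns kmeans from 10 random starts and keeps the best fit.
ks <- 2:10
sse <- sapply(ks, function(k) {
  kmeans(dis, centers = k, nstart = 10)$tot.withinss
})

# Elbow plot: look for the k after which the curve flattens out
plot(ks, sse, type = "b", col = "blue",
     xlab = "Number of clusters",
     ylab = "Total within-cluster SS")
```

As in my original code, kmeans() is run on the rows of the dissimilarity matrix rather than on the raw data, which is part of what I am unsure about in question 1.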