Dear forumites,

It seems that there is no straightforward way to calculate the R2 of a cluster solution in R, so I would like to know whether I am computing an R2-like statistic for a given clustering solution correctly. I have several cluster solutions for the same data set, and I would like to know which one gives the highest R2. My data (5 variables) are scaled to mean 0 and standard deviation 1.

These are the command lines I used to calculate R2 for one cluster solution:

SSTot <- (nrow(grid40km.datascale) - 1) * sum(apply(grid40km.datascale, 2, var))  # total sum of squares
SStot_grid40km <- NULL
for (i in 1:22) {  # there are 22 clusters
  # grid40km.data must hold the same scaled values as grid40km.datascale
  data_group <- subset(grid40km.data, grid40km.cluster == i, select = c(X1, X2, X3, X4, X5))
  SSgroup <- (nrow(data_group) - 1) * sum(apply(data_group, 2, var))  # SS over all variables for a given cluster
  SStot_grid40km <- c(SStot_grid40km, SSgroup)
}
ssw_grid40km <- sum(SStot_grid40km)  # within-cluster SS (??) as the sum of SS over all clusters
ssbetween_grid40km <- SSTot - ssw_grid40km
RSQ_grid40km2 <- ssbetween_grid40km / SSTot  # R-square

Am I right? Does this correspond to SAS's R2?

Many thanks,
Yan

Ressources Naturelles Canada
Service Canadien des Forêts - Centre de Foresterie des Laurentides
1055, rue du PEPS
CP 10380, Succ. Ste-Foy
Québec, QC, G1V 4C7
Tel. : +001 418 649-6859
Fax : +001 418 648-5849
email : Yan.Boulanger@nrcan.gc.ca
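For comparison, here is a minimal sketch of the statistic the code above is after: R2 = between-cluster SS / total SS = 1 - (pooled within-cluster SS / total SS), summed over all variables. `cluster_r2` is a hypothetical helper name, and the simulated matrix `x` is only a stand-in for grid40km.datascale, not the real data. For k-means solutions, R's `kmeans()` already returns `totss` and `betweenss`, so the ratio can be cross-checked. Whether this matches SAS's reported R-square exactly is not verified here.

```r
# Hypothetical helper: R-squared of a partition, pooled over all variables.
# x: numeric matrix or data frame (rows = observations), cluster: label vector.
cluster_r2 <- function(x, cluster) {
  x <- as.matrix(x)
  # Total SS: squared deviations of every value from its column mean
  ss_total <- sum(sweep(x, 2, colMeans(x))^2)
  # Within-cluster SS: squared deviations from each cluster's own column means
  ss_within <- 0
  for (k in unique(cluster)) {
    xk <- x[cluster == k, , drop = FALSE]
    ss_within <- ss_within + sum(sweep(xk, 2, colMeans(xk))^2)
  }
  (ss_total - ss_within) / ss_total
}

# Cross-check against kmeans() on simulated, scaled data (5 variables)
set.seed(1)
x <- scale(matrix(rnorm(500), ncol = 5))
km <- kmeans(x, centers = 4, nstart = 10)
r2 <- cluster_r2(x, km$cluster)
all.equal(r2, km$betweenss / km$totss)  # should print TRUE
```

Computing squared deviations directly, rather than via (n - 1) * var(), also avoids NA for single-member clusters, since var() of a single value is NA.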