Hi, I was trying to calculate the CCC metric in R with the help of "NbClust" source code and the available SAS manual for CCC. All the Pseudo-F and R-square are matching exactly with the SAS output except for the E_R2 and hence CCC. I have tried and tested in multiple ways but couldn?t get any explanation for this. I have attached sample data, initial seed and also the SAS cluster output in Rdata format which I used for E_R2 and CCC calculation. FYI, following are the values of E_R2 in SAS and R respectively: E_R2=0.4630339 (R); but ERSQ=0.3732597284 (SAS) Could you please help me out with finding what's going wrong in the background? ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Kindly find below the codes I have used for this: ---------- <SAS> ---------- proc fastclus data=sample_data maxiter=100 seed=initial_seed maxc=5 outstat=metrics out=output; Var v1 v2 v3 v4 v5 v6; run; ---------- <R> ---------- load("C:\\Users\\sagnik\\Desktop\\SAS_cluster.Rdata") clust.perf.metrics <- function(data, cl) { data1 <- as.matrix(data) numberObsBefore <- dim(data1)[1] data <- na.omit(data1) nn <- numberObsAfter <- dim(data)[1] pp <- dim(data)[2] qq <- max(cl) TT <- t(data) %*% data sizeEigenTT <- length(eigen(TT)$value) eigenValues <- eigen(TT/(nn - 1))$value for (i in 1:sizeEigenTT) { if (eigenValues[i] < 0) { cat(paste("There are only", numberObsAfter, "non-missing observations out of a possible", numberObsBefore, "observations.")) stop("The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated.") } } s1 <- sqrt(eigenValues) ss <- rep(1, sizeEigenTT) for (i in 1:sizeEigenTT) { if (s1[i] != 0) ss[i] = s1[i] } vv <- prod(ss) z <- matrix(0, ncol = qq, nrow = nn) clX <- as.matrix(cl) for (i in 1:nn) for (j in 1:qq) { z[i, j] == 0 if (clX[i, 1] == j) z[i, j] = 1 } xbar <- solve(t(z) %*% z) %*% t(z) %*% data B <- t(xbar) %*% t(z) %*% z %*% xbar W <- TT - B R2 <- 1 - (sum(diag(W))/sum(diag(TT))) PseudoF <- (sum(diag(B))/(qq-1))/(sum(diag(W))/(nn-qq)) v1 <- 1 u1 <- rep(0, pp) c1 <- (vv/qq)^(1/pp) u1 <- ss/c1 k1 <- sum((u1 >= 1) == TRUE) p1 <- min(k1, qq - 1) if (all(p1 > 0, p1 < pp)) { for (i in 1:p1) { v1 <- v1 * ss[i]} c <- (v1/qq)^(1/p1) u <- ss/c b1 <- sum(1/(nn + u[1:p1])) b2 <- sum(u[(p1 + 1):pp]^2/(nn + u[(p1 + 1):pp]), na.rm = TRUE) E_R2 <- 1 - ((b1 + b2)/sum(u^2)) * ((nn - qq)^2/nn) * (1 + (4/nn)) ccc <- log((1 - E_R2)/(1 - R2)) * (sqrt(nn * p1/2)/((0.001 + E_R2)^1.2)) } else { b1 <- sum(1/(nn + u)) E_R2 <- 1 - (b1/sum(u^2)) * ((nn - qq)^2/nn) * (1 + 4/nn) ccc <- log((1 - E_R2)/(1 - R2)) * (sqrt(nn * pp/2)/((0.001 + E_R2)^1.2)) } results <- list(R_2=R2, PseudoF=PseudoF, CCC = ccc, E_R2=E_R2); return(results) } clust.perf.metrics(output[,1:6],output[,7]) #----------------------------------------------------------------------------------------------------------------------------------- THANKS IN ADVANCE, REGARDS, SAGNIK