Ashish Ranpura
2005-Dec-07 23:08 UTC
[R] KMO sampling adequacy and SPSS -- partial solution
Dear colleagues,

I've been searching for information on the Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy (MSA). SPSS reports this statistic, and it is often used to decide whether a dataset is "appropriate" for factor analysis. Its true utility seems quite low, but it comes up in stats classes a lot. It did in mine, and a glance through the R-help archives suggests I'm not alone.

I finally found a reference describing the calculation and wrote the following R function to perform it. Note that the function depends on the partial correlation function cor2pcor() from library(corpcor).

kmo.test <- function(df){
  ###
  ## Calculate the Kaiser-Meyer-Olkin Measure of Sampling Adequacy.
  ## Input should be a data frame or matrix, output is the KMO statistic.
  ## Formula derived from Hutcheson et al, 1999,
  ## "The multivariate social scientist," page 224, ISBN 0761952012
  ## see <http://www2.chass.ncsu.edu/garson/pa765/hutcheson.htm>
  ###
  cor.sq = cor(df)^2
  cor.sumsq = (sum(cor.sq)-dim(cor.sq)[1])/2
  library(corpcor)
  pcor.sq = cor2pcor(cor(df))^2
  pcor.sumsq = (sum(pcor.sq)-dim(pcor.sq)[1])/2
  kmo = sus.cor.ss/(sus.cor.ss+sus.pcor.ss)
  return(kmo)
}

Also, for those trying to reproduce the SPSS factor analysis output, (-1 * cor2pcor(cor(yourDataFrame))) will reproduce the off-diagonal entries of the "anti-image correlation" matrix. Unfortunately, the most useful part of that matrix in SPSS is its diagonal, which holds the individual MSA values, and I haven't found a way to derive those yet. I'm still working on that; any suggestions are appreciated.

--Ash.

-----
Ashish Ranpura
Institute of Cognitive Neuroscience
University College London
17 Queen Square
London WC1N 3AR
tel: +44 (20) 7679 1126
web: http://www.icn.ucl.ac.uk
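The individual MSA values on the anti-image diagonal can be computed directly, assuming SPSS uses the standard Kaiser definition for variable j: the sum of squared correlations r_ij over i != j, divided by that same sum plus the sum of squared partial correlations p_ij over i != j. The sketch below is not verified against SPSS output. The helper name anti.image.cor is hypothetical, and base R's solve() and cov2cor() stand in for corpcor::cor2pcor(), which requires cor(df) to be invertible.

```r
# Sketch: anti-image correlation matrix with per-variable MSA on the diagonal.
# Assumption: the individual MSA follows the standard Kaiser definition
#   MSA_j = sum_{i != j} r_ij^2 / (sum_{i != j} r_ij^2 + sum_{i != j} p_ij^2)
# with r_ij the correlations and p_ij the partial correlations.
# anti.image.cor is a hypothetical helper name; base R's solve() and
# cov2cor() stand in for corpcor::cor2pcor() and need an invertible cor(df).
anti.image.cor <- function(df) {
  r <- cor(df)
  # Standardized inverse of R: its off-diagonal entries are the negated
  # partial correlations, i.e. the anti-image correlations.
  aic <- cov2cor(solve(r))
  # Zero the diagonals so the sums run over i != j only.
  r.off <- r
  diag(r.off) <- 0
  p.off <- aic            # squared below, so the sign flip does not matter
  diag(p.off) <- 0
  r2 <- colSums(r.off^2)  # squared correlations summed per variable
  p2 <- colSums(p.off^2)  # squared partial correlations summed per variable
  # Place the individual MSA values on the diagonal, where SPSS shows them.
  diag(aic) <- r2 / (r2 + p2)
  aic
}

# Usage on simulated data in which variables 1 and 2 are correlated.
set.seed(1)
x <- matrix(rnorm(200 * 4), ncol = 4)
x[, 2] <- x[, 1] + rnorm(200)
round(anti.image.cor(x), 3)
```

Summing the same per-variable quantities over all variables, sum(r2) / (sum(r2) + sum(p2)), gives the overall KMO, because every pair is simply counted twice in both numerator and denominator.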
Ashish Ranpura
2005-Dec-07 23:19 UTC
[R] KMO sampling adequacy and SPSS -- partial solution
Sorry, there was an error in that function: the final calculation used variable names (sus.cor.ss, sus.pcor.ss) left over from a previous session. The corrected function is:

kmo.test <- function(df){
  ###
  ## Calculate the Kaiser-Meyer-Olkin Measure of Sampling Adequacy.
  ## Input should be a data frame or matrix, output is the KMO statistic.
  ## Formula derived from Hutcheson et al, 1999,
  ## "The multivariate social scientist," page 224, ISBN 0761952012
  ## see <http://www2.chass.ncsu.edu/garson/pa765/hutcheson.htm>
  ###
  cor.sq = cor(df)^2
  cor.sumsq = (sum(cor.sq)-dim(cor.sq)[1])/2
  library(corpcor)
  pcor.sq = cor2pcor(cor(df))^2
  pcor.sumsq = (sum(pcor.sq)-dim(pcor.sq)[1])/2
  kmo = cor.sumsq/(cor.sumsq+pcor.sumsq)
  return(kmo)
}

--Ashish.
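For anyone without corpcor installed, the same overall statistic can be sketched in base R. The name kmo.base is hypothetical. Partial correlations come from the standardized inverse of the correlation matrix, which assumes cor(df) is invertible. Selecting the upper triangle with upper.tri() replaces the "subtract the diagonal, then halve" step in kmo.test; both count each pair of variables once.

```r
# Sketch: the same overall KMO as kmo.test, using base R only.
# kmo.base is a hypothetical name. Partial correlations come from the
# standardized inverse of the correlation matrix instead of
# corpcor::cor2pcor(); this assumes cor(df) is invertible.
kmo.base <- function(df) {
  r <- cor(df)
  p <- -cov2cor(solve(r))   # off-diagonal entries are the partial correlations
  off <- upper.tri(r)       # each pair i < j counted once, diagonal excluded
  r.ss <- sum(r[off]^2)     # sum of squared correlations
  p.ss <- sum(p[off]^2)     # sum of squared partial correlations
  r.ss / (r.ss + p.ss)
}

# Usage on simulated data in which variables 1 and 2 are correlated.
set.seed(1)
x <- matrix(rnorm(200 * 4), ncol = 4)
x[, 2] <- x[, 1] + rnorm(200)
kmo.base(x)
```

Values near 1 mean the partial correlations are small relative to the raw correlations, which is the situation the statistic is meant to flag as suitable for factor analysis.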