Wolfgang Lindner
2003-Jan-03 07:57 UTC
[R] factor analysis (pca): how to get the 'communalities'?
Dear expe-R-ts,

I try some test data for a factorAnalysis (resp. pca) in the sense of Prof. Ripley's MASS § 11.1, p. 330 ff., just to prepare myself for an analysis of my own empirical data using R (instead of SPSS).

1. the data.

## The test data is (from the book of Backhaus et al.: Multivariate
## Analysemethoden. Springer 2000 [9th ed.], p. 300 ff):

a<-c(4.5,5.167,5.059,3.8,3.444,3.5,5.25,5.857,5.083,5.273,4.5)
b<-c(4.0,4.25,3.824,5.4,5.056,3.5,3.417,4.429,4.083,3.6,4.0)
c<-c(4.375,3.833,4.765,3.8,3.778,3.875,4.583,4.929,4.667,3.909,4.2)
d<-c(3.875,3.833,3.438,2.4,3.765,4.0,3.917,3.857,4.0,4.091,3.9)
e<-c(3.25,2.167,4.235,5.0,3.944,4.625,4.333,4.071,4.0,4.091,3.7)
f<-c(3.75,3.75,4.471,5.0,5.389,5.250,4.417,5.071,4.25,4.091,3.9)
g<-c(4.0,3.273,3.765,5.0,5.056,5.5,4.667,2.929,3.818,4.545,3.6)
h<-c(2.0,1.857,1.923,4.0,5.615,6.0,3.25,2.091,1.545,1.6,1.5)
i<-c(4.625,3.75,3.529,4.0,4.222,4.75,4.5,4.571,3.75,3.909,3.5)
j<-c(4.125,3.417,3.529,4.6,5.278,5.375,3.583,3.786,4.167,3.818,3.7)

m<-data.frame(a,b,c,d,e,f,g,h,i,j)

2. My try of a pca with R.

## My R input was:

m
cor(m)
library(mva)
m.pca<-princomp(m,cor=T)
m.pca
summary(m.pca)
loadings(m.pca)
m.pca$scores
m.FA <- factanal(factors = 3, covmat=cov(m))
m.FA

3. Here are my questions.

Q1.
The cor(m) matrix is the same as the one reported by SPSS (or OpenStats2).
But in R I get other eigenvalues compared with the following SPSS output:

Original matrix trace = 10,00
Roots (Eigenvalues) Extracted:
 1 5,052
 2 1,771
 3 1,427
 4 0,819
 5 0,430
 6 0,247
 7 0,159
 8 0,062
 9 0,029
10 0,003

- What is going on behind the scenes?
- Or what am I doing wrong in my use of R?
- If I am doing the pca correctly, can I use the R results as equally acceptable
  without further discussion?
Maybe a different 'hidden' algorithm is the reason for the different results?

Q2. How do I get the so-called 'Communality Estimates' with R?
Here are the values reported by SPSS for the above test data.frame m:

Communality Estimates as percentages:
 1 88,619
 2 76,855
 3 89,167
 4 85,324
 5 76,043
 6 84,012
 7 80,223
 8 92,668
 9 63,297
10 88,786

Any help, suggestions or hints are very welcome.

Best regards and a happy new year to you and R
Wolfgang
--
Wolfgang Lindner                          Lindner at math.uni-duisburg.de
Gerhard-Mercator-Universitaet Duisburg    Tel: +49 0203 379-1326
Fakultaet 4 - Naturwissenschaften         Fax: +49 0203 379-2528
Institut fuer Mathematik, LE 424
Lotharstr. 65
D 47048 Duisburg (Germany)
ripley@stats.ox.ac.uk
2003-Jan-03 09:14 UTC
[R] factor analysis (pca): how to get the 'communalities'?
On Fri, 3 Jan 2003, Wolfgang Lindner wrote:

> I try some test data for a factorAnalysis (resp. pca) in the sense of Prof.

Well, factor analysis and pca are different things, and only one is appropriate in a given problem.

> Ripley's MASS § 11.1, p. 330 ff.,

Eh? Would that be *Venables & Ripley's* MASS, and if so which edition (it is not the current one). Those editions which cover factor analysis do explain the difference.

> just to prepare myself for an analysis of my
> own empirical data using R (instead of SPSS).
>
> 1. the data.
>
> ## The test data is (from the book of Backhaus et al.: Multivariate ##
> Analysemethoden. Springer 2000 [9th ed.], p. 300 ff):
>
> a<-c(4.5,5.167,5.059,3.8,3.444,3.5,5.25,5.857,5.083,5.273,4.5)
> b<-c(4.0,4.25,3.824,5.4,5.056,3.5,3.417,4.429,4.083,3.6,4.0)
> c<-c(4.375,3.833,4.765,3.8,3.778,3.875,4.583,4.929,4.667,3.909,4.2)
> d<-c(3.875,3.833,3.438,2.4,3.765,4.0,3.917,3.857,4.0,4.091,3.9)
> e<-c(3.25,2.167,4.235,5.0,3.944,4.625,4.333,4.071,4.0,4.091,3.7)
> f<-c(3.75,3.75,4.471,5.0,5.389,5.250,4.417,5.071,4.25,4.091,3.9)
> g<-c(4.0,3.273,3.765,5.0,5.056,5.5,4.667,2.929,3.818,4.545,3.6)
> h<-c(2.0,1.857,1.923,4.0,5.615,6.0,3.25,2.091,1.545,1.6,1.5)
> i<-c(4.625,3.75,3.529,4.0,4.222,4.75,4.5,4.571,3.75,3.909,3.5)
> j<-c(4.125,3.417,3.529,4.6,5.278,5.375,3.583,3.786,4.167,3.818,3.7)
>
> m<-data.frame(a,b,c,d,e,f,g,h,i,j)
>
> 2. My try of a pca with R.
>
> ## My R input was:
>
> m
> cor(m)
> library(mva)
> m.pca<-princomp(m,cor=T)
> m.pca
> summary(m.pca)
> loadings(m.pca)
> m.pca$scores
> m.FA <- factanal(factors = 3, covmat=cov(m))
> m.FA
>
> 3. Here are my questions.
>
> Q1.
> The cor(m)-Matrix is the same as reported by using SPSS (or OpenStats2).
> But in R I get other eigenvalues compared with the following SPSS output:

You don't get eigenvalues at all in R.
You do get `Proportion of Variance' which are these numbers divided by their total.

> Original matrix trace = 10,00
> Roots (Eigenvalues) Extracted:
>  1 5,052
>  2 1,771
>  3 1,427
>  4 0,819
>  5 0,430
>  6 0,247
>  7 0,159
>  8 0,062
>  9 0,029
> 10 0,003
>
> - What is going on behind the scenes?

Why don't you ask the SPSS people that? R at least gives you sensible labels on the output.

> - Or what am I doing wrong in my use of R?
> - If I am doing the pca correctly, can I use the R results as equally acceptable
>   without further discussion?

No, as more acceptable: at least they have meaningful labels.

> Maybe a different 'hidden' algorithm is the reason for the different results?

Ask SPSS that. R's code is open, and nothing is hidden. You have not demonstrated that the results are different, anyway!

> Q2. How do I get the so-called 'Communality Estimates' with R?

First, use the data as in

    (m.FA <- factanal(m, factors=3))

and where did the number of factors come from?

    100*(1 - m.FA$uniquenesses)

gives the communalities. They are different from SPSS, because (1) R uses maximum likelihood FA and (2) tries a lot harder to find a maximum and there are many local maxima in most FA problems. In this case you have fitted too many factors, and just one suffices.

> Here are the values reported by SPSS for the above test data.frame m:
> Communality Estimates as percentages:
>  1 88,619
>  2 76,855
>  3 89,167
>  4 85,324
>  5 76,043
>  6 84,012
>  7 80,223
>  8 92,668
>  9 63,297
> 10 88,786
>
> Any help, suggestions or hints are very welcome.

1) Be a lot more accurate.
2) Read the help pages to find out what the output means. In the case of R the information is there, but you may well have to post on an SPSS help list to find out why SPSS gives different output from R.
3) Don't believe SPSS knows what it is doing.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel: +44 1865 272861 (self)
1 South Parks Road,                    +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax: +44 1865 272595
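The communality calculation described in the reply above can be sketched as follows, using the data from the original post. This is a minimal illustration, assuming a current R installation in which `factanal` is part of the `stats` package (so no `library(mva)` call is needed). The name `comm_pct` is introduced here purely for illustration. With only 11 observations the maximum-likelihood fit is fragile, so the figures should not be expected to agree with the SPSS percentages.

```r
# Data from the original post: 11 observations on 10 variables.
m <- data.frame(
  a = c(4.5, 5.167, 5.059, 3.8, 3.444, 3.5, 5.25, 5.857, 5.083, 5.273, 4.5),
  b = c(4.0, 4.25, 3.824, 5.4, 5.056, 3.5, 3.417, 4.429, 4.083, 3.6, 4.0),
  c = c(4.375, 3.833, 4.765, 3.8, 3.778, 3.875, 4.583, 4.929, 4.667, 3.909, 4.2),
  d = c(3.875, 3.833, 3.438, 2.4, 3.765, 4.0, 3.917, 3.857, 4.0, 4.091, 3.9),
  e = c(3.25, 2.167, 4.235, 5.0, 3.944, 4.625, 4.333, 4.071, 4.0, 4.091, 3.7),
  f = c(3.75, 3.75, 4.471, 5.0, 5.389, 5.250, 4.417, 5.071, 4.25, 4.091, 3.9),
  g = c(4.0, 3.273, 3.765, 5.0, 5.056, 5.5, 4.667, 2.929, 3.818, 4.545, 3.6),
  h = c(2.0, 1.857, 1.923, 4.0, 5.615, 6.0, 3.25, 2.091, 1.545, 1.6, 1.5),
  i = c(4.625, 3.75, 3.529, 4.0, 4.222, 4.75, 4.5, 4.571, 3.75, 3.909, 3.5),
  j = c(4.125, 3.417, 3.529, 4.6, 5.278, 5.375, 3.583, 3.786, 4.167, 3.818, 3.7)
)

# Maximum-likelihood factor analysis on the raw data, as suggested in the reply.
m.FA <- factanal(m, factors = 3)

# Communality = 1 - uniqueness, expressed here as a percentage per variable.
comm_pct <- 100 * (1 - m.FA$uniquenesses)
print(round(comm_pct, 3))
```

Refitting with `factors = 1` and repeating the last two lines shows how strongly the communalities depend on the number of factors chosen.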
Wolfgang Lindner
2003-Jan-03 09:53 UTC
[R] factor analysis (pca): how to get the 'communalities'?
Dear Prof. Ripley,

many thanks for your prompt answer and the valuable hints and pointers, I will study them and try it again.

1. Sorry for not giving the full citation; my book is:
   Venables & Ripley: MASS. Springer 1999, 3rd Ed. (Corr. 3rd printing 2001); § 11.1, p. 330 ff. (§ 11 written by B.D.R.)

> Those editions which cover factor analysis do explain the difference.

So I will look for the latest edition ..

2.
> 3) Don't believe SPSS knows what it is doing.

o.k. I see ;-)

Best regards
Wolfgang
--
Wolfgang Lindner                          Lindner at math.uni-duisburg.de
Gerhard-Mercator-Universitaet Duisburg    Tel: +49 0203 379-1326
Fakultaet 4 - Naturwissenschaften         Fax: +49 0203 379-2528
Institut fuer Mathematik, LE 424
Lotharstr. 65
D 47048 Duisburg (Germany)
Wolfgang Lindner
2003-Jan-03 21:05 UTC
[R] factor analysis (pca): how to get the 'communalities'?
Scot,

thank you very much for your wonderfully clear and short fix of my first problem: seeing your solution as a one-liner in the impressive, insightful syntax of R is really an aesthetic experience for me:

| I ran your example and found that you can get the eigenvalues SPSS by [..]
| m.pca$sdev^2
| So squaring the standard deviations (sdev) of the components gives you the
| eigenvalues SPSS reports.

I am a little sorry not to have seen it for myself ;-) - but I think that's life in becoming a friend of R and making the first steps with pca, fa, ca & co. R is indeed a first-choice tool for doing understandable statistics, and Prof. Ripley's pointer to R's open code points definitively in the same direction for me. Now the two worlds become reconciled and the fog gets thinner for me.
Thank you both.

Wolfgang
--
Wolfgang Lindner                          Lindner at math.uni-duisburg.de
Gerhard-Mercator-Universitaet Duisburg    Tel: +49 0203 379-1326
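The one-liner quoted above can be checked directly. The sketch below, on the data from the original post, compares the squared component standard deviations from `princomp` with the eigenvalues of the correlation matrix computed by `eigen`, and derives the `Proportion of Variance' figures discussed earlier in the thread. The names `eig_pca`, `eig_cor` and `prop_var` are illustrative only.

```r
# Data from the original post: 11 observations on 10 variables.
m <- data.frame(
  a = c(4.5, 5.167, 5.059, 3.8, 3.444, 3.5, 5.25, 5.857, 5.083, 5.273, 4.5),
  b = c(4.0, 4.25, 3.824, 5.4, 5.056, 3.5, 3.417, 4.429, 4.083, 3.6, 4.0),
  c = c(4.375, 3.833, 4.765, 3.8, 3.778, 3.875, 4.583, 4.929, 4.667, 3.909, 4.2),
  d = c(3.875, 3.833, 3.438, 2.4, 3.765, 4.0, 3.917, 3.857, 4.0, 4.091, 3.9),
  e = c(3.25, 2.167, 4.235, 5.0, 3.944, 4.625, 4.333, 4.071, 4.0, 4.091, 3.7),
  f = c(3.75, 3.75, 4.471, 5.0, 5.389, 5.250, 4.417, 5.071, 4.25, 4.091, 3.9),
  g = c(4.0, 3.273, 3.765, 5.0, 5.056, 5.5, 4.667, 2.929, 3.818, 4.545, 3.6),
  h = c(2.0, 1.857, 1.923, 4.0, 5.615, 6.0, 3.25, 2.091, 1.545, 1.6, 1.5),
  i = c(4.625, 3.75, 3.529, 4.0, 4.222, 4.75, 4.5, 4.571, 3.75, 3.909, 3.5),
  j = c(4.125, 3.417, 3.529, 4.6, 5.278, 5.375, 3.583, 3.786, 4.167, 3.818, 3.7)
)

# PCA on the correlation matrix, as in the original post.
m.pca <- princomp(m, cor = TRUE)

# Squared component standard deviations are the eigenvalues.
eig_pca <- unname(m.pca$sdev^2)

# The same eigenvalues taken directly from the correlation matrix.
eig_cor <- eigen(cor(m), symmetric = TRUE)$values

# Proportion of variance: each eigenvalue divided by their total
# (the total equals the trace of the correlation matrix, i.e. 10 here).
prop_var <- eig_pca / sum(eig_pca)

print(round(eig_pca, 3))
print(round(prop_var, 3))
```

The first printed vector is the one to compare against the "Roots (Eigenvalues) Extracted" list from SPSS.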
Brett Magill
2003-Jan-03 21:53 UTC
[R] factor analysis (pca): how to get the 'communalities'?
If interested, on my web site I have code to do factor analysis by PC. It does exactly as below, but with a nice wrapper providing print methods, rotations, sorting, and other conveniences.

home.earthlink.net/~bmagill/MyMisc.html

The relevant code snippets are "prinfact", "plot.pfa", and "print.pfa", along with the other required functions as indicated on the web site.

On Fri, 3 Jan 2003 21:04:21 +0100 Wolfgang Lindner <LindnerW at t-online.de> wrote:

> Scot,
>
> thank you very much for your wonderfully clear
> and short fix of my first problem:
> seeing your solution as a one-liner in the
> impressive, insightful syntax of R is
> really an aesthetic experience for me:
>
> | I ran your example and found that you can
> get the eigenvalues SPSS by [..]
> | m.pca$sdev^2
> | So squaring the standard deviations (sdev)
> of the components gives you the
> | eigenvalues SPSS reports.
>
> I am a little sorry not to have seen it for
> myself ;-) - but I think that's
> life in becoming a friend of R and making the
> first steps with pca, fa, ca & co.
> R is indeed a first-choice tool for doing
> understandable statistics, and Prof.
> Ripley's pointer to R's open code points
> definitively in the same direction for
> me. Now the two worlds become reconciled and
> the fog gets thinner for me.
> Thank you both.
>
> Wolfgang
> --
> Wolfgang Lindner
> Lindner at math.uni-duisburg.de
> Gerhard-Mercator-Universitaet Duisburg Tel:
> +49 0203 379-1326
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> http://www.stat.math.ethz.ch/mailman/listinfo/r-help
>
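For readers without access to the code on that site, a rough sketch of principal-component factor extraction follows. It is not the "prinfact" function mentioned above, whose source is not reproduced in this thread; the names `k`, `pc_loadings` and `comm_pct`, and the choice of three components, are assumptions made for illustration. The choice of three is motivated by the figures in the original post: the SPSS communality percentages sum to about 825, which is 100 times the sum of the first three reported eigenvalues (5,052 + 1,771 + 1,427 = 8,250). Whether the individual values agree with SPSS to the printed precision has not been checked here.

```r
# Data from the original post: 11 observations on 10 variables.
m <- data.frame(
  a = c(4.5, 5.167, 5.059, 3.8, 3.444, 3.5, 5.25, 5.857, 5.083, 5.273, 4.5),
  b = c(4.0, 4.25, 3.824, 5.4, 5.056, 3.5, 3.417, 4.429, 4.083, 3.6, 4.0),
  c = c(4.375, 3.833, 4.765, 3.8, 3.778, 3.875, 4.583, 4.929, 4.667, 3.909, 4.2),
  d = c(3.875, 3.833, 3.438, 2.4, 3.765, 4.0, 3.917, 3.857, 4.0, 4.091, 3.9),
  e = c(3.25, 2.167, 4.235, 5.0, 3.944, 4.625, 4.333, 4.071, 4.0, 4.091, 3.7),
  f = c(3.75, 3.75, 4.471, 5.0, 5.389, 5.250, 4.417, 5.071, 4.25, 4.091, 3.9),
  g = c(4.0, 3.273, 3.765, 5.0, 5.056, 5.5, 4.667, 2.929, 3.818, 4.545, 3.6),
  h = c(2.0, 1.857, 1.923, 4.0, 5.615, 6.0, 3.25, 2.091, 1.545, 1.6, 1.5),
  i = c(4.625, 3.75, 3.529, 4.0, 4.222, 4.75, 4.5, 4.571, 3.75, 3.909, 3.5),
  j = c(4.125, 3.417, 3.529, 4.6, 5.278, 5.375, 3.583, 3.786, 4.167, 3.818, 3.7)
)

# Eigen-decomposition of the correlation matrix.
ev <- eigen(cor(m), symmetric = TRUE)

# Number of components retained (an assumption, see the note above).
k <- 3

# Component loadings: each eigenvector scaled by the square root of its eigenvalue.
pc_loadings <- sweep(ev$vectors[, 1:k, drop = FALSE], 2, sqrt(ev$values[1:k]), `*`)
rownames(pc_loadings) <- names(m)

# Communality of a variable: sum of its squared loadings on the retained components.
comm_pct <- 100 * rowSums(pc_loadings^2)
print(round(comm_pct, 3))
```

Because the eigenvectors have unit length, the communalities always sum to the sum of the retained eigenvalues, which is the property used above to guess the number of components.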