I do not understand, from a PCA point of view, the option center=F of prcomp() According to the help page, the calculation in prcomp() "is done by a singular value decomposition of the (centered and possibly scaled) data matrix, not by using eigen on the covariance matrix" (as it's done by princomp()) . "This is generally the preferred method for numerical accuracy" The question is that while prcomp(X,center=T,scaling=F) is equivalent to princomp(X,scaling=F), but prcomp(X,center=F) has no equivalent in princomp() Also, the rotation made with either the eigenvectors of prcomp(X,center=T,scaling=F) or the ones of princomp(X,scaling=F) yields PCs with a minimum correlation, as expected for a PCA. But the rotation made with the eigenvectors of prcomp(X,center=F) yields axes that are correlated. Therefore, prcomp(X,center=F) is not really a PCA. See the following example, in which the second column of data matrix X is linearly correlated to the first column: > X <- cbind(rnorm(100,100,50),rnorm(100,100,50)) > X[,2] <- X[,1]*1.5-50 +runif(100,-70,70) > plot(X) > cor(X[,1],X[,2]) [1] 0.903597 > eigvnocent <- prcomp(X,center=F,scaling=F)[[1]] > eigvcent <- prcomp(X,center=T,scaling=F)[[1]] > eigvecnocent <- prcomp(X,center=F,scaling=F)[[2]] > eigveccent <- prcomp(X,center=T,scaling=F)[[2]] > PCnocent <- X%*%eigvecnocent > PCcent <- X%*%eigveccent > par(mfrow=c(2,2)) > plot(X) > plot(PCnocent) > plot(PCcent) > cor(X[,1],X[,2]) [1] 0.903597 > cor(PCcent[,1],PCcent[,2]) [1] -8.778818e-16 > cor(PCnocent[,1],PCnocent[,2]) [1] -0.6908334 > Also the help page of prcomp() states: "Details The calculation is done by a singular value decomposition of the (centered and possibly scaled) data matrix..." The parenthesis implies some ambiguity, but I do interpret the sentence as indicating that the calculation should always be done using a centered data matrix. Finally, all the examples in the help page use centering (or scaling, which implies centering) Therefore, why the option center=F ? Agus Agus -- Dr. Agustin Lobo Institut de Ciencies de la Terra "Jaume Almera" (CSIC) LLuis Sole Sabaris s/n 08028 Barcelona Spain Tel. 34 934095410 Fax. 34 934110012 email: Agustin.Lobo at ija.csic.es http://www.ija.csic.es/gt/obster
Hi Agus,>> But the rotation made with the eigenvectors of prcomp(X,center=F) yields >> axes that are correlated. Therefore, prcomp(X,center=F) is not really a >> PCA.cor() is not an appropriate test of whether two vectors are orthogonal. The definition that two vectors (in an inner product space) are orthogonal is that their inner product is zero. ## The appropriate test uses ?"%*%" to multiply the matrices: set.seed(123) X <- cbind(rnorm(100,100,50),rnorm(100,100,50)) X[,2] <- X[,1]*1.5-50 +runif(100,-70,70) eigvcent <- prcomp(X,center=T,scaling=F) eigvnocent <- prcomp(X,center=F,scaling=F) eigvcent$rotation[,1] %*% eigvcent$rotation[,2] [1,] -2.211772e-17 eigvnocent$rotation[,1] %*% eigvnocent$rotation[,2] [1,] -1.680513e-17 In both cases, PCs 1 and 2 are orthogonal. Regards, Mark. alobo wrote:> > I do not understand, from a PCA point of view, the option center=F > of prcomp() > > According to the help page, the calculation in prcomp() "is done by a > singular value decomposition of the (centered and possibly scaled) data > matrix, not by using eigen on the covariance matrix" (as it's done by > princomp()) . > "This is generally the preferred method for numerical accuracy" > > The question is that while > prcomp(X,center=T,scaling=F) is equivalent to princomp(X,scaling=F), > but prcomp(X,center=F) has no equivalent in princomp() > > Also, the rotation made with either the eigenvectors of > prcomp(X,center=T,scaling=F) or the ones of princomp(X,scaling=F) > yields PCs with a minimum correlation, as expected > for a PCA. But the rotation made with the eigenvectors of > prcomp(X,center=F) yields axes that are correlated. > Therefore, prcomp(X,center=F) is not really a PCA. > > See the following example, in which the second column of > data matrix X is linearly correlated to the first column: > > > X <- cbind(rnorm(100,100,50),rnorm(100,100,50)) > > X[,2] <- X[,1]*1.5-50 +runif(100,-70,70) > > plot(X) > > cor(X[,1],X[,2]) > [1] 0.903597 > > > eigvnocent <- prcomp(X,center=F,scaling=F)[[1]] > > eigvcent <- prcomp(X,center=T,scaling=F)[[1]] > > eigvecnocent <- prcomp(X,center=F,scaling=F)[[2]] > > eigveccent <- prcomp(X,center=T,scaling=F)[[2]] > > > PCnocent <- X%*%eigvecnocent > > PCcent <- X%*%eigveccent > > par(mfrow=c(2,2)) > > plot(X) > > plot(PCnocent) > > plot(PCcent) > > > cor(X[,1],X[,2]) > [1] 0.903597 > > cor(PCcent[,1],PCcent[,2]) > [1] -8.778818e-16 > > cor(PCnocent[,1],PCnocent[,2]) > [1] -0.6908334 > > > > Also the help page of prcomp() states: > "Details > > The calculation is done by a singular value decomposition of the > (centered and possibly scaled) data matrix..." > > The parenthesis implies some ambiguity, but I do interpret the sentence > as indicating that the calculation should always be done using a > centered data matrix. > Finally, all the examples in the help page use centering (or scaling, > which implies centering) > > Therefore, why the option center=F ? > > Agus > > > > Agus > > -- > Dr. Agustin Lobo > Institut de Ciencies de la Terra "Jaume Almera" (CSIC) > LLuis Sole Sabaris s/n > 08028 Barcelona > Spain > Tel. 34 934095410 > Fax. 34 934110012 > email: Agustin.Lobo at ija.csic.es > http://www.ija.csic.es/gt/obster > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >-- View this message in context: http://www.nabble.com/prcomp%28X%2Ccenter%3DF%29-----tp22396608p22397748.html Sent from the R help mailing list archive at Nabble.com.
Dear Agustin & the Listers, Noncentred PCA is an old and establishes method. It is rarely used, but still (methinks) it is used more often than it should be used. There is nothing wrong in having noncentred PCA in R, and it is a real PCA. Details will follow. On 08/03/2009, at 11:07 AM, Agustin Lobo wrote:> I do not understand, from a PCA point of view, the option center=F > of prcomp() > > According to the help page, the calculation in prcomp() "is done by > a singular value decomposition of the (centered and possibly scaled) > data matrix, not by using eigen on the covariance matrix" (as it's > done by princomp()) . > "This is generally the preferred method for numerical accuracy" > > The question is that while > prcomp(X,center=T,scaling=F) is equivalent to princomp(X,scaling=F), > but prcomp(X,center=F) has no equivalent in princomp()princomp() does not have explicit noncentred analysis because its algorithm is based on the eigen analysis of covariance matrix using funciton cov.wt(). While function cov.wt() knows argument 'center' (which is the US spelling of centre), princomp() does not pass that argument to cov.wt(). However, if you supply a noncentred 'covmat' instead of 'x' (see ?princomp), princomp() will give you noncentred PCA. We can only guess why the authors of princomp() did not allow noncentred analysis. However, I know why the author of rda() function in vegan did not explicitly allow noncentred analysis: he thought that it is a Patently Bad Idea(TM). Further, he suggests in vegan function that if you think you need noncentred analysis, you are able to edit vegan::rda.default() to do so -- if you cannot edit the function, then you probably are wrong in assuming you need noncentred analysis. It may be that the princomp authors think in the same way (but they also allow noncentred analyis iif you supply a noncentred 'covmat'). One of the sins of my youth was to publish a paper using noncentred PCA. I'm penitent. (However, I commited bigger sins, too.)> Also, the rotation made with either the eigenvectors of > prcomp(X,center=T,scaling=F) or the ones of princomp(X,scaling=F) > yields PCs with a minimum correlation, as expected > for a PCA. But the rotation made with the eigenvectors of > prcomp(X,center=F) yields axes that are correlated. > Therefore, prcomp(X,center=F) is not really a PCA.PCA axes are *orthogonal*, but not necessarily uncorrelated (people often mistake these as synonyms). Centred orthogonal axes also are uncorrelated, but noncentred orthogonal are (or may be) correlated. Your example code may be simplified a bit: prcomp returns the rotated matrix so that you do not need the rotation %*% data multiplication after analysis: you get that in the analysis. Below I use multivariate normal random numbers (MASS::mvrnorm) to generate correlated observations: library(MASS) x<- mvrnorm(100, mu = c(1, 3), Sigma=matrix(c(1, 0.6, 0.6, 1), nrow=2)) pc <- prcomp(x, cent=F) Here the scores are orthogonal: crossprod(pc$x) or the off-diagonal elements are numerically zero, but they are correlated: cor(pc$x) The only requirement we have is orthogonality, and uncorrelatedness is a collateral damage in centred analysis. Cheers, Jari Oksanen> > See the following example, in which the second column of > data matrix X is linearly correlated to the first column: > > > X <- cbind(rnorm(100,100,50),rnorm(100,100,50)) > > X[,2] <- X[,1]*1.5-50 +runif(100,-70,70) > > plot(X) > > cor(X[,1],X[,2]) > [1] 0.903597 > > > eigvnocent <- prcomp(X,center=F,scaling=F)[[1]] > > eigvcent <- prcomp(X,center=T,scaling=F)[[1]] > > eigvecnocent <- prcomp(X,center=F,scaling=F)[[2]] > > eigveccent <- prcomp(X,center=T,scaling=F)[[2]] > > > PCnocent <- X%*%eigvecnocent > > PCcent <- X%*%eigveccent > > par(mfrow=c(2,2)) > > plot(X) > > plot(PCnocent) > > plot(PCcent) > > > cor(X[,1],X[,2]) > [1] 0.903597 > > cor(PCcent[,1],PCcent[,2]) > [1] -8.778818e-16 > > cor(PCnocent[,1],PCnocent[,2]) > [1] -0.6908334 > > > > Also the help page of prcomp() states: > "Details > > The calculation is done by a singular value decomposition of the > (centered and possibly scaled) data matrix..." > > The parenthesis implies some ambiguity, but I do interpret the > sentence as indicating that the calculation should always be done > using a centered data matrix. > Finally, all the examples in the help page use centering (or > scaling, which implies centering) > > Therefore, why the option center=F ?There really is a small conflict between docs and function: the function allows noncentred analysis, and it also performs such an analysis if asked. This is documented for the argument "center", but the option is later ignored in the passage you cite. Probably because the authors of the function think that this offered option should not be used. You may submit a report of this internal conflict in prcomp() documents to R maintainers. Cheers, Jari Oksanen