Hey, All

In principal component analysis (PCA), we want to know what percentage of the total variance in the data is explained by the first principal component.

Assume the data matrix X is zero-meaned. I used the following procedure:

    C = cov(X);                    %% calculate the covariance matrix
    [EVector, EValues] = eig(C);
    L = diag(EValues);             %% L is a column vector with the eigenvalues as its elements
    percent = max(L)/sum(L);       %% largest eigenvalue over the total; eig need not list it first

Others argue for using the Singular Value Decomposition (SVD) to calculate the same quantity:

    [U, S, V] = svd(X);
    L = diag(S);
    L = L.^2;
    percent = L(1)/sum(L);

So which is the correct method to calculate the percentage explained by the first principal component?

Thanks for your advice on this.

Fred
If I'm not mistaken, the eigenvalues of the positive semi-definite matrix X'X are equal to the squared singular values of X, so you should get the same answer either way.

The code you have shown is definitely not R (it looks like Matlab), so why are you posting to R-help?

Andy

> -----Original Message-----
> From: Feng Zhang [mailto:f0z6305 at labs.tamu.edu]
> Sent: Thursday, February 06, 2003 1:03 PM
> To: R-Help
> Subject: [R] Confused by SVD and Eigenvector Decomposition in PCA
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> http://www.stat.math.ethz.ch/mailman/listinfo/r-help
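The equivalence claimed above can be written out in a few lines. This is only a sketch under stated assumptions: X is n x p with every column already centered, the covariance is normalized by n - 1, and the symbols n, p, s_i and \lambda_i are notation introduced here, not taken from the thread.

```latex
\begin{align*}
X &= U S V^{\top}, \qquad s_1 \ge s_2 \ge \dots \ge s_p \ge 0 \\
C &= \frac{1}{n-1}\, X^{\top} X
   = V \,\frac{S^{\top} S}{n-1}\, V^{\top} \\
\lambda_i &= \frac{s_i^{2}}{n-1}, \qquad i = 1, \dots, p \\
\frac{\lambda_1}{\sum_{j=1}^{p} \lambda_j}
  &= \frac{s_1^{2}}{\sum_{j=1}^{p} s_j^{2}}
\end{align*}
```

The factor 1/(n - 1) cancels in the ratio, so both routes give the same percentage, provided the SVD is taken of the same centered X that the covariance was computed from.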
antonio rodriguez
2003-Feb-07 09:32 UTC
[R] Confused by SVD and Eigenvector Decomposition in PCA
Hi Feng,

AFAIK, SVD provides a one-step method for computing all the components of the eigenvalue problem, without the need to compute and store big covariance matrices. The resulting decomposition is also computationally more stable and robust.

Cheers,

Antonio Rodriguez
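A minimal R sketch of the two routes mentioned here, on made-up data (the matrix `x` and the variable names are illustrative assumptions, not from the thread). It relies on base R's `prcomp()`, which centers the data and works from its SVD, and compares the result with an explicit eigendecomposition of `cov(x)`:

```r
# Hypothetical toy data: 200 observations of 3 variables.
set.seed(42)
x <- matrix(rnorm(200 * 3), nrow = 200, ncol = 3)

# Route 1: eigendecomposition of the p x p covariance matrix.
ev <- eigen(cov(x), symmetric = TRUE)$values

# Route 2: prcomp() centers x and uses its SVD; sdev^2 are the component variances.
pc <- prcomp(x, center = TRUE, scale. = FALSE)

# Proportion of variance explained by each component, both ways.
prop_eigen <- ev / sum(ev)
prop_svd   <- pc$sdev^2 / sum(pc$sdev^2)

# The two vectors should agree up to floating-point tolerance.
print(all.equal(prop_eigen, prop_svd))
```

For a small matrix like this the choice makes no practical difference; the SVD route mainly pays off when forming the covariance matrix is expensive or loses precision.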
Thanks for those replies.

But I tested several cases and found that the two percentages from SVD and EVD are not the same. So how can the difference be explained, and which one is the right one to use in PCA?
I used Matlab to do this case study.

    > x = randn(200,3);        %% generate a 200x3 Gaussian matrix
    > [a,b,c] = svd(x);        %% SVD decomposition
    > S = diag(b)
    S = [15.6765 14.8674 13.4016]'
    > S(1)^2/sum(S.^2)
    0.3802

    > ZeroedX = x - repmat(mean(x),200,1);   %% ZeroedX is now zero-centered data
    > C = cov(ZeroedX);        %% covariance matrix of ZeroedX
    > [U,L] = eig(C);          %% eigendecomposition of C
    > SE = diag(L)
    [0.8918 1.1098 1.2337]'
    > max(SE)/sum(SE)          %% eig lists the eigenvalues in ascending order, so the largest is SE(3)
    0.3813

This is the case that I was confused by.

Fred

----- Original Message -----
From: "Liaw, Andy" <andy_liaw at merck.com>
To: "'Feng Zhang'" <f0z6305 at labs.tamu.edu>
Sent: Friday, February 07, 2003 6:25 PM
Subject: RE: [R] Confused by SVD and Eigenvector Decomposition in PCA

> I've already shown you one example. If that's not enough, here's another one:
>
> > set.seed(1)
> > x <- matrix(runif(1e3), 50, 20)
> > La.eigen(crossprod(x))$value
>  [1] 258.5242317   9.3638224   8.7213839   7.7425270   6.5057190   6.2719056
>  [7]   5.6582657   4.5002047   4.2289555   3.9098726   3.7172642   3.2826449
> [13]   2.8758329   2.6907474   2.3300505   1.9700120   1.3191512   1.0228788
> [19]   0.8883083   0.5883287
> > La.svd(x)$d^2
>  [1] 258.5242317   9.3638224   8.7213839   7.7425270   6.5057190   6.2719056
>  [7]   5.6582657   4.5002047   4.2289555   3.9098726   3.7172642   3.2826449
> [13]   2.8758329   2.6907474   2.3300505   1.9700120   1.3191512   1.0228788
> [19]   0.8883083   0.5883287
>
> Where's your example of this not working?
>
> Andy
Stephane Dray
2003-Feb-08 10:28 UTC
[R] Confused by SVD and Eigenvector Decomposition in PCA
At 21:16 07/02/2003 -0600, Feng Zhang wrote:
>I used Matlab to do this case study.
>
> >[a,b,c]=svd(x); %%SVD decomposition
> >ZeroedX = x - repmat(mean(x),200,1); %%ZeroedX is now zero centered data
>
>This is the case that I was confused by.

You must also apply svd to your centred table (i.e. ZeroedX), not to the raw x.
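The point about centring can be checked with a short R sketch. The data here are simulated with `rnorm()` as a stand-in for the Matlab `randn(200,3)` call, so the numbers will not match those posted; all variable names are illustrative assumptions.

```r
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 3), nrow = n, ncol = 3)   # stand-in for randn(200,3)

# Column-centred copy of x (the analogue of ZeroedX).
xc <- scale(x, center = TRUE, scale = FALSE)

d_raw <- svd(x)$d                               # SVD of the uncentred data
d_cen <- svd(xc)$d                              # SVD of the centred data
ev    <- eigen(cov(x), symmetric = TRUE)$values # cov() centres internally

prop_raw <- d_raw[1]^2 / sum(d_raw^2)
prop_cen <- d_cen[1]^2 / sum(d_cen^2)
prop_eig <- ev[1] / sum(ev)

# Only the centred SVD is expected to match the covariance eigenvalues.
print(c(raw = prop_raw, centred = prop_cen, eigen = prop_eig))
print(all.equal(d_cen^2 / (n - 1), ev))
```

Because sample means of random data are not exactly zero, the uncentred figure typically differs from the other two in the third or fourth decimal place, which is the size of the gap reported in the thread.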
There *is* a Matlab newsgroup for you to ask Matlab questions. From the latest Matlab digest:

    MATLAB Usenet Group Celebrates Its 10th Anniversary

    The MATLAB Usenet group, comp.soft-sys.matlab (CSSM), celebrated its
    10th anniversary this month. CSSM is a collaboration space where
    thousands of MATLAB users discuss MATLAB-related topics or post
    questions to the community. In 2002, CSSM featured more than 33,800
    posts. Use our online newsreader to communicate with the MATLAB
    community at:
    www.mathworks.com/matlabcentral

Andy