Coghlan, Avril
2011-Jun-30 19:48 UTC
[R] sdev value returned by princomp function (used for PCA)
Dear all, I have a question about the 'sdev' value returned by the princomp function (which does principal components analysis). On the help page for princomp it says 'sdev' is 'the standard deviations of the principal components'. However, when I calculate the principal components for the USArrests data set, I don't find this to be the case: Here is how I calculated the principal components and got the 'sdev' values:> scaledUSArrests <- scale(USArrests) # standardise the variables to have variance 1 and mean 0 > myPCA <- princomp(scaledUSArrests) # do the PCA > myPCA$sdevComp.1 Comp.2 Comp.3 Comp.4 1.5590500 0.9848705 0.5911277 0.4122639 As far as I understand, the principal components themselves are stored in the 'scores' value returned by princomp, so I calculated the standard deviations of those values to see if they agree with 'sdev':> sd(myPCA$scores)Comp.1 Comp.2 Comp.3 Comp.4 1.5748783 0.9948694 0.5971291 0.4164494 I get different values than I got using sd(myPCA$scores) and myPCA$sdev, why is this? As there are 4 standardised variables, and the variance of each standardised variable is 1, I would expect the variances of the principal components to add to 1. I find that this is true for the variances calculated using sd(myPCA$scores) but not myPCA$sdev:> (1.5590500^2) + (0.9848705^2) + (0.5911277^2) + (0.4122639^2)3.92> (1.5748783^2) + (0.9948694^2) + (0.5971291^2) + (0.4164494^2)4.00 I am wondering why are the values in 'sdev' not equal to the standard deviations of the principal components (as stored in 'scores'), and why is the total variance calculated by summing the squared 'sdev' values not equal to 4? Sorry if I have misunderstood something obvious. I would be grateful for help as I'm a bit confused.. Kind regards, Avril Avril Coghlan Cork, Ireland [[alternative HTML version deleted]]
Sarah Goslee
2011-Jun-30 20:10 UTC
[R] sdev value returned by princomp function (used for PCA)
Hi, What you want is this:> myPCA.cor <- princomp(scaledUSArrests, cor=TRUE) > myPCA.cor$sdevComp.1 Comp.2 Comp.3 Comp.4 1.5748783 0.9948694 0.5971291 0.4164494>or alternatively> myPCA2 <- prcomp(scaledUSArrests) > myPCA2$sdev[1] 1.5748783 0.9948694 0.5971291 0.4164494 See this message for more detail: https://stat.ethz.ch/pipermail/r-help/2002-October/025556.html Sarah On Thu, Jun 30, 2011 at 3:48 PM, Coghlan, Avril <A.Coghlan at ucc.ie> wrote:> > Dear all, > > I have a question about the 'sdev' value returned by the princomp function (which does principal components analysis). > > On the help page for princomp it says 'sdev' is 'the standard deviations of the principal components'. > > However, when I calculate the principal components for the USArrests data set, I don't find this to be the case: > > Here is how I calculated the principal components and got the 'sdev' values: >> scaledUSArrests <- scale(USArrests) # standardise the variables to have variance 1 and mean 0 >> myPCA <- princomp(scaledUSArrests) ?# do the PCA >> myPCA$sdev > ? Comp.1 ? ?Comp.2 ? ?Comp.3 ? ?Comp.4 > 1.5590500 0.9848705 0.5911277 0.4122639 > > As far as I understand, the principal components themselves are stored in the 'scores' value returned by princomp, so I calculated the standard deviations of those values to see if they agree with 'sdev': >> sd(myPCA$scores) > ? Comp.1 ? ?Comp.2 ? ?Comp.3 ? ?Comp.4 > 1.5748783 0.9948694 0.5971291 0.4164494 > > I get different values than I got using sd(myPCA$scores) and myPCA$sdev, why is this? > > As there are 4 standardised variables, and the variance of each standardised variable is 1, I would expect the variances of the principal components to add to 1. I find that this is true for the variances calculated using sd(myPCA$scores) but not myPCA$sdev: >> (1.5590500^2) + (0.9848705^2) + (0.5911277^2) + (0.4122639^2) > ?3.92 >> (1.5748783^2) + (0.9948694^2) + (0.5971291^2) + (0.4164494^2) > ?4.00 > > I am wondering why are the values in 'sdev' not equal to the standard deviations of the principal components (as stored in 'scores'), and why is the total variance calculated by summing the squared 'sdev' values not equal to 4? > > Sorry if I have misunderstood something obvious. I would be grateful for help as I'm a bit confused.. > > Kind regards, > Avril > > Avril Coghlan > Cork, Ireland >-- Sarah Goslee http://www.functionaldiversity.org
peter dalgaard
2011-Jun-30 21:02 UTC
[R] sdev value returned by princomp function (used for PCA)
On Jun 30, 2011, at 21:48 , Coghlan, Avril wrote:> > Dear all, > > I have a question about the 'sdev' value returned by the princomp function (which does principal components analysis). > > On the help page for princomp it says 'sdev' is 'the standard deviations of the principal components'. > > However, when I calculate the principal components for the USArrests data set, I don't find this to be the case: > > Here is how I calculated the principal components and got the 'sdev' values: >> scaledUSArrests <- scale(USArrests) # standardise the variables to have variance 1 and mean 0 >> myPCA <- princomp(scaledUSArrests) # do the PCA >> myPCA$sdev > Comp.1 Comp.2 Comp.3 Comp.4 > 1.5590500 0.9848705 0.5911277 0.4122639 > > As far as I understand, the principal components themselves are stored in the 'scores' value returned by princomp, so I calculated the standard deviations of those values to see if they agree with 'sdev': >> sd(myPCA$scores) > Comp.1 Comp.2 Comp.3 Comp.4 > 1.5748783 0.9948694 0.5971291 0.4164494 > > I get different values than I got using sd(myPCA$scores) and myPCA$sdev, why is this? > > As there are 4 standardised variables, and the variance of each standardised variable is 1, I would expect the variances of the principal components to add to 1. I find that this is true for the variances calculated using sd(myPCA$scores) but not myPCA$sdev: >> (1.5590500^2) + (0.9848705^2) + (0.5911277^2) + (0.4122639^2) > 3.92 >> (1.5748783^2) + (0.9948694^2) + (0.5971291^2) + (0.4164494^2) > 4.00 > > I am wondering why are the values in 'sdev' not equal to the standard deviations of the principal components (as stored in 'scores'), and why is the total variance calculated by summing the squared 'sdev' values not equal to 4? > > Sorry if I have misunderstood something obvious. I would be grateful for help as I'm a bit confused..Rather, you overlooked something not-quite obvious: "Note that the default calculation uses divisor N for the covariance matrix." You'll have to ask the authors for why this convention has been used, but it is of course easy to convert to N-1 divisor once you know what is happening (typically you don't really care about scaling anyway). When your data have been scaled with their sd, using N-1 for divisor, the variance using N as divisor will be (1 - 1/N) and true enough: 4 * (1 - 1/50) = 3.92> > Kind regards, > Avril > > Avril Coghlan > Cork, Ireland > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com