Marcelo Kittlein
2015-Sep-14 13:46 UTC
[R] Error in principal component loadings calculation
Hi all

I have been using "princomp" to obtain the principal components of some data and find that the loadings returned by the function appear to have some error.

In a simple example, if I calculate the PCs for a random matrix, I get that all loadings for the different components have the same proportion of variance:

data <- matrix(runif(100), 20, 5)
pc <- princomp(data, cor=TRUE)
loadings(pc)

Loadings:
     Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
[1,] -0.280  0.510  0.674 -0.217 -0.400
[2,]  0.529 -0.353        -0.694 -0.330
[3,] -0.111  0.563 -0.713 -0.336 -0.222
[4,] -0.530 -0.502 -0.178  0.140 -0.645
[5,] -0.590 -0.215        -0.582  0.516

               Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
SS loadings       1.0    1.0    1.0    1.0    1.0
Proportion Var    0.2    0.2    0.2    0.2    0.2
Cumulative Var    0.2    0.4    0.6    0.8    1.0

This keeps returning the same proportion of variance for each component regardless of the data used.

My R version is

> R.Version()
$platform
[1] "x86_64-unknown-linux-gnu"

$arch
[1] "x86_64"

$os
[1] "linux-gnu"

$system
[1] "x86_64, linux-gnu"

$status
[1] ""

$major
[1] "3"

$minor
[1] "2.1"

$year
[1] "2015"

$month
[1] "06"

$day
[1] "18"

$`svn rev`
[1] "68531"

$language
[1] "R"

$version.string
[1] "R version 3.2.1 (2015-06-18)"

$nickname
[1] "World-Famous Astronaut"

Some hint would be much appreciated.

Best regards

Marcelo Kittlein
David L Carlson
2015-Sep-14 21:07 UTC
[R] Error in principal component loadings calculation
The sum of the squared loadings will always be 1 for each component because princomp() returns the eigenvectors, which are standardized to unit length. The terminology for principal components is not as consistent as we could hope. What princomp() calls loadings are the coefficients for computing the principal component scores from the standardized variables. What is often called the loadings elsewhere is the structure matrix (the correlation between each variable and each component). You are probably looking for the latter, which is easy to obtain by multiplying each column of loadings by the standard deviation of its component:

> set.seed(42)
> data <- matrix(runif(100), 20, 5)
> pc <- princomp(data, cor=TRUE)
> loadings(pc)

Loadings:
     Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
[1,]  0.638  0.249        -0.260 -0.679
[2,]        -0.714  0.449  0.298 -0.444
[3,]  0.585 -0.152  0.522 -0.231  0.555
[4,]        -0.617 -0.543 -0.564
[5,] -0.496  0.154  0.479 -0.687 -0.172

               Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
SS loadings       1.0    1.0    1.0    1.0    1.0
Proportion Var    0.2    0.2    0.2    0.2    0.2
Cumulative Var    0.2    0.4    0.6    0.8    1.0
> rowSums(pc$loadings^2)
[1] 1 1 1 1 1
> # Notice that the row sums of the squared loadings all equal 1
> # (as do the column sums, reported above as "SS loadings")
> # Now multiply each loading by its standard deviation
> sweep(pc$loadings, 2, pc$sdev, "*")

Loadings:
     Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
[1,]  0.765  0.275        -0.237 -0.531
[2,]        -0.787  0.427  0.271 -0.347
[3,]  0.701 -0.167  0.497 -0.211  0.434
[4,]        -0.680 -0.518 -0.515
[5,] -0.594  0.169  0.456 -0.627 -0.134

               Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
SS loadings     1.436  1.215  0.907  0.832  0.611
Proportion Var  0.287  0.243  0.181  0.166  0.122
Cumulative Var  0.287  0.530  0.712  0.878  1.000
> pc$sdev^2
   Comp.1    Comp.2    Comp.3    Comp.4    Comp.5
1.4362072 1.2145055 0.9068555 0.8315685 0.6108632
> # Now the sum of the squared loadings equals the
> # squared standard deviation (aka the eigenvalues)

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
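The relationships discussed in this reply can be checked numerically. The following is only a sketch under the same seed as the example above: it compares the princomp() loadings with the eigen-decomposition of the correlation matrix and compares the rescaled loadings with the correlations between the variables and the component scores. The names L, e, ss_raw, scaled and ss_scaled are illustrative and do not come from the thread.

```r
# Sketch: relate princomp() loadings to eigenvalues and to correlations.
set.seed(42)
data <- matrix(runif(100), 20, 5)
pc <- princomp(data, cor = TRUE)

L <- unclass(loadings(pc))   # plain numeric matrix of loadings
e <- eigen(cor(data))        # eigen-decomposition of the correlation matrix

# Each column has unit length, so "SS loadings" is 1 for every component.
ss_raw <- colSums(L^2)

# The squared standard deviations are the eigenvalues.
eig_match <- all.equal(unname(pc$sdev^2), e$values)

# Multiplying each column by its standard deviation gives the correlations
# between the original variables and the component scores.
scaled <- sweep(L, 2, pc$sdev, "*")
cor_match <- all.equal(unname(scaled), unname(cor(data, pc$scores)))

# After rescaling, the column sums of squares equal the eigenvalues.
ss_scaled <- colSums(scaled^2)

print(ss_raw)
print(eig_match)
print(cor_match)
print(rbind(ss_scaled, eigenvalue = pc$sdev^2))
```

Both all.equal() checks should come back TRUE, and the two rows of the final table should agree, which is why dividing them by their total gives the proportions of variance that the unscaled loadings cannot show.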
David L Carlson
2015-Sep-14 22:36 UTC
[R] Error in principal component loadings calculation
The quickest way to get that is

> summary(pc)
Importance of components:
                          Comp.1    Comp.2    Comp.3    Comp.4    Comp.5
Standard deviation     1.1984186 1.1020461 0.9522896 0.9119038 0.7815774
Proportion of Variance 0.2872414 0.2429011 0.1813711 0.1663137 0.1221726
Cumulative Proportion  0.2872414 0.5301425 0.7115137 0.8778274 1.0000000

David

-----Original Message-----
From: Marcelo Kittlein [mailto:kittlein at mdp.edu.ar]
Sent: Monday, September 14, 2015 1:28 PM
To: David L Carlson <dcarlson at tamu.edu>
Subject: Re: [R] Error in principal component loadings calculation

Thanks David

I thought that "Proportion var" was the proportion of the variance of successive component scores. The one you get with "summary" of the princomp object.

Proportion Var    0.2    0.2    0.2    0.2    0.2
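The proportions that summary(pc) reports can also be computed directly from the component standard deviations. A minimal sketch, assuming the same seed as the example earlier in the thread; the names eigenvalues, prop_var and cum_var are illustrative only:

```r
# Sketch: proportion of variance per component from pc$sdev.
set.seed(42)
data <- matrix(runif(100), 20, 5)
pc <- princomp(data, cor = TRUE)

eigenvalues <- pc$sdev^2                     # variance of each component
prop_var <- eigenvalues / sum(eigenvalues)   # proportion of variance
cum_var <- cumsum(prop_var)                  # cumulative proportion

print(round(rbind(prop_var, cum_var), 3))
```

With cor=TRUE the eigenvalues sum to the number of variables (5 here), so each proportion is simply the eigenvalue divided by 5. The printed rows should correspond to the "Proportion of Variance" and "Cumulative Proportion" rows of summary(pc).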