Hi all.
I'm a newbie in PCA and I don't understand a behaviour of R.
I have this data matrix:

> mx_fus
    height diam hole weight
1      2.3  3.5  1.1     18
2      2.0  3.5  0.9     17
3      3.8  4.3  0.7     34
4      2.1  3.4  0.9     15
5      2.3  3.8  1.0     19
6      2.2  3.8  1.0     19
7      3.2  4.4  0.9     34
8      3.0  4.3  1.0     30
9      2.8  3.9  0.9     21
10     3.3  4.2  1.1     33
11     2.3  3.9  0.9     25
12     2.3  3.3  0.5     17
13     0.9  2.4  0.4     10
14     1.4  2.4  0.5     10
15     2.2  3.6  0.7     22
16     2.9  3.8  0.8     30
17     2.9  3.5  0.6     27
18     2.3  3.5  0.5     24
19     1.8  2.3  0.5     29
20     1.4  2.5  0.6     34
21     0.8  2.3  0.6     21
22     1.8  2.4  0.6     23
23     1.5  2.2  0.6      7
24     0.9  1.7  0.4     14
25     2.1  2.2  0.5     25
26     1.3  2.4  0.6     33
27     1.3  2.7  0.4     39
28     0.5  2.2  0.5     13
29     1.4  4.2  0.8     23
30     1.6  2.0  0.4     30
31     1.4  2.2  0.6     25
32     1.8  2.5  0.6     28
33     1.4  2.6  0.6     41
34     1.6  2.3  0.3     32
35     1.6  2.5  0.5     41
36     2.8  2.9  0.8     47
37     0.6  2.5  0.8     21
38     1.6  2.8  0.7     13
39     1.7  3.3  0.8     17
40     1.6  3.9  1.9     20
41     1.4  4.7  0.9     26
42     1.2  4.2  0.7     21
43     3.5  4.2  0.9     47
44     2.3  3.6  0.7     24
45     2.3  3.4  0.4     21
46     1.9  2.6  0.7     14
47     1.9  3.0  0.7     15
48     2.7  3.7  0.9     26
49     3.0  3.8  0.7     35
50     1.2  2.0  0.7      5
51     1.6  2.5  0.5     15
52     1.3  2.6  0.5     16
53     2.5  3.9  0.9     32
54     0.9  3.3  0.6      9
55     1.8  2.4  0.5     17
56     2.4  3.7  1.1     30
57     2.1  3.5  1.1     22
58     2.6  3.9  1.0     38
59     2.6  3.6  1.0     27
60     2.6  4.1  1.0     34
61     2.9  3.6  0.8     32
62     2.6  3.3  0.7     22
63     1.8  2.5  0.7     26
64     3.0  2.8  1.3      2
65     0.5  2.2  0.4      3
66     1.9  3.4  0.7     14
67     1.4  3.8  0.9     18
68     2.0  4.0  1.0     30
69     3.1  4.0  1.3     21
70     2.5  4.0  0.8     19
71     2.5  4.5  1.0     20
72     1.8  3.5  1.4     18
73     2.1  3.5  1.4     25
74     1.5  2.6  0.5      9
75     2.8  3.2  1.2     16
76     1.0  5.0  0.3     32
77     0.3  5.8  0.5     56
78     0.5  1.5  0.2      1
79     0.7  1.4  0.2      1
80     0.5  1.3  0.2      1
81     0.7  3.3  0.4      7
82     1.9  4.7  1.0     24
83     3.1  4.2  0.9     49
84     2.8  3.6  0.7     28
85     2.7  3.2  0.7     29
86     3.0  4.0  0.9     36
87     1.7  2.7  0.7     14
88     1.5  2.9  0.7     18
89     2.9  3.5  0.7     30
90     3.0  3.4  0.8     30
91     2.0  2.8  0.5     14
92     2.4  3.5  0.7     24
93     0.8  4.1  0.6     12
94     1.7  2.5  0.5     23
95     1.4  2.4  0.8     31
96     1.5  2.7  0.4     20
97     2.6  3.7  0.6     31
98     2.6  3.0  0.6     18
99     2.5  5.0  0.7     40
100    2.5  3.7  0.5     30
101    2.4  2.9  0.7     17
102    2.3  3.0  0.5     15
103    2.2  3.3  0.6     19
104    1.5  2.1  0.5      5
105    2.0  2.2  0.5     10
106    2.6  3.5  0.6     26
107    2.3  3.0  0.6     15
108    2.5  4.5  0.7     40
109    2.1  3.1  0.5     15
110    1.3  2.1  0.8     14
111    0.8  2.5  0.2      5
112    0.6  3.1  0.7      8

I perform a PCA in R:

> pca <- prcomp(mx_fus, scale. = TRUE)
> biplot(pca, choices = c(1, 2), cex = 0.7)

The biplot puts the arrows of diam and height very close together, near the first component axis.
So I understand that these 2 variables are well represented in PC1 and that they are correlated with each other.
But if I test the correlation, the value of the correlation coefficient is low:

> cor(mx_fus[,1], mx_fus[,2])
[1] 0.4828185

Why does the plot say one thing and the correlation function say the opposite?
Do two nearby arrows not represent a strong correlation between the 2 variables (as I read in some manuals), but only a correlation with the component axis?

Thanks in advance
This is more a question about principal components analysis than about R. You have 4 variables and they are moderately correlated with one another (weight and hole are only .2). When the data consist of measurements, this usually suggests that the overall size of the object is being partly measured by each variable. In your case object size is measured by the first principal component (PC1), with larger objects having more negative scores, so larger objects are on the left and smaller ones are on the right of the biplot.

The biplot can only display 2 of the 4 dimensions of your data at one time. In the first 2 dimensions, diam and height are close together, but in the 3rd dimension (PC3), they are on opposite sides of the component. If you plot different pairs of dimensions (e.g. 1 with 3 or 2 with 3, see below), the arrows will look different because you are looking from different directions.

> pca
Standard deviations:
[1] 1.5264292 0.8950379 0.7233671 0.5879295

Rotation:
              PC1         PC2         PC3        PC4
height -0.5210224 -0.06545193  0.80018012 -0.2897646
diam   -0.5473677  0.06309163 -0.57146893 -0.6081376
hole   -0.4598646 -0.70952862 -0.17476677  0.5045297
weight -0.4663141  0.69878797 -0.05090785  0.5400508

> biplot(pca, choices=c(1, 3))
> biplot(pca, choices=c(2, 3))

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Denis Francisci
Sent: Friday, March 10, 2017 4:45 AM
To: R-help Mailing List <r-help at r-project.org>
Subject: [R] problem with PCA

Hi all.
I'm a newbie in PCA and I don't understand a behaviour of R.
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
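The point about the hidden third dimension can be checked numerically from the standard deviations and rotation matrix printed in David's reply. The following is only a sketch under one assumption that is worth verifying: that the default biplot() draws each arrow proportional to rotation times sdev (my reading of biplot.prcomp with scale = 1). With scaled variables, the cosine of the angle between two such arrows in the full 4-dimensional space equals their correlation, while the cosine seen in the PC1-PC2 plane can be much larger. The names loadings, cosine, cos_2d and cos_full are made up for this illustration.

```r
# Values copied from the printed prcomp() output in the reply
sdev <- c(1.5264292, 0.8950379, 0.7233671, 0.5879295)
rotation <- matrix(
  c(-0.5210224, -0.06545193,  0.80018012, -0.2897646,
    -0.5473677,  0.06309163, -0.57146893, -0.6081376,
    -0.4598646, -0.70952862, -0.17476677,  0.5045297,
    -0.4663141,  0.69878797, -0.05090785,  0.5400508),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("height", "diam", "hole", "weight"),
                  paste0("PC", 1:4))
)

# Scale each column of the rotation by the matching standard deviation
# (assumed to be what the default biplot arrows are proportional to)
loadings <- sweep(rotation, 2, sdev, `*`)

# Cosine of the angle between two vectors
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Angle as seen in the PC1-PC2 plane versus in all four dimensions
cos_2d   <- cosine(loadings["height", 1:2], loadings["diam", 1:2])
cos_full <- cosine(loadings["height", ],    loadings["diam", ])

print(c(cos_2d = cos_2d, cos_full = cos_full))
```

If the assumption holds, cos_2d comes out close to 1 (nearly parallel arrows in the plot) while cos_full comes out close to the 0.4828185 reported by cor(), so the two results do not contradict each other: the plot simply hides the PC3 and PC4 parts of the arrows.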
Thank you David for your answer.

If I understood correctly, the relative positions of the variable arrows don't reflect the correlation coefficients of the original variables. In fact these positions change if I use different PC axes.
But in a manual about PCA in R I read:

"Pairs of variables that form acute angles at the origin, close to 0°, should be highly and positively correlated; variables close to right angles tend to have low correlation; variables at obtuse angles, close to 180°, tend to have high negative correlation".

And if I do a fictional test, it seems true:

tb <- data.frame(
  c(1,2,3,4,5,6,7,8,9),                  # orig data
  c(2,4,5,8,10,12,14,16,18),             # strong positive correlation
  c(25,29,52,63,110,111,148,161,300),    # weaker correlation
  c(-1,-2,-3,-4,-5,-6,-7,-8,-9),         # strong negative correlation
  c(3,8,4,6,1,3,2,5,7)                   # no correlation
)
names(tb) <- c("orig","corr+","corr+2","corr-","random")
pca <- prcomp(as.matrix(tb), scale. = TRUE)
biplot(pca, choices = c(1,2))

On the first 2 PCs the positions of the arrows reflect the original correlations perfectly.
My data behave differently, maybe because my original variables are not strongly correlated?

2017-03-10 15:49 GMT+01:00 David L Carlson <dcarlson at tamu.edu>:

> This is more a question about principal components analysis than about R.
> You have 4 variables and they are moderately correlated with one another
> (weight and hole are only .2). When the data consist of measurements, this
> usually suggests that the overall size of the object is being partly
> measured by each variable. In your case object size is measured by the
> first principal component (PC1) with larger objects having more negative
> scores so larger objects are on the left and smaller ones are on the right
> of the biplot.
>
> The biplot can only display 2 of the 4 dimensions of your data at one
> time. In the first 2 dimensions, diam and height are close together, but in
> the 3rd dimension (PC3), they are on opposite sides of the component.
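One way to look at why the toy data in the follow-up behave so well is to check how much of the total variance the first two components capture, and how closely the PC1-PC2 plane alone reproduces the correlation matrix. The sketch below reuses the tb data from the message; the names pca_tb, var_share, cum2, loadings, approx_cor and max_err are made up for illustration, and the comparison relies on the variables being scaled.

```r
# Toy data from the follow-up message
tb <- data.frame(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  c(2, 4, 5, 8, 10, 12, 14, 16, 18),
  c(25, 29, 52, 63, 110, 111, 148, 161, 300),
  c(-1, -2, -3, -4, -5, -6, -7, -8, -9),
  c(3, 8, 4, 6, 1, 3, 2, 5, 7)
)
names(tb) <- c("orig", "corr+", "corr+2", "corr-", "random")

pca_tb <- prcomp(tb, scale. = TRUE)

# Share of total variance per component, and the share held by PC1 + PC2
var_share <- pca_tb$sdev^2 / sum(pca_tb$sdev^2)
cum2 <- sum(var_share[1:2])

# Rotation columns scaled by sdev; using only the first two columns gives
# the correlation matrix as it is "seen" in the PC1-PC2 plane
loadings   <- sweep(pca_tb$rotation, 2, pca_tb$sdev, `*`)
approx_cor <- tcrossprod(loadings[, 1:2])

# Largest disagreement between the 2-D picture and the true correlations
max_err <- max(abs(approx_cor - cor(tb)))

print(round(var_share, 3))
print(c(cum2 = cum2, max_err = max_err))
```

When cum2 is close to 1 and max_err is small, the angles in the PC1-PC2 biplot are a good guide to the correlations. For mx_fus, the standard deviations posted earlier in the thread give (1.5264292^2 + 0.8950379^2) / 4, roughly 0.78, so a noticeable part of the structure sits in PC3 and PC4 and the angles in the first plane are less trustworthy.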