Ashim Kapoor
2022-Oct-12 11:48 UTC
[R] prcomp - arbitrary direction of the returned principal components
Dear R experts,

From ?prcomp:

---- snip -----
Note: The signs of the columns of the rotation matrix are arbitrary, and so may differ between different programs for PCA, and even between different builds of R.
---- snip ------

My problem is that I am building an index based on Principal Components Analysis. When the index is high it should indicate stress in the market. Due to the arbitrary sign, sometimes I get an index which is HIGH when there is stress and sometimes I get the OPPOSITE: an index which is LOW when there is stress. This program is shared with other people who may have a different build of R.

I can forcefully use a NEGATIVE sign to FLIP the index when it is LOW. That works.

Now my query is: just as we call set.seed(1234) to force the pattern of random number generation and make it REPRODUCIBLE, can I do something like:

set.direction.for.vector.in.pca(1234)

so that each time I run prcomp it chooses the SAME (high or low) direction of the principal component on ANY computer having ANY version of R installed? That's what I want. I don't want the returned principal component to be HIGH (LOW) on my computer and LOW (HIGH) on someone else's computer. That would confuse the people the code is shared with.

Is this possible? How do people deal with this?

Many thanks,
Ashim
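The manual flip described in the question can be turned into a fixed rule instead of being applied by eye. The following is only a minimal sketch on simulated data: fix_pc_signs() is a hypothetical helper (it is not part of base R or of prcomp), and the variable names spread, vol and illiq are invented for illustration. The rule used here is to flip each component so that its loading on a chosen reference variable is non-negative.

```r
# Hypothetical helper (not part of base R): flip each principal component so
# that its loading on a chosen reference variable is non-negative.
fix_pc_signs <- function(pca, ref = 1) {
  # Sign of each component's loading on the reference variable
  s <- sign(pca$rotation[ref, ])
  s[s == 0] <- 1  # leave a component unchanged if that loading is exactly zero
  # Flip loadings and scores together so they stay consistent
  pca$rotation <- sweep(pca$rotation, 2, s, `*`)
  if (!is.null(pca$x)) pca$x <- sweep(pca$x, 2, s, `*`)
  pca
}

# Simulated data: three indicators that all rise with a latent stress factor
set.seed(1234)
stress <- rnorm(200)
X <- cbind(spread = stress + rnorm(200, sd = 0.3),
           vol    = stress + rnorm(200, sd = 0.3),
           illiq  = stress + rnorm(200, sd = 0.3))

pca   <- fix_pc_signs(prcomp(X, scale. = TRUE), ref = "spread")
index <- pca$x[, 1]   # first component, oriented so that "spread" loads positively
cor(index, stress)    # orientation check against the simulated stress factor
```

Such a rule is only as stable as the reference loading: if the chosen variable loads close to zero on a component, numerical noise can still change the sign, so the reference should be a variable that is known to move strongly with the quantity the index is meant to track.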
Ebert,Timothy Aaron
2022-Oct-12 11:53 UTC
[R] prcomp - arbitrary direction of the returned principal components
Use absolute value.

Tim
Ivan Krylov
2022-Oct-13 10:04 UTC
[R] prcomp - arbitrary direction of the returned principal components
On Wed, 12 Oct 2022 17:18:26 +0530, Ashim Kapoor <ashimkapoor at gmail.com> wrote:

> My problem is that I am building an index based on Principal
> Components Analysis.
> When the index is high it should indicate stress in the market.

Have you considered using supervised methods, like PLS, to predict stress in the market?

Imagine what happens when you take the points where there's stress in the market and feed only those to PCA. The first principal direction is now gone (if there is variation along this axis, it's much smaller than it was), so now some other direction occupies its place. Even if the first direction is preserved, after centering there are now points with low values of PC1, even though all points should correspond to stress in the market.

Apologies if the paragraph above is complete nonsense: a reasonable researcher would always conduct the analysis on a representative sample of the points, and the whole point of the proposed index is that high stress is indicated by points on the positive end of the multivariate sausage that PCA considers the data to be. If that's the case, post-processing the signs as described by Chris Evans could be the right solution.

-- 
Best regards,
Ivan
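The centering argument above can be checked on simulated data. This is a rough sketch under invented assumptions: the latent stress factor, the three indicators a, b and c, and the cutoff stress > 1 are all made up for illustration. It compares a PCA fitted on the stressed points only with a PCA fitted on the full sample and then applied to the stressed points via predict().

```r
set.seed(1234)
stress <- rnorm(500)                       # simulated latent stress factor
X <- cbind(a = stress + rnorm(500, sd = 0.3),
           b = stress + rnorm(500, sd = 0.3),
           c = stress + rnorm(500, sd = 0.3))
stressed <- stress > 1                     # hypothetical definition of "stress"

# PCA fitted on the stressed points only
pca_sub <- prcomp(X[stressed, ])

# prcomp centers the data by default, so the PC1 scores average to zero
# and take both signs, even though every row here is a stressed point.
mean(pca_sub$x[, 1])
range(pca_sub$x[, 1])

# PCA fitted on the full (representative) sample
pca_all <- prcomp(X)

# The spread along PC1 is much smaller within the stressed subset
c(full = pca_all$sdev[1], subset = pca_sub$sdev[1])

# Scoring the stressed rows with the full-sample fit keeps the original
# centre, so these rows sit towards one end of PC1 (which end depends on
# the arbitrary sign).
scores <- predict(pca_all, newdata = X[stressed, ])[, 1]
table(sign(scores))
```

In this setup the sign ambiguity matters only for the full-sample fit, which is where a post-processing rule for the signs would be applied.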