Ebert,Timothy Aaron
2022-Oct-12 11:53 UTC
[R] prcomp - arbitrary direction of the returned principal components
Use absolute value

Tim

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Ashim Kapoor
Sent: Wednesday, October 12, 2022 7:48 AM
To: R Help <r-help at r-project.org>
Subject: [R] prcomp - arbitrary direction of the returned principal components

Dear R experts,

From ?prcomp,

---- snip -----
Note:

The signs of the columns of the rotation matrix are arbitrary, and
so may differ between different programs for PCA, and even between
different builds of R.
---- snip ------

My problem is that I am building an index based on Principal Components Analysis. When the index is high it should indicate stress in the market. Due to the arbitrary sign, sometimes I get an index which is HIGH when there is stress and sometimes I get the OPPOSITE: an index which is LOW when there is stress. This program is shared with other people who may have a different build of R.

I can forcefully use a NEGATIVE sign to FLIP the index when it is LOW. That works.

Now my query is: just as we call set.seed(1234) to fix the sequence of generated random numbers and make it REPRODUCIBLE, can I do something like:

set.direction.for.vector.in.pca(1234)

so that each time I run prcomp it chooses the SAME (high or low) direction of the principal component on ANY computer having ANY version of R installed?

That's what I want. I don't want the returned principal component to be HIGH (LOW) on my computer and LOW (HIGH) on someone else's computer. That would confuse the people the code is shared with.

Is this possible? How do people deal with this?
Many thanks,
Ashim

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.r-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
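As far as the quoted ?prcomp note goes, there is no built-in switch like the hypothetical set.direction.for.vector.in.pca() asked about above. One common workaround is to apply a deterministic sign convention after the fit. The sketch below is only an illustration under that assumption: orient_pca() is a made-up helper name, not a base R function, and the rule it uses (make the largest-magnitude loading of each component positive) is one of several possible conventions. USArrests is used purely as stand-in data.

```r
# Hypothetical helper (not part of base R): orient each principal component
# so that its largest-magnitude loading is positive.
orient_pca <- function(pca) {
  # Sign of the largest-magnitude loading in each column of the rotation matrix
  signs <- apply(pca$rotation, 2, function(v) sign(v[which.max(abs(v))]))
  signs[signs == 0] <- 1  # defensive; a unit-length column has a nonzero maximum

  # Multiply each column of the loadings (and of the scores, if kept) by its sign
  pca$rotation <- sweep(pca$rotation, 2, signs, `*`)
  if (!is.null(pca$x)) {
    pca$x <- sweep(pca$x, 2, signs, `*`)
  }
  pca
}

# Stand-in data; replace with the market data used for the index
pca   <- orient_pca(prcomp(USArrests, scale. = TRUE))
index <- pca$x[, 1]
```

Flipping a column of the rotation matrix together with the matching column of x leaves the fit unchanged, so this only removes the ambiguity. The rule can still be unstable if two loadings are nearly equal in magnitude, and it does not by itself guarantee that HIGH means stress; that needs a choice based on the data's meaning.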
Ashim Kapoor
2022-Oct-13 04:28 UTC
[R] prcomp - arbitrary direction of the returned principal components
Dear Aaron,

Many thanks for your reply. Please allow me to illustrate my query a bit.

I take some data, pass it to prcomp and extract the x matrix from the result.

From ?prcomp:

x: if 'retx' is true the value of the rotated data (the centred (and scaled if requested) data multiplied by the 'rotation' matrix) is returned. Hence, 'cov(x)' is the diagonal matrix 'diag(sdev^2)'. For the formula method, 'napredict()' is applied to handle the treatment of values omitted by the 'na.action'.

I consider x[,1] as my index. This makes sense as x[,1] is the projection of the data on the FIRST principal component.

Now this x[,1] can be a high +ve number or a low -ve number. I can't ignore the sign.

If I ignore the sign by taking the absolute value, the HIGH / LOW stress values will be indistinguishable. Hence I do not think using absolute values of x[,1] is the solution. Yes, it will make the results REPRODUCIBLE, but at the cost of losing information.

Any other idea?

Many thanks,
Ashim
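The point about abs(x[,1]) losing information can be checked on simulated data, and the same sketch shows one alternative: orient the index against a reference variable that is known to rise with stress. Everything here is an assumption for illustration only: the simulated latent factor, the column names (spread, volatility, illiquid), and the choice of volatility as the reference are all hypothetical and not taken from the thread.

```r
set.seed(1)  # simulated data only, so the illustration is repeatable

n      <- 200
stress <- rnorm(n)  # hypothetical latent stress factor
dat <- data.frame(
  spread     = stress + rnorm(n, sd = 0.3),
  volatility = stress + rnorm(n, sd = 0.3),
  illiquid   = stress + rnorm(n, sd = 0.3)
)

pca   <- prcomp(dat, scale. = TRUE)
index <- pca$x[, 1]  # sign is arbitrary at this point

# Orient the index: flip it if it moves against the reference variable
if (cor(index, dat$volatility) < 0) {
  index <- -index
}

# The oriented index tracks stress; the absolute value folds calm and
# stressed periods onto the same positive values
cor_signed <- cor(index, stress)
cor_abs    <- cor(abs(index), stress)
```

Because the flip depends only on the data and the chosen reference, it should give the same orientation whichever sign a particular build of R returns.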