Ashim Kapoor
2022-Oct-13 04:28 UTC
[R] prcomp - arbitrary direction of the returned principal components
Dear Aaron,

Many thanks for your reply. Please allow me to illustrate my query a bit.

I take some data, pass it to prcomp and extract the x matrix from the result.

From ?prcomp:

       x: if 'retx' is true the value of the rotated data (the centred
          (and scaled if requested) data multiplied by the 'rotation'
          matrix) is returned.  Hence, 'cov(x)' is the diagonal matrix
          'diag(sdev^2)'.  For the formula method, 'napredict()' is
          applied to handle the treatment of values omitted by the
          'na.action'.

I consider x[,1] as my index. This makes sense, as x[,1] is the projection of the data on the FIRST principal component.

Now this x[,1] can be a high positive number or a low negative number. I can't ignore the sign. If I ignore the sign by taking the absolute value, the HIGH and LOW stress values will be indistinguishable. Hence I do not think using absolute values of x[,1] is the solution. Yes, it will make the results REPRODUCIBLE, but at the cost of losing information.

Any other idea?

Many thanks,
Ashim

On Wed, Oct 12, 2022 at 5:23 PM Ebert,Timothy Aaron <tebert at ufl.edu> wrote:
>
> Use absolute value
>
> Tim
>
> -----Original Message-----
> From: R-help <r-help-bounces at r-project.org> On Behalf Of Ashim Kapoor
> Sent: Wednesday, October 12, 2022 7:48 AM
> To: R Help <r-help at r-project.org>
> Subject: [R] prcomp - arbitrary direction of the returned principal components
>
> Dear R experts,
>
> From ?prcomp,
>
> ---- snip -----
> Note:
>
>      The signs of the columns of the rotation matrix are arbitrary, and
>      so may differ between different programs for PCA, and even between
>      different builds of R.
> ---- snip ------
>
> My problem is that I am building an index based on Principal Components Analysis.
> When the index is high it should indicate stress in the market. Due to the
> arbitrary sign, sometimes I get an index which is HIGH when there is stress and
> sometimes I get the OPPOSITE - an index which is LOW when there is stress.
> This program is shared with other people who may have a different build of R.
>
> I can forcefully use a NEGATIVE sign to FLIP the index when it is LOW.
> That works.
>
> Now my query is: just as we call set.seed(1234) to force the pattern of
> random number generation and make it REPRODUCIBLE, can I do something like:
>
> set.direction.for.vector.in.pca(1234)
>
> Now each time I do prcomp it should choose the SAME (high or low) direction
> of the principal component on ANY computer having ANY version of R installed.
>
> That's what I want. I don't want the returned principal component to be
> HIGH(LOW) on my computer and LOW(HIGH) on someone else's computer.
> That would confuse the people the code is shared with.
>
> Is this possible? How do people deal with this?
>
> Many thanks,
> Ashim
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.r-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
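As far as I know, R has no built-in equivalent of the wished-for set.direction.for.vector.in.pca(). One way to get the same effect is to tie the sign to the data rather than to the build of R: pick a variable that is known to rise under stress and flip PC1 whenever its scores correlate negatively with that variable. The following is only a minimal sketch. The helper name orient_pc1, the simulated data and the choice of column "a" as the reference are all assumptions made for illustration, not anything from the original post.

```r
# Hypothetical helper (not part of base R or stats): orient PC1 so that its
# scores correlate positively with a reference series chosen by the analyst.
orient_pc1 <- function(pca, reference) {
  if (cor(pca$x[, 1], reference) < 0) {
    # Flip loadings and scores together so they stay consistent
    pca$rotation[, 1] <- -pca$rotation[, 1]
    pca$x[, 1] <- -pca$x[, 1]
  }
  pca
}

# Simulated example: 'stress' is a made-up latent series
set.seed(1)
stress <- rnorm(200)
dat <- cbind(a = stress + rnorm(200, sd = 0.3),
             b = stress + rnorm(200, sd = 0.3),
             c = -stress + rnorm(200, sd = 0.3))

pca <- orient_pc1(prcomp(dat, scale. = TRUE), reference = dat[, "a"])
index <- pca$x[, 1]   # high values now go with high values of column "a"
```

Because the rule depends only on the data and the chosen reference, the index should point the same way whichever sign prcomp happens to return on a given machine.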
Chris Evans
2022-Oct-13 06:19 UTC
[R] prcomp - arbitrary direction of the returned principal components
I agree with this and I'm not very sure why you feel you need the signs fixed one way: they are arbitrary, and how they come out is generally a function of how rounding is handled as values hit the limit of the finite arithmetic in the particular program, OS and hardware on which it's running. Can't you just say that?!

If you must have it aligned one way then I think the only thing you can do is to select the row/item/variable with the highest absolute loading on the component and set that to be positive (say; whether you choose positive or negative is up to you). I think this does it:

### create some data to analyse
set.seed(12345)
n <- 500     # number of observations
k <- 8       # number of variables
fuzz <- .1   # used to add noise
varLatent <- rnorm(n)                        # create the values of the dominant PC
vecLoadings <- c(rep(1, k/2), rep(-1, k/2))  # binary loadings

### make the raw data
matData <- matrix(rep(NA, n * k), ncol = k)
for (i in 1:k) {
  # note: rnorm(n, fuzz) draws noise with mean fuzz and sd 1
  matData[, i] <- vecLoadings[i] * varLatent + rnorm(n, fuzz)
}
head(matData)

### get the PCA
matPrcomp <- prcomp(matData)$rotation
round(matPrcomp, 2)
#        PC1   PC2   PC3   PC4   PC5   PC6   PC7   PC8
# [1,]  0.34  0.00 -0.74  0.32 -0.38 -0.25  0.15  0.00
# [2,]  0.37 -0.19  0.49  0.33 -0.57  0.38  0.07 -0.01
# [3,]  0.34  0.80  0.14  0.11  0.10 -0.05 -0.16 -0.43
# [4,]  0.33 -0.47 -0.16  0.10  0.34  0.17 -0.61 -0.35
# [5,] -0.35  0.15 -0.15 -0.34 -0.61  0.10 -0.57 -0.08
# [6,] -0.37 -0.25  0.11  0.13 -0.14 -0.30  0.27 -0.77
# [7,] -0.35  0.07  0.17  0.72  0.04 -0.32 -0.38  0.28
# [8,] -0.38  0.17 -0.32  0.33  0.14  0.75  0.16 -0.12

### find the sign of the maximum absolute loading for each component
vecMaxItemSigns <- apply(matPrcomp, 2, function(x){sign(x[(which.max(abs(x)))])})
vecMaxItemSigns
# PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
#  -1   1  -1   1  -1   1  -1  -1

### now use that to create a new PCA where the strongest loadings are always positive
newMatPrcomp <- matrix(rep(NA, k * k), ncol = k)
colnames(newMatPrcomp) <- colnames(matPrcomp)
rownames(newMatPrcomp) <- rownames(matPrcomp)
for (i in 1:k) {
  # multiply each column by the sign of its strongest loading
  newMatPrcomp[, i] <- matPrcomp[, i] * vecMaxItemSigns[i]
}
round(newMatPrcomp, 2)
#        PC1   PC2   PC3   PC4   PC5   PC6   PC7   PC8
# [1,] -0.34  0.00  0.74  0.32  0.38 -0.25 -0.15  0.00
# [2,] -0.37 -0.19 -0.49  0.33  0.57  0.38 -0.07  0.01
# [3,] -0.34  0.80 -0.14  0.11 -0.10 -0.05  0.16  0.43
# [4,] -0.33 -0.47  0.16  0.10 -0.34  0.17  0.61  0.35
# [5,]  0.35  0.15  0.15 -0.34  0.61  0.10  0.57  0.08
# [6,]  0.37 -0.25 -0.11  0.13  0.14 -0.30 -0.27  0.77
# [7,]  0.35  0.07 -0.17  0.72 -0.04 -0.32  0.38 -0.28
# [8,]  0.38  0.17  0.32  0.33 -0.14  0.75 -0.16  0.12

Apologies for the coding: I'm a better therapist than coder and it's a while since I've done much in base R like this. Quite fun to get back to it! R artistes can probably do that in four lines!

But I'm not convinced doing this to "fix" the signs is really worth it, however many lines one uses to code it!

Very best all,

Chris

--
Chris Evans (he/him) <chris at psyctc.org>
Visiting Professor, UDLA, Quito, Ecuador & Honorary Professor, University of Roehampton, London, UK.
Work web site: https://www.psyctc.org/psyctc/
CORE site: http://www.coresystemtrust.org.uk/
Personal site: https://www.psyctc.org/pelerinage2016/
Emeetings (Thursdays): https://link.psyctc.org/booking (Beware: French time, generally an hour ahead of UK)
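The "largest absolute loading is positive" rule above can be written more compactly, and it may be worth flipping the scores in x along with the loadings so the two stay consistent. This is a sketch only: fix_pca_signs is a hypothetical helper name, and the built-in USArrests data set is used purely as a stand-in for real data.

```r
# Hypothetical helper: make the largest-magnitude loading of every component
# positive, and flip the matching score columns so that x stays consistent
# with the rotation matrix.
fix_pca_signs <- function(pca) {
  signs <- apply(pca$rotation, 2, function(v) sign(v[which.max(abs(v))]))
  pca$rotation <- sweep(pca$rotation, 2, signs, `*`)
  if (!is.null(pca$x)) {
    pca$x <- sweep(pca$x, 2, signs, `*`)
  }
  pca
}

pca <- fix_pca_signs(prcomp(USArrests, scale. = TRUE))

# The strongest loading of each component is now positive
apply(pca$rotation, 2, function(v) v[which.max(abs(v))])
```

One caveat with this convention: when two loadings of opposite sign are nearly equal in magnitude (as with 0.37 and -0.38 on PC1 in the simulated example), a small change in the data could change which one is largest and so flip the component again. A rule anchored to a substantively chosen variable may be more stable in that situation.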
Ebert,Timothy Aaron
2022-Oct-13 12:03 UTC
[R] prcomp - arbitrary direction of the returned principal components
I still do not understand. However, the general approach would be to identify a specific value to test. If the test is TRUE, do "this"; otherwise do nothing. Once the test condition is properly identified, the coding follows easily.

abs() is the same as

if x < 0 then x = -x    (non-R code, just the idea)

The R code might look something more like

for (number in seq_len(ncol(x))) {
  if (x[3, 2] < 0) {                         # test a single chosen element
    x[number, number] <- -x[number, number]  # only change the diagonal
  }
}

Depending on what values need to be changed, you may need a nested for loop to go through all values of x[number1, number2].

Your words: "I can forcefully use a NEGATIVE sign to FLIP the index when it is LOW." It appeared from this that "low" was defined as values that are negative. After taking absolute values you will still have low values (close to zero) and high values (far from zero). You could make the condition some other value:

if x < -4 then x = -x

If you just want to rotate about zero, then

x = -x

In this case the positive values will become negative and the negative values positive. Add an if test to selectively rotate based on the value of a single test element in x (as in x[3,2]).

In debugging or troubleshooting, setting the seed is useful. For actual data analysis you should not set the seed, or possibly better yet use set.seed(NULL).

Tim
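To make the difference between abs() and "rotating about zero" concrete, the sketch below flips the first component of a PCA and checks two things: the data reconstructed from scores and loadings are unchanged by the flip, while abs() changes the ordering of the scores. The built-in USArrests data set is used only as a convenient example, and the object names are illustrative.

```r
pca <- prcomp(USArrests, scale. = TRUE)

# Flip PC1 in both the loadings and the scores
flipped <- pca
flipped$rotation[, 1] <- -pca$rotation[, 1]
flipped$x[, 1] <- -pca$x[, 1]

# Scores times transposed loadings give back the same (scaled) data,
# so the flip loses no information
same_fit <- all.equal(pca$x %*% t(pca$rotation),
                      flipped$x %*% t(flipped$rotation))

# Flipping reverses the ranking of the scores exactly ...
rank_flip <- cor(pca$x[, 1], flipped$x[, 1], method = "spearman")

# ... whereas abs() folds negative and positive scores together,
# so the original ordering cannot be recovered from it
rank_abs <- cor(pca$x[, 1], abs(pca$x[, 1]), method = "spearman")

same_fit
rank_flip
rank_abs
```

A whole-column flip conditioned on a single test (for example the sign of one chosen loading or score) therefore keeps high and low values distinguishable, which taking absolute values does not.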