tolga.i.uzuner at jpmorgan.com
2008-Aug-13 16:33 UTC
[R] which alternative tests instead of AIC/BIC for choosing models
Dear R Users,

I am looking for an alternative to AIC or BIC to choose model parameters. This is somewhat of a general statistics question, but I ask it in this forum as I am looking for an R solution.

Suppose I have one dependent variable, y, and two independent variables, x1 and x2.

I can perform three regressions:
reg1: y~x1
reg2: y~x2
reg3: y~x1+x2

The AIC of reg1 is 2000, reg2 is 1000 and reg3 is 950. One would, presumably, conclude that one should use both x1 and x2. However, the R^2 values are quite different: R^2 of reg1 is 0.5%, reg2 is 95% and reg3 is 95.25%. Knowing that, I would actually conclude that x1 adds little and should probably not be used.

There is the overall question of what explains this outcome, i.e. the reduction in AIC in going from reg2 to reg3 even though R^2 does not materially improve when x1 is added to reg2 (to get to reg3). But that is more of a generic statistics issue and not my question here.

The question I do have is: is there a package in R which implements a test and provides some diagnostic information that I can use to rule out x1 in a systematic way, given that its addition to the equation adds little in terms of explaining the variability of y?

Thanks in advance,
Tolga

Generally, this communication is for informational purposes only and it is not intended as an offer or solicitation for the purchase or sale of any financial instrument or as an official confirmation of any transaction. In the event you are receiving the offering materials attached below related to your interest in hedge funds or private equity, this communication may be intended as an offer or solicitation for the purchase or sale of such fund(s). All market prices, data and other information are not warranted as to completeness or accuracy and are subject to change without notice. Any comments or statements made herein do not necessarily reflect those of JPMorgan Chase & Co., its subsidiaries and affiliates.
This transmission may contain information that is privileged, confidential, legally privileged, and/or exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or use of the information contained herein (including any reliance thereon) is STRICTLY PROHIBITED. Although this transmission and any attachments are believed to be free of any virus or other defect that might affect any computer system into which it is received and opened, it is the responsibility of the recipient to ensure that it is virus free and no responsibility is accepted by JPMorgan Chase & Co., its subsidiaries and affiliates, as applicable, for any loss or damage arising in any way from its use. If you received this transmission in error, please immediately contact the sender and destroy the material in its entirety, whether in electronic or hard copy format. Thank you. Please refer to http://www.jpmorgan.com/pages/disclosures for disclosures relating to UK legal entities.
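On the side question of why AIC can fall by about 50 while R^2 hardly moves: for least-squares fits with Gaussian errors on the same n observations, adding one regressor changes AIC by n*log(RSS_new/RSS_old) + 2, and RSS_new/RSS_old equals (1 - R^2_new)/(1 - R^2_old). With the figures quoted above that ratio is 0.0475/0.05 = 0.95, so a drop of 50 would correspond to n of roughly 1000, although the sample size is not stated in the post. The following is a minimal sketch on simulated data (the sample size, coefficients and seed are assumptions chosen for illustration, not the poster's data) that checks the identity against R's own AIC().

```r
# Simulated stand-in data: x2 explains most of y, x1 adds only a little.
# n, the coefficients and the seed are illustrative assumptions.
set.seed(1)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 0.2 * x1 + 4 * x2 + rnorm(n)

reg2 <- lm(y ~ x2)
reg3 <- lm(y ~ x1 + x2)

# R^2 of the two fits
r2 <- c(reg2 = summary(reg2)$r.squared,
        reg3 = summary(reg3)$r.squared)

# Change in AIC as reported by R
aic_diff <- AIC(reg3) - AIC(reg2)

# The same change computed from the residual sums of squares:
# n * log(RSS3 / RSS2) + 2 (one extra estimated coefficient)
rss2 <- deviance(reg2)
rss3 <- deviance(reg3)
aic_identity <- n * log(rss3 / rss2) + 2

# The RSS ratio can also be written in terms of R^2
rss_ratio_from_r2 <- unname((1 - r2["reg3"]) / (1 - r2["reg2"]))

print(r2)
print(c(aic_diff = aic_diff, aic_identity = aic_identity))
```

Because the change scales with n, a large sample can turn a very small gain in R^2 into a large AIC difference.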
John C Frain
2008-Aug-13 17:51 UTC
[R] which alternative tests instead of AIC/BIC for choosing models
My initial idea would be to forget about AIC and BIC. Ask what you would expect to get in the regression, then regress y on x1 and x2 and use a simple t-test to decide what should be included. Remember that omitted variables will bias your coefficients, but if you include redundant variables your results will remain consistent. I presume that you do not have any problems with non-stationary variables.

Best Regards
John

--
John C Frain
Trinity College Dublin
Dublin 2
Ireland
www.tcd.ie/Economics/staff/frainj/home.html
mailto:frainj at tcd.ie
mailto:frainj at gmail.com
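The t-test approach suggested above can be sketched as follows, on simulated data (the sample size, coefficients and seed are illustrative assumptions). summary() reports a t statistic for each coefficient, and drop1() with test = "F" gives the corresponding single-term F test; for one coefficient the F statistic is the square of the t statistic.

```r
# Simulated stand-in data; all values are illustrative assumptions.
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 0.1 * x1 + 3 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)

# Coefficient table: estimate, standard error, t value, p-value
coefs <- summary(fit)$coefficients
t_x1  <- coefs["x1", "t value"]
p_x1  <- coefs["x1", "Pr(>|t|)"]

# Single-term deletions: F test for dropping each regressor in turn
d1   <- drop1(fit, test = "F")
f_x1 <- d1["x1", "F value"]

print(coefs)
print(d1)
```

Note that a significant t statistic only says the coefficient is distinguishable from zero; whether the extra variable is worth keeping for its contribution to R^2 is a separate judgment.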
tolga.i.uzuner at jpmorgan.com
2008-Aug-13 19:19 UTC
[R] which alternative tests instead of AIC/BIC for choosing models
By way of partial follow-up to my own question, and on the odd chance anyone else wonders about this issue, some alternatives to this appear to be in the leaps package, which implements the leaps routine (Mallows' Cp) and regsubsets. In my case Mallows' Cp does not work either (see below), so I have implemented the following.

    regr # <- holds a zoo object with the 1st column being the dependent variable
    r2test <- (result$lm.r2 > Rsqr) &
      (all(unlist(lapply(2:(dim(regr)[2]), function(i)
        summary(lm(regr[, 1] ~ regr[, i]))$adj.r.squared)) > 0.1)) &
      which.min(leaps(as.matrix(regr[, -1]), regr[, 1])$Cp) == dim(regr)[2]

leaps on the same problem below
==============================

    > leaps(as.matrix(regr3[,-1]),regr3[,1],method=c("adjr2"))
    $which
          1     2
    1 FALSE  TRUE
    1  TRUE FALSE
    2  TRUE  TRUE

    $label
    [1] "(Intercept)" "1"           "2"

    $size
    [1] 2 2 3

    $adjr2
    [1] 0.950757134 0.001681389 0.954859493

    > leaps(as.matrix(regr3[,-1]),regr3[,1],method=c("Cp"))
    $which
          1     2
    1 FALSE  TRUE
    1  TRUE FALSE
    2  TRUE  TRUE

    $label
    [1] "(Intercept)" "1"           "2"

    $size
    [1] 2 2 3

    $Cp
    [1]   38.53367 8490.55327    3.00000
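One way to read the Cp output above, assuming the usual definition Cp = RSS_p/s^2 - n + 2p with s^2 estimated from the full model: when the full model has three coefficients (intercept, x1, x2), the Cp of a model that drops one term equals that term's partial F statistic plus 1. On that assumption the posted Cp of 38.53 for the x2-only model would correspond to an F of about 37.5 for x1, so x1 is statistically significant even though it adds little R^2. A minimal sketch checking the relation on simulated data (hypothetical values, base R only, with Cp computed by hand rather than through leaps):

```r
# Simulated stand-in data; n, coefficients and seed are illustrative assumptions.
set.seed(7)
n  <- 800
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 0.2 * x1 + 4 * x2 + rnorm(n)

full <- lm(y ~ x1 + x2)   # three coefficients including the intercept
sub  <- lm(y ~ x2)        # drops x1

# Residual variance estimate from the full model
s2 <- deviance(full) / df.residual(full)

# Mallows' Cp under the standard definition: RSS_p / s^2 - n + 2p
# (mallows_cp is a helper written for this sketch, not a package function)
mallows_cp <- function(model) {
  deviance(model) / s2 - n + 2 * length(coef(model))
}

cp_full <- mallows_cp(full)
cp_sub  <- mallows_cp(sub)

# Partial F statistic for adding x1 to the x2-only model
f_x1 <- anova(sub, full)$F[2]

print(c(cp_full = cp_full, cp_sub = cp_sub, F_x1 = f_x1))
```

By construction the full model's Cp equals its number of coefficients, which matches the 3.00000 in the output above.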