thr3ads.net - R help - [R] which alternative tests instead of AIC/BIC for choosing models [Aug 2008]

tolga.i.uzuner at jpmorgan.com

2008-Aug-13 16:33 UTC

[R] which alternative tests instead of AIC/BIC for choosing models

Dear R Users,

I am looking for an alternative to AIC or BIC for choosing between models. 
This is somewhat of a general statistics question, but I ask it in this 
forum as I am looking for an R solution.

Suppose I have one dependent variable, y, and two independent variables, 
x1 and x2. 

I can perform three regressions: 
reg1: y~x1 
reg2: y~x2 
reg3: y~x1+x2 

The AIC of reg1 is 2000, reg2 is 1000 and reg3 is 950. On that basis one 
would, presumably, conclude that one should use both x1 and x2.  However, 
the R^2 values tell a different story: R^2 of reg1 is 0.5%, reg2 is 95% 
and reg3 is 95.25%. Knowing that, I would actually conclude that x1 adds 
little and should probably not be used.

There is the overall question of what potentially explains this outcome, 
i.e. the reduction in AIC in going from reg2 to reg3 even though R^2 does 
not materially improve with the addition of x1 to reg2 (to get to reg3). 
But that is more of a generic statistics issue and not my question here.
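
One possible explanation, sketched under stated assumptions: for least-squares fits to the same y, the AIC of two models differs by n*log(RSS_big/RSS_small) plus 2 per extra coefficient, and RSS is proportional to 1 - R^2. The sample size n is not given in the post, so n = 1000 below is purely an assumed value, and delta_aic is a hypothetical helper name. The point is only that with a large n, even a small gain in R^2 can produce a sizeable drop in AIC.

```r
# Change in AIC between two nested least-squares fits to the same response,
# expressed through their R^2 values.
# Hypothetical helper; n is NOT given in the post and is assumed here.
delta_aic <- function(n, r2_small, r2_big, k_extra = 1) {
  n * log((1 - r2_big) / (1 - r2_small)) + 2 * k_extra
}

# R^2 values quoted in the post (95% and 95.25%), with an assumed n of 1000
d <- delta_aic(n = 1000, r2_small = 0.95, r2_big = 0.9525)
print(d)

# Cross-check the identity against AIC() on simulated (made-up) data
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 0.2 * x1 + x2 + rnorm(n)
reg2 <- lm(y ~ x2)
reg3 <- lm(y ~ x1 + x2)
aic_diff     <- AIC(reg3) - AIC(reg2)
formula_diff <- delta_aic(n, summary(reg2)$r.squared, summary(reg3)$r.squared)
print(all.equal(aic_diff, formula_diff))
```

With the assumed n of 1000 the change comes out at roughly -49, close to the reported drop of 50 (from 1000 to 950), so the figures in the post would be consistent with a sample of about a thousand observations.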

The question I do have is: is there a package in R which implements a test 
and provides some diagnostic information that I can use to rule out x1 in 
a systematic way, given that its addition to the equation adds little in 
terms of explaining the variability of y?

Thanks in advance,
Tolga

Generally, this communication is for informational purposes only
and it is not intended as an offer or solicitation for the purchase
or sale of any financial instrument or as an official confirmation
of any transaction. In the event you are receiving the offering
materials attached below related to your interest in hedge funds or
private equity, this communication may be intended as an offer or
solicitation for the purchase or sale of such fund(s).  All market
prices, data and other information are not warranted as to
completeness or accuracy and are subject to change without notice.
Any comments or statements made herein do not necessarily reflect
those of JPMorgan Chase & Co., its subsidiaries and affiliates.

This transmission may contain information that is privileged,
confidential, legally privileged, and/or exempt from disclosure
under applicable law. If you are not the intended recipient, you
are hereby notified that any disclosure, copying, distribution, or
use of the information contained herein (including any reliance
thereon) is STRICTLY PROHIBITED. Although this transmission and any
attachments are believed to be free of any virus or other defect
that might affect any computer system into which it is received and
opened, it is the responsibility of the recipient to ensure that it
is virus free and no responsibility is accepted by JPMorgan Chase &
Co., its subsidiaries and affiliates, as applicable, for any loss
or damage arising in any way from its use. If you received this
transmission in error, please immediately contact the sender and
destroy the material in its entirety, whether in electronic or hard
copy format. Thank you.
Please refer to http://www.jpmorgan.com/pages/disclosures for
disclosures relating to UK legal entities.

John C Frain

2008-Aug-13 17:51 UTC


[R] which alternative tests instead of AIC/BIC for choosing models

My initial idea would be to forget about AIC and BIC, ask what one would
expect to find in the regression, then regress y on x1 and x2 and use a
simple t-test to decide what should be included.  Remember that omitted
variables will bias your coefficients, whereas including redundant
variables leaves your estimates consistent.  I presume that you do not
have any problems with non-stationary variables.
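
A minimal sketch of the t-test suggested above, using base R only. The data are simulated stand-ins for the poster's y, x1 and x2 (every number below is made up), and the 5% level is just an assumed conventional cut-off.

```r
# Simulated data standing in for the poster's variables (all values made up)
set.seed(1)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 0.05 * x1 + x2 + rnorm(n, sd = 0.22)

# Regress y on both candidate variables, as suggested
reg3 <- lm(y ~ x1 + x2)

# Coefficient table: estimate, standard error, t statistic, two-sided p-value
coefs <- summary(reg3)$coefficients
print(coefs)

# t-test for x1: H0 is that its coefficient is zero, given x2 is in the model
t_x1 <- coefs["x1", "t value"]
p_x1 <- coefs["x1", "Pr(>|t|)"]
keep_x1 <- p_x1 < 0.05   # 5% level is an assumed, conventional cut-off
```

Note that with a large sample a t-test can flag x1 as significant even when it adds very little to R^2, which is the same tension described in the original post.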

Best Regards

John


-- 
John C Frain
Trinity College Dublin
Dublin 2
Ireland
www.tcd.ie/Economics/staff/frainj/home.html
mailto:frainj at tcd.ie
mailto:frainj at gmail.com

tolga.i.uzuner at jpmorgan.com

2008-Aug-13 19:19 UTC


[R] which alternative tests instead of AIC/BIC for choosing models

By way of partial follow-up to my own question, and on the odd chance 
anyone else wonders about this issue: some alternatives appear to be in 
the leaps package, which provides the leaps routine (Mallows' Cp, adjusted 
R^2) and regsubsets. In my case Mallows' Cp does not help either, as it 
also favours the full model (see the output below), so I have implemented 
the following.

# regr holds a zoo object with the 1st column being the dependent variable
# (result$lm.r2 and Rsqr are defined elsewhere and not shown in this post)

r2test <- (result$lm.r2 > Rsqr) &
        (all(unlist(lapply(2:(dim(regr)[2]), function(i)
                summary(lm(regr[,1] ~ regr[,i]))$adj.r.squared)) > 0.1)) &
        which.min(leaps(as.matrix(regr[,-1]), regr[,1])$Cp) == dim(regr)[2]

leaps on the same problem below
==============================
> leaps(as.matrix(regr3[,-1]),regr3[,1],method=c("adjr2"))
$which
      1     2
1 FALSE  TRUE
1  TRUE FALSE
2  TRUE  TRUE

$label
[1] "(Intercept)" "1"           "2" 

$size
[1] 2 2 3

$adjr2
[1] 0.950757134 0.001681389 0.954859493
> leaps(as.matrix(regr3[,-1]),regr3[,1],method=c("Cp"))
$which
      1     2
1 FALSE  TRUE
1  TRUE FALSE
2  TRUE  TRUE

$label
[1] "(Intercept)" "1"           "2" 

$size
[1] 2 2 3

$Cp
[1]   38.53367 8490.55327    3.00000
> 
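
As a sketch of one systematic way to rule x1 in or out using base R only, the nested models can be compared with a partial F-test and the gain in adjusted R^2 checked against a practical threshold. The data are simulated and the thresholds (0.05 and 0.01) are arbitrary assumptions, not values from this thread. Note also that Cp for the full model equals its number of coefficients by construction when the error variance is estimated from that same model, so the value of 3 in the output above does not by itself say much.

```r
# Simulated data standing in for the poster's variables (all values made up)
set.seed(2)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 0.05 * x1 + x2 + rnorm(n, sd = 0.22)

reg2 <- lm(y ~ x2)
reg3 <- lm(y ~ x1 + x2)

# Partial F-test: does adding x1 to reg2 reduce the residual sum of squares
# by more than chance alone would explain?
ftab <- anova(reg2, reg3)
print(ftab)

# Effect size: gain in adjusted R^2 from adding x1
gain <- summary(reg3)$adj.r.squared - summary(reg2)$adj.r.squared
print(gain)

# Hypothetical rule: keep x1 only if it is significant AND the gain exceeds
# a practical threshold chosen by the analyst (0.01 here is arbitrary)
min_gain <- 0.01
keep_x1  <- (ftab[2, "Pr(>F)"] < 0.05) && (gain > min_gain)
print(keep_x1)
```

In a setup like this one, x1 can be statistically significant while the gain in adjusted R^2 stays below the threshold, so the combined rule separates "detectable" from "worth including".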

