thr3ads.net - R help - [R] Calculation of BIC done by leaps-package [Dec 2010]

If this information is useful, please help other people find it:
Share via:

Jan Henckens

2010-Dec-26 14:47 UTC

[R] Calculation of BIC done by leaps-package

Hi Folks,

I've got a question concerning the calculation of the Schwarz-Criterion 
(BIC) done by summary.regsubsets() of the leaps-package:

Using regsubsets() to perform subset-selection I receive an regsubsets 
object that can be summarized by summary.regsubsets(). After this 
operation the resulting summary contains a vector of BIC-values 
representing models of size i=1,...,K.

My problem is that I can't reproduce the calculation of these BIC 
values. I already tried to use extractAIC(...,k=log(n)), 
AIC(...,k=log(n)) and manual calculation using the RSS-vector but none 
matches the calculation done by the summary-function. I already checked 
for constants that could be the reason for the differences but i found 
out, that the values vary apart of adding a constant term.


The source code of the leaps-package states the package calculates the 
BIC this way:

bicvec<-c(bicvec,(n1+ll$intercept)*log(vr)+i*log(n1+ll$intercept))

with:

## number of observations - Intercept:
n1<-ll$nn-ll$intercept
## fraction of sum of squared residulas model i
## and sum of squared residuals null model, I
## just can't understand why the vector ll$ress
## is subscripted double
vr<-ll$ress[i,j]/ll$nullrss
## maximum number of variables
i

^^ This seems to match the calculation done by extractAIC but it doesn't!

Maybe anyone can tell me about the reason of the variation of the 
BIC-values?

Best regards,
Jan Henckens



### Minimal Example:
require(leaps)
bridge <- 
read.table("http://www.stat.tamu.edu/~sheather/book/docs/datasets/bridge.txt",
header=TRUE)
fmla.full <- formula(Time ~ .)
(lm.model <- summary(regsubsets(fmla.full,data=bridge,weights=NULL, 
intercept=TRUE, method="forward")))
lm.model$bic
### The first two models constructed via lm():
extractAIC(lm(Time~Dwgs,data=bridge),k=log(nrow(bridge)))
extractAIC(lm(Time~Dwgs+Case,data=bridge),k=log(nrow(bridge)))

or see

http://www.henckens.de/min_example.R



-- 
jan.henckens | j?llenbecker str. 58 | 33613 bielefeld | germany
tel 0521-5251970

Peter Ehlers

2010-Dec-27 18:22 UTC

head link

[R] Calculation of BIC done by leaps-package

On 2010-12-26 06:47, Jan Henckens wrote:> Hi Folks,
>
> I've got a question concerning the calculation of the Schwarz-Criterion
> (BIC) done by summary.regsubsets() of the leaps-package:
>
> Using regsubsets() to perform subset-selection I receive an regsubsets
> object that can be summarized by summary.regsubsets(). After this
> operation the resulting summary contains a vector of BIC-values
> representing models of size i=1,...,K.
>
> My problem is that I can't reproduce the calculation of these BIC
> values. I already tried to use extractAIC(...,k=log(n)),
> AIC(...,k=log(n)) and manual calculation using the RSS-vector but none
> matches the calculation done by the summary-function. I already checked
> for constants that could be the reason for the differences but i found
> out, that the values vary apart of adding a constant term.
>
>
> The source code of the leaps-package states the package calculates the
> BIC this way:
>
> bicvec<-c(bicvec,(n1+ll$intercept)*log(vr)+i*log(n1+ll$intercept))
>
> with:
>
> ## number of observations - Intercept:
> n1<-ll$nn-ll$intercept
> ## fraction of sum of squared residulas model i
> ## and sum of squared residuals null model, I
> ## just can't understand why the vector ll$ress
> ## is subscripted double
ll$ress is a one-column matrix
> vr<-ll$ress[i,j]/ll$nullrss
> ## maximum number of variables
> i
>
> ^^ This seems to match the calculation done by extractAIC but it
doesn't!
>
> Maybe anyone can tell me about the reason of the variation of the
> BIC-values?
>
> Best regards,
> Jan Henckens
>
>
>
> ### Minimal Example:
> require(leaps)
> bridge<-
>
read.table("http://www.stat.tamu.edu/~sheather/book/docs/datasets/bridge.txt",
> header=TRUE)
> fmla.full<- formula(Time ~ .)
> (lm.model<- summary(regsubsets(fmla.full,data=bridge,weights=NULL,
> intercept=TRUE, method="forward")))
> lm.model$bic
> ### The first two models constructed via lm():
> extractAIC(lm(Time~Dwgs,data=bridge),k=log(nrow(bridge)))
> extractAIC(lm(Time~Dwgs+Case,data=bridge),k=log(nrow(bridge)))
>
> or see
>
> http://www.henckens.de/min_example.R
>
Does this help:

bic1 <- extractAIC(lm(Time~Dwgs,data=bridge),k=log(nrow(bridge)))
bic2 <- extractAIC(lm(Time~Dwgs+Spans,data=bridge),k=log(nrow(bridge)))

## NOTE: predictors are 'Dwgs' and 'Spans', not 'Dwgs'
and 'Case'.

all.equal(bic2[2] - bic1[2], diff(lm.model$bic)[1])
#[1] TRUE

## or
bic0.1 <- lm.model$bic[1]
bic0.2 <- lm.model$bic[2]
bic1 - bic0.1
#[1] 410.4472
bic2 - bic0.2
#[1] 410.4472

## So, the values do differ from extractAIC by a constant.

BTW: You probably want to omit 'Case'; I doubt that it's an
intended predictor.

Peter Ehlers

Possibly Parallel Threads

Search for more reasonably related threads

R help - Dec 2010 - Calculation of BIC done by leaps-package

[R] Calculation of BIC done by leaps-package

[R] Calculation of BIC done by leaps-package

Possibly Parallel Threads