Dear List Members,
I have encountered two problems when using the step function to
select models. To better illustrate the problems, attached is an
R image which includes the objects needed to run the code attached.
lm.data.frame have factor variables with 3 levels.
The following run shows the first problem. AICs (* and **) are different.
I noticed that the Df for rs13482096:rs13483699 is 4, while I think
Df should be 6, 2 from rs13483699 and 4 from interactions. When I ran
add1 directly, I got Df=6 and AIC 848.75.
> step2.bic.out <- step(step.bic.out, scope=list(lower=scope.lower2,
upper=scope.upper2),
+ direction="both",
k=log(length(step.bic.out$y)),
trace=1)
Start: AIC=841.18
pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221
Df Deviance AIC
+ rs13482096:rs13483699 4 216.63 840.63 (*)
<none> 233.82 841.18
- rs8254221 2 244.08 842.90
- rs13482096 2 245.20 844.31
......
Step: AIC=848.75 (**)
pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221 +
rs13482096:rs13483699
> add1(step.bic.out, scope="rs13482096:rs13483699",
k=log(length(step.bic.out$y)))
Single term additions
Model:
pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221
Df Deviance AIC
<none> 233.82 841.18
rs13482096:rs13483699 6 214.28 848.75 (**)
When I used add1 to handle terms to be added together and separately,
I got different results. The example is shown below. This might explain
the discrepancy shown above.> add1(step.bic.out, scope=int.terms[11:12], k=log(length(step.bic.out$y)))
Single term additions
Model:
pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221
Df Deviance AIC
<none> 233.82 841.18
rs13479085:rs13475933 6 224.95 863.66
rs13480057:rs13475933 4 226.72 854.62 (***)> add1(step.bic.out, scope=int.terms[11], k=log(length(step.bic.out$y)))
Single term additions
Model:
pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221
Df Deviance AIC
<none> 233.82 841.18
rs13479085:rs13475933 6 224.95 863.66> add1(step.bic.out, scope=int.terms[12], k=log(length(step.bic.out$y)))
Single term additions
Model:
pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221
Df Deviance AIC
<none> 233.82 841.18
rs13480057:rs13475933 6 215.95 851.12 (***)
Another problem is that the final model seems to be the first
model that satisfies (bAIC >= AIC + 1e-07) if steps haven't used up,
rather than the one before that. Please see below.
> formula(step2.bic.out)
pheno.dat ~ rs13479085 + rs13480057 + rs13482096 + rs8254221 +
rs13482096:rs13483699
> step2.bic.out$anova
Step Df Deviance Resid. Df Resid. Dev AIC
1 NA NA 298 233.8226 841.1784
Any insights would be greatly appreciated. Thanks much !
Ping Wang