Chris Beeley
2012-Dec-09 21:04 UTC
[R] Some coefficients are doubled when I use the step() function
Hello-
Such a strange problem, I can't figure it out at all. I'm using binomial glm
models and the step() function; the call looks like this:
sectionmodel = glm(formula = Target3 ~ S1Q12_NUM.1 + S1Q9_NUM.1 + S1Q5_NUM.1 +
S1Q7_NUM.1 + S1Q8_NUM.1 + S1Q6_NUM.1 + S1Q10_NUM.1 + S1Q12_BURG.1 +
S1Q12_CD.1 + S1Q4.1 + S1Q12_OTHVIOL.1 + S1Q8.1 + S1Q12_GBH.1 +
S1Q11.1 + S1Q7.1 + S1Q12_THEFT.1 + S1Q12_DRIV.1 + S1Q5.1 +
S1Q9.1 + S1Q12_DRUG.1, family = binomial, data = moddata)
But when I run step() on the resulting model, some of the coefficients
are doubled when they come back, with a "1" or "2" appended to the name, e.g. like this:
mymodel = step(sectionmodel, direction="backward", test="F")
summary(mymodel) returns this:
Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)       -4.58519    0.55675  -8.236   <2e-16 ***
S1Q12_NUM.1        0.18446    0.08576   2.151   0.0315 *
S1Q4.12            0.56893    0.40281   1.412   0.1578
S1Q12_OTHVIOL.11   0.56435    0.38262   1.475   0.1402
S1Q12_GBH.11       0.49199    0.33175   1.483   0.1381
S1Q7.11           -1.27330    1.12897  -1.128   0.2594
S1Q7.12           -1.83927    1.16909  -1.573   0.1157
S1Q5.11            0.91742    1.19489   0.768   0.4426
S1Q5.12            2.16861    1.19864   1.809   0.0704 .
S1Q12_DRUG.11     -0.48400    0.29898  -1.619   0.1055
As you can see, S1Q7.1 and S1Q5.1 each appear twice, as "S1Q7.11" and
"S1Q7.12", "S1Q5.11" and "S1Q5.12".
I've googled, and read and re-read the step() and stepAIC()
documentation, and I just can't figure out what it could mean. Removing
the test="F" argument produces the same behaviour.
Any help greatly appreciated.
Chris Beeley
Institute of Mental Health, UK
Ben Bolker
2012-Dec-09 22:00 UTC
Chris Beeley <chris.beeley <at> gmail.com> writes:

> But when I run step() on the resulting model, some of the coefficients
> are doubled when it comes back, with a "2" at the end [snip]
> As you can see S1Q7.1 and S1Q5.1 are duplicated as "S1Q7.11" and
> "S1Q7.12" etc.

My guess is that S1Q7.1 and S1Q5.1 are (possibly accidentally) categorical variables (factors), and that either the second and third levels of the factors are "1" and "2", or you have set sum-to-zero contrasts somewhere along the line.

Note that other variables also have numeric values appended to their names (e.g. S1Q4.1 appears as S1Q4.12), which indicates that they too are being treated as categorical variables and that their levels are coded numerically.

My prediction is that this "doubling" is independent of the use of step(), and that you would see the same parameters in the summary() of the full model.
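A minimal R sketch of the explanation above, using made-up data: only the names Target3, S1Q7.1 and moddata come from the thread; the values, the 0/1/2 coding and the single-predictor model are hypothetical. It shows that a factor predictor gets one coefficient per non-reference level, named variable name + level label, in the full glm() before step() is ever called, and how the column could be converted if it was meant to be numeric.

```r
# Hypothetical toy data standing in for moddata: an item coded 0/1/2 that
# was read in as a factor rather than as a number (the guess above).
set.seed(1)
n <- 200
moddata <- data.frame(
  Target3 = rbinom(n, 1, 0.3),
  S1Q7.1  = factor(sample(c("0", "1", "2"), n, replace = TRUE))
)

# Check which columns R treats as categorical
sapply(moddata, is.factor)   # Target3 FALSE, S1Q7.1 TRUE
levels(moddata$S1Q7.1)       # "0" "1" "2"

# The full model, with no step() involved, already has one dummy coefficient
# per non-reference level, named variable name + level label
full <- glm(Target3 ~ S1Q7.1, family = binomial, data = moddata)
names(coef(full))            # "(Intercept)" "S1Q7.11" "S1Q7.12"

# If the item is really a score, convert via character so the codes 0/1/2
# are kept (as.numeric() on a factor alone returns the level indices 1/2/3)
moddata$S1Q7.1 <- as.numeric(as.character(moddata$S1Q7.1))
fixed <- glm(Target3 ~ S1Q7.1, family = binomial, data = moddata)
names(coef(fixed))           # "(Intercept)" "S1Q7.1"
```

If the item genuinely is categorical, the extra coefficients are expected and the factor should be left as it is; the conversion only makes sense when the levels are meant to be a numeric score.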