Chris Beeley
2012-Dec-09 21:04 UTC
[R] Some coefficients are doubled when I use the step() function
Hello-

Such a strange problem, can't figure it out at all. Using binomial glm models, and the step() function, so the call looks like this:

    sectionmodel = glm(formula = Target3 ~ S1Q12_NUM.1 + S1Q9_NUM.1 +
        S1Q5_NUM.1 + S1Q7_NUM.1 + S1Q8_NUM.1 + S1Q6_NUM.1 + S1Q10_NUM.1 +
        S1Q12_BURG.1 + S1Q12_CD.1 + S1Q4.1 + S1Q12_OTHVIOL.1 + S1Q8.1 +
        S1Q12_GBH.1 + S1Q11.1 + S1Q7.1 + S1Q12_THEFT.1 + S1Q12_DRIV.1 +
        S1Q5.1 + S1Q9.1 + S1Q12_DRUG.1,
        family = binomial, data = moddata)

But when I run step() on the resulting model, some of the coefficients are doubled when it comes back, with a "2" at the end, e.g. like this:

    mymodel = step(sectionmodel, direction="backward", test="F")

summary(mymodel) returns this:

    Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
    (Intercept)      -4.58519    0.55675  -8.236   <2e-16 ***
    S1Q12_NUM.1       0.18446    0.08576   2.151   0.0315 *
    S1Q4.12           0.56893    0.40281   1.412   0.1578
    S1Q12_OTHVIOL.11  0.56435    0.38262   1.475   0.1402
    S1Q12_GBH.11      0.49199    0.33175   1.483   0.1381
    S1Q7.11          -1.27330    1.12897  -1.128   0.2594
    S1Q7.12          -1.83927    1.16909  -1.573   0.1157
    S1Q5.11           0.91742    1.19489   0.768   0.4426
    S1Q5.12           2.16861    1.19864   1.809   0.0704 .
    S1Q12_DRUG.11    -0.48400    0.29898  -1.619   0.1055

As you can see, S1Q7.1 and S1Q5.1 are duplicated as "S1Q7.11" and "S1Q7.12" etc. I've googled and read and re-read the step() and stepAIC() documentation and I just can't figure out what it could mean. Removing the test="F" bit also generates the same behaviour.

Any help greatly appreciated.

Chris Beeley
Institute of Mental Health, UK
Ben Bolker
2012-Dec-09 22:00 UTC
[R] Some coefficients are doubled when I use the step() function
Chris Beeley <chris.beeley <at> gmail.com> writes:

> Such a strange problem, can't figure it out at all. Using binomial glm
> models, and the step() function, so the call looks like this:
>
> sectionmodel = glm(formula = Target3 ~ S1Q12_NUM.1 + S1Q9_NUM.1 + S1Q5_NUM.1 +

[snip]

> But when I run step() on the resulting model, some of the coefficients
> are doubled when it comes back, with a "2" at the end, e.g. like this:
>
> mymodel = step(sectionmodel, direction="backward", test="F")
>
> summary(mymodel) returns this:
>
> Coefficients:
>                  Estimate Std. Error z value Pr(>|z|)
> (Intercept)      -4.58519    0.55675  -8.236   <2e-16 ***
> S1Q12_NUM.1       0.18446    0.08576   2.151   0.0315 *
> S1Q4.12           0.56893    0.40281   1.412   0.1578
> S1Q12_OTHVIOL.11  0.56435    0.38262   1.475   0.1402
> S1Q12_GBH.11      0.49199    0.33175   1.483   0.1381
> S1Q7.11          -1.27330    1.12897  -1.128   0.2594
> S1Q7.12          -1.83927    1.16909  -1.573   0.1157
> S1Q5.11           0.91742    1.19489   0.768   0.4426
> S1Q5.12           2.16861    1.19864   1.809   0.0704 .
> S1Q12_DRUG.11    -0.48400    0.29898  -1.619   0.1055

> As you can see S1Q7.1 and S1Q5.1 are duplicated as "S1Q7.11" and
> "S1Q7.12" etc. I've googled and read and re-read the step() and
> stepAIC() documentation and I just can't figure out what it could
> mean. Removing the test="F" bit also generates the same behaviour.
> Any help greatly appreciated. Chris Beeley Institute of Mental
> Health, UK

My guess is that S1Q7.1 and S1Q5.1 are (possibly accidentally) categorical variables (factors), and that either the second and third levels of the factors are "1" and "2", or you have set sum-to-zero contrasts somewhere along the line.

Note that other variables have numeric values appended to their names, which indicates that they are also being treated as categorical variables, and that their levels are coded numerically (e.g. S1Q4.1, which appears in the output as S1Q4.12).

My prediction is that this "doubling" is independent of the use of step(), and that you would see these parameters reflected in the summary() of the full model ...
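A minimal sketch of the naming behaviour described above, for anyone wanting to reproduce it. The data here are simulated, not the poster's: only the column names are borrowed from the question, and the factor levels ("0"/"1"/"2" and "1"/"2") are assumptions chosen to match the suffixes seen in the output. It also assumes R's default treatment contrasts are in effect.

```r
# Hypothetical data: column names mimic the question, values are simulated.
set.seed(1)
n <- 200
moddata <- data.frame(
  Target3     = rbinom(n, 1, 0.3),
  S1Q12_NUM.1 = rpois(n, 2),                             # numeric predictor
  S1Q7.1      = factor(sample(0:2, n, replace = TRUE)),  # factor, levels "0","1","2"
  S1Q4.1      = factor(sample(1:2, n, replace = TRUE))   # factor, levels "1","2"
)

# Check which predictors are stored as factors, and what their levels are
sapply(moddata, is.factor)
levels(moddata$S1Q7.1)

# Fit the full model; no step() involved
fit <- glm(Target3 ~ S1Q12_NUM.1 + S1Q7.1 + S1Q4.1,
           family = binomial, data = moddata)

# Each factor contributes one coefficient per non-reference level,
# named <variable name><level>: "S1Q7.1" + "2" gives "S1Q7.12"
coef_names <- names(coef(fit))
print(coef_names)
```

The printed names already include S1Q7.11, S1Q7.12 and S1Q4.12 in the full model, before any stepwise selection. If a variable was meant to be numeric, something like as.numeric(as.character(moddata$S1Q7.1)) converts it back; whether that is appropriate depends on what the codes actually mean.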