shubha
2010-Nov-22 16:13 UTC
[R] how do remove those predictor which have p value greater than 0.05 in GLM?
Hi R users,

I am an intermediate user of R. I am fitting a GLM (libraries MASS, VEGUS) and ran a backward stepwise logistic regression, but I have a problem removing the predictors whose p-values are above 0.05. I don't want those variables in the final backward stepwise logistic regression model.

For example, I first fit the model:

    name <- glm(dep ~ env1 + env2..., family = binomial, data = new)

Then I ran the stepwise selection on it:

    name.step <- step(name, direction = "backward")

The result still contains variables that are not significant. For example, SECCHI is not significant (see the output below), but it is still in the model. How can I remove the non-significant variables in forward/backward stepwise selection?

Another question: when I set direction="backward", I got the same results as with "forward". That seems strange. Why are the results the same for backward and forward? I checked in two other statistical packages (Statistica and SYSTAT), and I think they gave correct results. But I need to use R for further analysis, so I need to fix the problem. I have spent a lot of time trying to figure it out without success. Could you please give your suggestions? It would be a great help.

Below is the example in which predictors with p-values greater than 0.05 are retained after the stepwise logistic regression.

Thanks,
Shubha Pandit, PhD
University of Windsor
Windsor, ON, Canada

===
> summary(step.glm.int.ag1)

Call:
glm(formula = ag1less ~ GEARTEMP + DOGEAR + GEARDEPTH + SECCHI +
    GEARTEMP:SECCHI + DOGEAR:SECCHI + GEARTEMP:DOGEAR + GEARTEMP:GEARDEPTH +
    DOGEAR:GEARDEPTH, family = binomial, data = training)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1983  -0.8272  -0.4677   0.8014   2.6502

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)         3.231623   1.846593   1.750 0.080110 .
GEARTEMP           -0.004408   0.085254  -0.052 0.958761
DOGEAR             -0.732805   0.182285  -4.020 5.82e-05 ***
GEARDEPTH          -0.249237   0.060825  -4.098 4.17e-05 ***
SECCHI              0.311875   0.297594   1.048 0.294645
GEARTEMP:SECCHI    -0.080664   0.010079  -8.003 1.21e-15 ***
DOGEAR:SECCHI       0.066555   0.022181   3.000 0.002695 **
GEARTEMP:DOGEAR     0.030988   0.008907   3.479 0.000503 ***
GEARTEMP:GEARDEPTH  0.008856   0.002122   4.173 3.01e-05 ***
DOGEAR:GEARDEPTH    0.006680   0.004483   1.490 0.136151
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3389.5  on 2751  degrees of freedom
Residual deviance: 2720.4  on 2742  degrees of freedom
AIC: 2740.4

Number of Fisher Scoring iterations: 6

=========================
> glm.int.ag1<-glm(ag1less~GEARTEMP+DOGEAR+GEARDEPTH+SECCHI+SECCHI*GEARTEMP+SECCHI*DOGEAR+SECCHI*GEARDEPTH+GEARTEMP*DOGEAR+GEARTEMP*GEARDEPTH+GEARDEPTH*DOGEAR,data=training,
    family=binomial)
> summary(glm.int.ag1)

Call:
glm(formula = ag1less ~ GEARTEMP + DOGEAR + GEARDEPTH + SECCHI +
    SECCHI * GEARTEMP + SECCHI * DOGEAR + SECCHI * GEARDEPTH +
    GEARTEMP * DOGEAR + GEARTEMP * GEARDEPTH + GEARDEPTH * DOGEAR,
    family = binomial, data = training)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1990  -0.8287  -0.4668   0.8055   2.6673

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)         2.909805   1.928375   1.509 0.131314
GEARTEMP            0.005315   0.087159   0.061 0.951379
DOGEAR             -0.721864   0.183708  -3.929 8.52e-05 ***
GEARDEPTH          -0.235961   0.064828  -3.640 0.000273 ***
SECCHI              0.391445   0.326542   1.199 0.230622
GEARTEMP:SECCHI    -0.082296   0.010437  -7.885 3.14e-15 ***
DOGEAR:SECCHI       0.065572   0.022319   2.938 0.003305 **
GEARDEPTH:SECCHI   -0.003176   0.005295  -0.600 0.548675
GEARTEMP:DOGEAR     0.030571   0.008961   3.412 0.000646 ***
GEARTEMP:GEARDEPTH  0.008692   0.002159   4.027 5.66e-05 ***
DOGEAR:GEARDEPTH    0.006544   0.004495   1.456 0.145484
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3389.5  on 2751  degrees of freedom
Residual deviance: 2720.0  on 2741  degrees of freedom
AIC: 2742

Number of Fisher Scoring iterations: 6

--
View this message in context: http://r.789695.n4.nabble.com/how-do-remove-those-predictor-which-have-p-value-greater-than-0-05-in-GLM-tp3053921p3053921.html
Sent from the R help mailing list archive at Nabble.com.
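A note on why the non-significant terms survive: step() compares models by AIC, not by the Wald p-values that summary() prints. The arithmetic below is a minimal sketch using only the deviances reported in the two summaries above. It assumes ag1less is a 0/1 response, in which case AIC equals the residual deviance plus twice the number of coefficients; the variable names are made up for illustration.

```r
# Residual deviances and coefficient counts (intercept included),
# copied from the two summary() outputs above.
dev_full <- 2720.0; k_full <- 11   # glm.int.ag1
dev_step <- 2720.4; k_step <- 10   # step.glm.int.ag1

# Assumption: ag1less is binary (0/1), so AIC = residual deviance + 2 * k.
aic_full <- dev_full + 2 * k_full
aic_step <- dev_step + 2 * k_step

# Dropping GEARDEPTH:SECCHI raised the deviance by only 0.4, which is less
# than the penalty of 2 saved by removing one coefficient, so AIC went down.
delta_dev <- dev_step - dev_full

# With the default penalty k = 2, a single-coefficient term is dropped only
# if its deviance change is below 2. As a likelihood-ratio test, that is a
# p-value cutoff of about 0.157, not 0.05.
p_cutoff <- pchisq(2, df = 1, lower.tail = FALSE)

print(c(aic_full = aic_full, aic_step = aic_step, p_cutoff = p_cutoff))
```

The two computed AIC values match the 2742 and 2740.4 printed by summary(), which supports the binary-response assumption. Under this reading, DOGEAR:GEARDEPTH (p = 0.136) stays because it is below the implied cutoff, and the GEARTEMP and SECCHI main effects stay because step() does not drop a main effect while an interaction containing it remains in the model. step() also accepts a penalty argument k; a value such as qchisq(0.95, 1) would bring the rule for single-coefficient terms close to a 0.05 likelihood-ratio threshold, although that is still not selection on the Wald p-values.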
Frank Harrell
2010-Nov-22 21:20 UTC
[R] how do remove those predictor which have p value greater than 0.05 in GLM?
What would make you want to delete a variable because P > 0.05? That will invalidate every aspect of statistical inference for the model.

Frank

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/how-do-remove-those-predictor-which-have-p-value-greater-than-0-05-in-GLM-tp3053921p3054478.html
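One way to see the concern about inference after selection is a small simulation. The sketch below is an illustration only, not Frank's own argument: it generates predictors that have no relation to the response, runs backward step() on each data set, and records how often the selected model still reports at least one coefficient with p < 0.05. The sample size, number of predictors, number of replications, and all variable names are arbitrary assumptions.

```r
set.seed(1)
n_sim <- 100   # number of simulated data sets (arbitrary)
n     <- 100   # observations per data set (arbitrary)
p     <- 10    # pure-noise predictors (arbitrary)

false_hit <- logical(n_sim)
for (i in seq_len(n_sim)) {
  # Predictors V1..V10 and a response generated independently of them
  d <- as.data.frame(matrix(rnorm(n * p), nrow = n))
  d$y <- rbinom(n, size = 1, prob = 0.5)

  full <- glm(y ~ ., family = binomial, data = d)
  sel  <- step(full, direction = "backward", trace = 0)

  # Wald p-values of the retained predictors (intercept row removed);
  # this is an empty vector if only the intercept is left
  pv <- summary(sel)$coefficients[-1, "Pr(>|z|)"]
  false_hit[i] <- any(pv < 0.05)
}

# Share of data sets where the selected model shows a "significant" noise term
share <- mean(false_hit)
print(share)
```

Since no predictor has a real effect here, every "significant" coefficient in a selected model is a false positive. One would expect the share to be well above 5%, which suggests the p-values printed after selection cannot be read at face value.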