Hello,

I have consciously avoided using step() for model simplification in favour of updating the model manually, removing non-significant terms one at a time. I'm using The R Book by M.J. Crawley as a guide. It comes as no surprise that my analysis does not proceed as smoothly as Crawley's, and being a beginner, I'm struggling with what to do next.

I have a model:

  lm(y ~ A * B * C)

where A is a categorical variable with three levels and B and C are continuous covariates.

Following Crawley, I fit the model, then use summary.aov() to identify non-significant terms. I delete the non-significant interaction terms one at a time (using update()). After each update() call, I use anova(modelOld, modelNew) to compare the previous model with the updated one. After removing all the interaction terms, I'm left with:

  lm(y ~ A + B + C)

Again using summary.aov(), I find A to be non-significant, so I remove it, leaving:

  lm(y ~ B + C)

in which both remaining terms are continuous variables.

Does it still make sense to use summary.aov(), or should I use summary.lm() instead? Has the analysis switched from an ANCOVA to a regression? The two summaries give different results, so I'm uncertain which one to accept.

Any help would be appreciated!
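A minimal sketch of the workflow described above, on simulated data (the data frame d, its coefficients and the seed are hypothetical, not the poster's data). It also illustrates why the two summaries of lm(y ~ B + C) disagree: summary.aov() reports sequential F-tests (B on its own, then C after B), whereas summary.lm() tests each coefficient given all the others.

```r
# Hypothetical data: A is a 3-level factor, B and C are continuous covariates
set.seed(42)
n <- 60
d <- data.frame(A = factor(rep(c("a1", "a2", "a3"), each = n / 3)),
                B = runif(n, 0, 10),
                C = runif(n, 0, 10))
d$y <- 2 + 0.5 * d$B - 0.3 * d$C + rnorm(n)

# Full model and one manual simplification step
m1 <- lm(y ~ A * B * C, data = d)
m2 <- update(m1, ~ . - A:B:C)   # remove the three-way interaction (2 df)
anova(m2, m1)                   # F-test comparing the two nested models

# The two summaries of the same lm fit answer different questions
m3 <- lm(y ~ B + C, data = d)
summary.aov(m3)   # sequential (Type I) tests: B alone, then C given B
summary.lm(m3)    # t-tests of each coefficient given all the others
```

For the last term in the formula the two agree: the sequential F for C equals the square of its t statistic. For B they generally differ, unless B and C happen to be uncorrelated in the sample.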
Frank E Harrell Jr
2008-Jun-11 11:42 UTC
[R] model simplification using Crawley as a guide
ChCh wrote:
> Following Crawley, I execute the model, then use summary.aov() to identify
> non-significant terms. I begin deleting non-significant interaction terms
> one at a time (using update). [...]

What is the theoretical basis for removing insignificant terms? How will you compensate for this in the final analysis (e.g., how do you unbias your estimate of sigma squared)?

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
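One way to see the point about sigma squared is a small simulation, sketched here under stated assumptions: the response is pure noise with true sigma^2 = 1, five hypothetical continuous predictors (x1 to x5) are all truly null, and terms are removed one at a time with drop1() while any has p > 0.05. This is an illustration of selection bias in general, not a procedure taken from the thread. The full-model estimate is unbiased; the estimate from the simplified model tends to be too small, because the terms that survive are the ones that happened to fit the noise.

```r
# Hypothetical simulation: y is pure noise (true sigma^2 = 1), so every
# predictor in the full model is truly null
set.seed(1)
n <- 30
reps <- 500
sigma2_full <- numeric(reps)
sigma2_selected <- numeric(reps)

for (r in seq_len(reps)) {
  d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
                  x4 = rnorm(n), x5 = rnorm(n), y = rnorm(n))
  fit <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = d)
  sigma2_full[r] <- summary(fit)$sigma^2

  # Backward elimination: drop the least significant term while any p > 0.05
  repeat {
    if (length(attr(terms(fit), "term.labels")) == 0) break  # nothing left
    tab <- drop1(fit, test = "F")
    p <- tab[["Pr(>F)"]][-1]                # row 1 is "<none>"
    if (all(p <= 0.05)) break
    worst <- rownames(tab)[-1][which.max(p)]
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  sigma2_selected[r] <- summary(fit)$sigma^2
}

mean(sigma2_full)      # full-model estimate: unbiased for the true value 1
mean(sigma2_selected)  # after selection: biased downwards on average
```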
ChCh wrote:
> Following Crawley, I execute the model, then use summary.aov() to identify
> non-significant terms. [...]
> Does it still make sense to use summary.aov() or should I use summary.lm()
> instead?

Does he really recommend using summary.aov() on an lm object??? I wouldn't. It _might_ give sensible results, but in general, aov() and its methods rely on balancedness and orthogonality properties of the design, to the extent that I'm inclined to say that if you do not know exactly what is going on, it is probably the wrong thing. I'd use drop1() throughout.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
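A sketch of the drop1() approach on hypothetical simulated data (the data frame d and its values are made up for illustration). drop1() respects marginality, so on the full model it only offers to drop A:B:C; on lm(y ~ A + B + C) it tests each term given all the others, regardless of their order in the formula. summary.aov(), by contrast, is sequential, so with unbalanced, non-orthogonal data its rows depend on the order of the terms.

```r
# Hypothetical data: a 3-level factor A and two continuous covariates
set.seed(42)
n <- 60
d <- data.frame(A = factor(rep(c("a1", "a2", "a3"), each = n / 3)),
                B = runif(n, 0, 10),
                C = runif(n, 0, 10))
d$y <- 2 + 0.5 * d$B - 0.3 * d$C + rnorm(n)

full <- lm(y ~ A * B * C, data = d)
drop1(full, test = "F")    # only A:B:C is a candidate (marginality)

main <- lm(y ~ A + B + C, data = d)
drop1(main, test = "F")    # each main effect tested given the other two

# Sequential tables: the row for A depends on where A appears in the formula
summary.aov(lm(y ~ A + B + C, data = d))
summary.aov(lm(y ~ C + B + A, data = d))
```

In the second ordering A is entered last, so its sequential F-test matches the drop1() row for A. For a single-df term such as B, the drop1() F statistic is the square of the t statistic reported by summary.lm().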