Florian Moser
2012-Sep-18 13:27 UTC
[R] Lowest AIC after stepAIC can be lowered by manual reduction of variables
Hello,

I am not really a statistics person, so it's possible I did something completely wrong. If that is the case: sorry.

I am trying to find the best GLM (the one with the lowest AIC) for my dataset. To do that, I ran stepAIC (from the "MASS" package) on my GLM, allowing only two-variable interactions. According to the summary, the result was a model with 7 (of 8) variables and 5 interactions, with AIC=40.008.

BUT: when I take this model and remove further variables manually, one step at a time (starting with the term with the highest p-value, and removing all interactions of a variable before removing the variable itself), until nothing more can be removed because every variable (or one of its interactions) has a p-value < 0.1, I get a model with 4 variables and 2 interactions and an AIC of 33.879.

So my questions: Why didn't stepAIC give me the model with AIC=33.879? And which model should I think of as the best?

For my calculations I used these formulae:

gm1<-glm(cpi~time+tank+...,data=d1)
gm2<-stepAIC(gm1)
summary(gm2)
#to get the "best" model -> AIC=40.008
#afterwards I reduced manually using the formula:
summary(glm(cpi~time+tank+...,data=d1))

This gave me a model with AIC=33.879.

I hope you understand what I did, and that you can help me.

Thanks
Florian
Greg Snow
2012-Sep-18 15:47 UTC
[R] Lowest AIC after stepAIC can be lowered by manual reduction of variables
Do you understand what you did (not the individual steps, but what the overall process does)?

You simplified your model using criteria other than the AIC. If you go back and look at the AIC at each step of your manual reduction, you will probably find that some of the intermediate steps actually had a slightly higher AIC value, and that is why the step function stopped where it did.

It is common for stepwise methods to give different final models depending on where they are started and what options are used, and even then they are not guaranteed to give the "best" model, even when you can determine what "best" means. Stepwise methods are often a complicated equivalent to throwing darts blindfolded (the final model is more due to random chance than anything else).

What question are you trying to answer? What model makes the most sense scientifically?

--
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com
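One way to check the point about intermediate steps is to record the AIC after every manual deletion. The following is only a minimal sketch under stated assumptions: the poster's data (d1) and full formula are not available, so the built-in mtcars data and the variables wt, hp, qsec and drat are hypothetical stand-ins, and drop1() with an F test is used here as one possible way to get per-term p-values while respecting marginality (interactions are offered for deletion before their main effects). The 0.1 threshold is the one mentioned in the original post.

```r
library(MASS)  # provides stepAIC

# Stand-in data and formula (assumption): main effects plus all
# two-variable interactions, mirroring the setup described in the post.
full <- glm(mpg ~ (wt + hp + qsec + drat)^2, data = mtcars)

# Automatic search: with no scope given, stepAIC works backward and
# stops as soon as no single deletion lowers the AIC.
auto <- stepAIC(full, trace = FALSE)

# Manual search: drop the term with the largest p-value until every
# droppable term has p < 0.1, recording the AIC after each deletion.
manual <- full
aic_path <- AIC(manual)
repeat {
  # Stop if only the intercept is left.
  if (length(attr(terms(manual), "term.labels")) == 0) break
  tab <- drop1(manual, test = "F")   # lists only terms that may be dropped
  p <- tab[-1, ncol(tab)]            # last column holds the p-values; row 1 is <none>
  candidates <- rownames(tab)[-1]
  if (max(p) < 0.1) break
  worst <- candidates[which.max(p)]
  manual <- update(manual, as.formula(paste(". ~ . -", worst)))
  aic_path <- c(aic_path, AIC(manual))
}

# Positive entries in diff(aic_path) mark deletions that raised the AIC,
# i.e. steps that an AIC-driven search would have refused to take.
print(aic_path)
print(diff(aic_path))
print(c(auto = AIC(auto), manual = AIC(manual)))
```

If any entry of diff(aic_path) is positive, the manual path passed through a model with a higher AIC than its predecessor, which a purely AIC-driven backward search would not do. Whether that happens with the stand-in data says nothing about the poster's dataset; the sketch only shows how to look.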