Ronaldo Reis Jr.
2003-Oct-14 09:14 UTC
[R] different results depending of variable position.
Hi,

I ran an analysis and, depending on the order of the variables, the significance changes. Look:

> m1 <- glm((infec/ntot)~idade+sexo+peso,family=binomial,weights=ntot)
> anova(m1,test="F")
Analysis of Deviance Table

Model: binomial, link: logit

Response: (infec/ntot)

Terms added sequentially (first to last)

      Df Deviance Resid. Df Resid. Dev       F    Pr(>F)
NULL                     80     83.234
idade  1    1.302        79     81.932  1.3020 0.2538510
sexo   1    9.137        78     72.796  9.1366 0.0025055 **
peso   1   12.937        77     59.859 12.9373 0.0003221 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

> m1 <- glm((infec/ntot)~sexo+peso+idade,family=binomial,weights=ntot)
> anova(m1,test="F")
Analysis of Deviance Table

Model: binomial, link: logit

Response: (infec/ntot)

Terms added sequentially (first to last)

      Df Deviance Resid. Df Resid. Dev       F    Pr(>F)
NULL                     80     83.234
sexo   1    8.278        79     74.956  8.2780 0.0040128 **
peso   1   11.171        78     63.785 11.1711 0.0008308 ***
idade  1    3.927        77     59.859  3.9268 0.0475237 *
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

> m1 <- glm((infec/ntot)~peso+idade+sexo,family=binomial,weights=ntot)
> anova(m1,test="F")
Analysis of Deviance Table

Model: binomial, link: logit

Response: (infec/ntot)

Terms added sequentially (first to last)

      Df Deviance Resid. Df Resid. Dev       F    Pr(>F)
NULL                     80     83.234
peso   1   15.162        79     68.072 15.1623 9.865e-05 ***
idade  1    2.773        78     65.299  2.7731   0.09586 .
sexo   1    5.440        77     59.859  5.4405   0.01968 *
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

> m1 <- glm((infec/ntot)~idade+peso+sexo,family=binomial,weights=ntot)
> anova(m1,test="F")
Analysis of Deviance Table

Model: binomial, link: logit

Response: (infec/ntot)

Terms added sequentially (first to last)

      Df Deviance Resid. Df Resid. Dev       F    Pr(>F)
NULL                     80     83.234
idade  1    1.302        79     81.932  1.3020   0.25385
peso   1   16.633        78     65.299 16.6334 4.534e-05 ***
sexo   1    5.440        77     59.859  5.4405   0.01968 *
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Why is this?
What is the best method to select the model (with idade or without idade)? AIC?

Thanks,
Ronaldo
--
Between two sins, I always choose the one I have not yet committed
        --Mae West
--
Ronaldo Reis Júnior
UFV/DBA-Entomologia
36571-000 Viçosa - MG
Fone: 31-3899-2532
chrysopa at insecta.ufv.br
> From: Ronaldo Reis Jr. [mailto:chrysopa at insecta.ufv.br]
>
> Hi,
>
> I ran an analysis and, depending on the order of the variables, the
> significance changes. Look:

[output of glm fits omitted]

> Why is this?

Because, as the output says:

> Terms added sequentially (first to last)

When the predictors are not orthogonal (i.e., their correlation is not 0), the answer to the question "is variable X significant" depends on the model that's being fitted. The significance tests that anova() performs make the following comparisons, assuming X1 through X4 are the variables:

    Y ~ 1               vs.  Y ~ X1
    Y ~ X1              vs.  Y ~ X1 + X2
    Y ~ X1 + X2         vs.  Y ~ X1 + X2 + X3
    Y ~ X1 + X2 + X3    vs.  Y ~ X1 + X2 + X3 + X4

> What is the best method to select the model (with idade or
> without idade)? AIC?

That also depends on what you are looking for in the model. If you are looking for interpretation, probably the answer is not to select models, as that could lead to bias in the coefficients of the selected model.

HTH,
Andy
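The sequential comparisons listed above can be checked by hand. The following is a minimal sketch on simulated data, not Ronaldo's: the data frame `d`, the helper `fit()`, and every coefficient are assumptions made up for illustration. It shows that each row of the sequential table is the drop in deviance between two nested fits, and that with correlated predictors the per-term values depend on the order while their total does not.

```r
# Simulated data: every name and number below is hypothetical.
set.seed(42)
n     <- 81
idade <- rnorm(n, mean = 10, sd = 2)
peso  <- 5 + 0.8 * idade + rnorm(n)   # deliberately correlated with idade
ntot  <- rep(20, n)
infec <- rbinom(n, size = ntot, prob = plogis(-4 + 0.05 * idade + 0.2 * peso))
d     <- data.frame(infec, ntot, idade, peso)

# Hypothetical helper: fit a binomial GLM for a given right-hand side.
fit <- function(rhs) {
  glm(as.formula(paste("infec/ntot ~", rhs)),
      family = binomial, weights = ntot, data = d)
}

m0  <- fit("1")
m1  <- fit("idade")
m12 <- fit("idade + peso")
m21 <- fit("peso + idade")

# Sequential deviances reported by anova(); the NULL row is dropped.
seq_12 <- anova(m12)$Deviance[-1]   # idade first, then peso
seq_21 <- anova(m21)$Deviance[-1]   # peso first, then idade

# The same quantities as differences between nested fits.
by_hand <- c(deviance(m0) - deviance(m1),
             deviance(m1) - deviance(m12))
print(all.equal(seq_12, by_hand))

# Per-term deviance under each ordering (columns: idade, peso).
tab <- rbind(idade_first = seq_12, peso_first = rev(seq_21))
colnames(tab) <- c("idade", "peso")
print(tab)
```

Only the last term in each ordering is tested against the full model, which is why the value for a given variable changes when it moves to a different position.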
Peter Dalgaard BSA
2003-Oct-14 12:37 UTC
[R] different results depending of variable position.
"Ronaldo Reis Jr." <chrysopa at insecta.ufv.br> writes:

> Hi,
>
> I ran an analysis and, depending on the order of the variables, the
> significance changes. Look:
>
> m1 <- glm((infec/ntot)~idade+sexo+peso,family=binomial,weights=ntot)
> > anova(m1,test="F")
> Analysis of Deviance Table
>
> Model: binomial, link: logit
>
> Response: (infec/ntot)
>
> Terms added sequentially (first to last)
>
...
>
> Why is this?

Because terms are added sequentially. Adding a variable to a model means different things depending on what else is in the model. Most textbooks on regression will explain this.

> What is the best method to select the model (with idade or without idade)? AIC?

I'd suggest looking at a direct comparison:

    m1 <- glm((infec/ntot)~idade+sexo+peso,family=binomial,weights=ntot)
    m2 <- glm((infec/ntot)~sexo+peso,family=binomial,weights=ntot)
    anova(m1,m2,test="F")

        -p

--
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
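The direct comparison suggested above could be sketched as follows, again on simulated data: the data frame `d`, its coefficients, and the object names `full`, `reduced`, and `full_reordered` are all hypothetical. One substitution is made and should be read as an assumption: the sketch uses test = "Chisq" rather than the thread's test = "F", on the premise that the binomial dispersion is fixed at 1 (recent R versions warn about F tests for a binomial family). The sketch also shows drop1(), which tests each term against the model containing all the others, and AIC(), which Ronaldo asked about.

```r
# Simulated data: every name and number below is hypothetical.
set.seed(42)
n     <- 81
idade <- rnorm(n, mean = 10, sd = 2)
peso  <- 5 + 0.8 * idade + rnorm(n)          # correlated with idade
sexo  <- factor(sample(c("f", "m"), n, replace = TRUE))
ntot  <- rep(20, n)
eta   <- -4 + 0.05 * idade + 0.2 * peso + 0.4 * (sexo == "m")
infec <- rbinom(n, size = ntot, prob = plogis(eta))
d     <- data.frame(infec, ntot, idade, peso, sexo)

full    <- glm(infec/ntot ~ idade + sexo + peso,
               family = binomial, weights = ntot, data = d)
reduced <- update(full, . ~ . - idade)       # same model without idade

# Direct likelihood-ratio comparison: does idade help once sexo and peso are in?
print(anova(reduced, full, test = "Chisq"))

# Each term tested given all the others; the result does not depend on
# the order in which the terms were written in the formula.
d1 <- drop1(full, test = "Chisq")
print(d1)

# Same terms, different order, for comparison.
full_reordered <- glm(infec/ntot ~ peso + sexo + idade,
                      family = binomial, weights = ntot, data = d)
d2 <- drop1(full_reordered, test = "Chisq")

# AIC for the two candidate models (lower is preferred).
print(AIC(reduced, full))
```

Whether a likelihood-ratio test or AIC is the better criterion depends on the purpose of the model, as Andy's reply points out; the sketch only shows how each is computed.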