Qinghua He
2016-Jul-21 22:04 UTC
[R] Why the order of parameters in a logistic regression affects results significantly?
Using the same data, if I ran

fit2 <- glm(formula=AR~Age+LumA+LumB+HER2+Basal+Normal, family=binomial, data=RacComp1)
summary(fit2)
exp(coef(fit2))

I obtained:

> exp(coef(fit2))
(Intercept)         Age        LumA        LumB        HER2       Basal      Normal
 0.24866935  1.00433781  0.10639937  0.31614001  0.08220685 20.25180956          NA

while if I ran

fit2 <- glm(formula=AR~Age+LumA+LumB+Basal+Normal+HER2, family=binomial, data=RacComp1)
summary(fit2)
exp(coef(fit2))

I obtained:

> exp(coef(fit2))
 (Intercept)          Age         LumA         LumB        Basal       Normal         HER2
  0.02044232   1.00433781   1.29428846   3.84566516 246.35185956  12.16443690           NA

Essentially they're the same model - I just moved HER2 to the last position. But the ORs changed significantly. Can someone explain?

For the latter result, I don't even know how to interpret it, as all factors have OR>1 (except Intercept). How could that be possible? Can I eliminate the effect of the intercept?

Also, I cannot obtain an OR for the last factor due to collinearity. However, I know others obtained ORs for all factors for the same dataset. Can someone tell me how to obtain ORs for all factors?

All factors are categorical variables (i.e., 0 or 1).

Thanks!
Peter
Greg Snow
2016-Jul-22 14:50 UTC
[R] Why the order of parameters in a logistic regression affects results significantly?
Please post in plain text; the message is very hard to read with the reformatting that was done.

Did you receive any warnings when you fit your models?

The fact that the last coefficient is NA in both outputs suggests that there was some co-linearity in your predictor variables and R chose to drop one of the offending variables from the model (the last one in each case). Depending on the nature of the co-linearity, the interpretation (and therefore the estimates) can change.

For example, let's say that you have 3 predictors, red, green, and blue, that are indicator variables (0/1) and that every subject has a 1 in exactly one of those variables (so they are co-linear with the intercept). If you put the 3 variables into a model with the intercept in the above order, then R will drop the blue variable and the interpretation of the coefficients is that the intercept is the average for blue subjects and the other coefficients are the differences between red/green and blue on average.

If you refit the model with the order blue, green, red, then R will drop red from the model and now the interpretation is that the intercept is the mean for red subjects and the others are the differences from red on average, a very different interpretation and therefore different estimates.

I expect something along those lines is going on here.
-- 
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com
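The red/green/blue scenario from this reply can be reproduced with simulated data. What follows is only a minimal sketch: the data frame `d`, the outcome `y`, the sample size, and the effect sizes are all made up for illustration and have nothing to do with the poster's RacComp1 data.

```r
# Simulated illustration (hypothetical data, not RacComp1).
set.seed(1)
n <- 300
colour <- sample(c("red", "green", "blue"), n, replace = TRUE)

# Three 0/1 indicators; every row has exactly one of them set to 1,
# so red + green + blue equals the intercept column.
d <- data.frame(
  red   = as.integer(colour == "red"),
  green = as.integer(colour == "green"),
  blue  = as.integer(colour == "blue")
)

# Assumed true log-odds: blue = -1, red = 0.5, green = -0.5.
d$y <- rbinom(n, 1, plogis(-1 + 1.5 * d$red + 0.5 * d$green))

# Same predictors, different order in the formula.
fit_a <- glm(y ~ red + green + blue, family = binomial, data = d)
fit_b <- glm(y ~ blue + green + red, family = binomial, data = d)

coef(fit_a)  # blue is NA: blue subjects are the reference group
coef(fit_b)  # red is NA: red subjects are the reference group

# The coefficients differ, but the fitted probabilities do not.
same_fit <- isTRUE(all.equal(fitted(fit_a), fitted(fit_b), tolerance = 1e-6))
same_fit
```

The two fits describe the same model; only the reference group changes. In this sketch the intercept of `fit_b` equals the intercept of `fit_a` plus its `red` coefficient, because both are the log-odds for red subjects.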
Michael Dewey
2016-Jul-22 16:21 UTC
[R] Why the order of parameters in a logistic regression affects results significantly?
Dear Peter

Have you tried removing the intercept? Just put -1 at the end of your formula.

-- 
Michael
http://www.dewey.myzen.co.uk/home.html
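A small simulation may help show what removing the intercept does. This is a sketch on made-up data (the indicators `red`, `green`, `blue` and the outcome `y` are hypothetical, not the poster's variables), assuming the indicators are mutually exclusive and exhaustive.

```r
# Simulated illustration (hypothetical data, not RacComp1).
set.seed(2)
n <- 300
group <- sample(c("red", "green", "blue"), n, replace = TRUE)
d <- data.frame(
  red   = as.integer(group == "red"),
  green = as.integer(group == "green"),
  blue  = as.integer(group == "blue")
)
d$y <- rbinom(n, 1, plogis(-1 + 1.5 * d$red + 0.5 * d$green))

# With an intercept, the last indicator is aliased and reported as NA.
fit_int <- glm(y ~ red + green + blue, family = binomial, data = d)

# With "- 1" the intercept is removed and every indicator gets an estimate.
fit_noint <- glm(y ~ red + green + blue - 1, family = binomial, data = d)

coef(fit_int)
exp(coef(fit_noint))  # odds of y = 1 within each group, not odds ratios

# Compare with the odds computed directly from the group proportions.
observed <- tapply(d$y, group, mean)[c("red", "green", "blue")]
odds_obs <- observed / (1 - observed)
odds_obs
```

Note the change in meaning: without an intercept, each exponentiated coefficient is the odds within that group rather than an odds ratio against a reference group. An odds ratio between two groups is then the ratio of their two values.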
David Winsemius
2016-Jul-22 17:24 UTC
[R] Why the order of parameters in a logistic regression affects results significantly?
> On Jul 21, 2016, at 3:04 PM, Qinghua He via R-help <r-help at r-project.org> wrote:
>
> Essentially they're the same model - I just moved HER2 to the last. But the OR changed significantly. Can someone explain?

You have collinearity and one of your variables will be dropped as redundant. Which one is dropped is determined by the order of the variable names in the model formula.

> For the latter result, I don't even know how to interpret as all factors have OR>1 (except Intercept), how could that possible? Can I eliminate the effect of intercept?

In the first model (with the defaults of treatment contrasts) the Intercept is actually an estimate for cases with LumA, LumB, Basal, HER2 all at their lowest level, and this not coincidentally also precisely defines your Normal variable. They all (excepting Normal) have adverse impact in your study of AR, whatever it might be.

If these various categories (which I suspect are breast cancer risk predictors) are all distinct with no overlaps, then use this:

fit2 <- glm(formula=AR~Age+ Normal+ LumA+LumB+HER2+Basal+ 0, family=binomial, data=RacComp1)

The results will probably be the same as your first model except that Intercept's parameter will now be the parameter for Normal.

-- 
David Winsemius
Alameda, CA, USA
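The two sets of odds ratios in the original post can be checked against each other with plain arithmetic. This sketch assumes, as the replies suggest but the thread does not confirm, that the five subtype indicators are mutually exclusive and exhaustive, so that the first fit uses Normal as the reference group and the second uses HER2. The numbers are copied from the original post; the variable names are made up for illustration.

```r
# exp(coef()) from the first fit (Normal dropped, so Normal is the reference).
or_vs_normal <- c("(Intercept)" = 0.24866935, Age = 1.00433781,
                  LumA = 0.10639937, LumB = 0.31614001,
                  HER2 = 0.08220685, Basal = 20.25180956)

# exp(coef()) from the second fit (HER2 dropped, so HER2 is the reference).
or_vs_her2 <- c("(Intercept)" = 0.02044232, Age = 1.00433781,
                LumA = 1.29428846, LumB = 3.84566516,
                Basal = 246.35185956, Normal = 12.16443690)

her2 <- or_vs_normal[["HER2"]]

# Re-express the first fit relative to HER2 instead of Normal.
derived <- c(
  "(Intercept)" = or_vs_normal[["(Intercept)"]] * her2,  # baseline odds move to HER2
  Age    = or_vs_normal[["Age"]],                         # unaffected by the reference
  LumA   = or_vs_normal[["LumA"]] / her2,
  LumB   = or_vs_normal[["LumB"]] / her2,
  Basal  = or_vs_normal[["Basal"]] / her2,
  Normal = 1 / her2
)

print(cbind(derived, reported = or_vs_her2[names(derived)]))

# TRUE if the derived values match the second output up to rounding.
agree <- isTRUE(all.equal(derived, or_vs_her2[names(derived)], tolerance = 1e-6))
agree
```

Under that assumption the second output carries the same information as the first: every subtype odds ratio is simply divided by the HER2 odds ratio. All of them exceed 1 in the second fit because HER2 has the lowest odds of the five groups in the first.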