kmuller
2012-Mar-03 01:10 UTC
[R] interpreting the output of a glm with an ordered categorical predictor.
Greetings. I'm a Master's student working on an analysis of herbivore damage on plants. I have a tried running a glm with one categorical predictor (aphid abundance) and a binomial response (presence/absence of herbivore damage). My predictor has four categories: high, medium, low, and none. I used the "ordered" function to sort my categories for a glm. ah <- read.csv("http://depot.northwestern.edu/class/2012WI_PBC_435-0_AND_BIOL_SCI_313/muller/herbivoryEdit.csv") ah1<- ah[ah$date=="110810",] ah2<-ah[ah$date=="110904",] aphidOrder <- ordered(ah2$aphidLevelMax,levels=c("none", "low", "med", "high")) ordAph <- glm(chewholebinom~aphidOrder,family=binomial,data=ah2) When I ran the summary for the glm (output pasted below), I could not tell which intercept referred to which factor level. My question is, what do .L, .Q, and .C mean and how can I relate these factors to my original factors (none, low, med, high)? Thank you for your help, Katherine summary(ordAph) Call: glm(formula = chewholebinom ~ aphidOrder, family = binomial, data = ah2) Deviance Residuals: Min 1Q Median 3Q Max -1.6512 -0.9817 0.7687 0.7687 1.5353 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -0.05567 0.25097 -0.222 0.8245 aphidOrder.L -1.36755 0.49366 -2.770 0.0056 ** aphidOrder.Q 0.36824 0.50195 0.734 0.4632 aphidOrder.C -0.09840 0.51011 -0.193 0.8470 --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 137.99 on 99 degrees of freedom Residual deviance: 124.05 on 96 degrees of freedom AIC: 132.05 Number of Fisher Scoring iterations: 4 -- View this message in context: http://r.789695.n4.nabble.com/interpreting-the-output-of-a-glm-with-an-ordered-categorical-predictor-tp4440383p4440383.html Sent from the R help mailing list archive at Nabble.com.
peter dalgaard
2012-Mar-03 08:30 UTC
[R] interpreting the output of a glm with an ordered categorical predictor.
On Mar 3, 2012, at 02:10 , kmuller wrote:> > Greetings. > > I'm a Master's student working on an analysis of herbivore damage on plants. > I have a tried running a glm with one categorical predictor (aphid > abundance) and a binomial response (presence/absence of herbivore damage). > My predictor has four categories: high, medium, low, and none. I used the > "ordered" function to sort my categories for a glm. > > ah <- > read.csv("http://depot.northwestern.edu/class/2012WI_PBC_435-0_AND_BIOL_SCI_313/muller/herbivoryEdit.csv") > ah1<- ah[ah$date=="110810",] > ah2<-ah[ah$date=="110904",] > > aphidOrder <- ordered(ah2$aphidLevelMax,levels=c("none", "low", "med", > "high")) > > ordAph <- glm(chewholebinom~aphidOrder,family=binomial,data=ah2) > > > When I ran the summary for the glm (output pasted below), I could not tell > which intercept referred to which factor level. My question is, what do .L, > .Q, and .C mean and how can I relate these factors to my original factors > (none, low, med, high)?This is an oddity in the modeling code, dating back about two decades and probably impossible to get rid of at this stage. What you are seeing is polynomial contrasts: Linear, Quadratic, Cubic, with a slightly peculiar parametrization designed to be orthogonal in balanced designs. In such designs, you can see whether your effect is described by a low order polynomial by checking whether higher order terms are insignificant. Notice that this assumes that you can assign equidistant scores to the groups (i.e. codes 1:4 in your case). (If the above is gibberish to you, maybe try something like matplot(contr.poly(15)[,1:4], type="b") to see the first few polynomial curves.) The whole thing is somewhat useful; e.g., for "response surface designs" where you systematically vary multiple factors in equidistant steps. However, it is not obvious that it should have been unleashed on all kinds of ordered factors. I conjecture that you don't actually need it, a plain factor() with the levels in the right order should do.> > Thank you for your help, > > Katherine > > summary(ordAph) > > Call: > glm(formula = chewholebinom ~ aphidOrder, family = binomial, > data = ah2) > > Deviance Residuals: > Min 1Q Median 3Q Max > -1.6512 -0.9817 0.7687 0.7687 1.5353 > > Coefficients: > Estimate Std. Error z value Pr(>|z|) > (Intercept) -0.05567 0.25097 -0.222 0.8245 > aphidOrder.L -1.36755 0.49366 -2.770 0.0056 ** > aphidOrder.Q 0.36824 0.50195 0.734 0.4632 > aphidOrder.C -0.09840 0.51011 -0.193 0.8470 > --- > Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 > > (Dispersion parameter for binomial family taken to be 1) > > Null deviance: 137.99 on 99 degrees of freedom > Residual deviance: 124.05 on 96 degrees of freedom > AIC: 132.05 > > Number of Fisher Scoring iterations: 4 > > > > > > -- > View this message in context: http://r.789695.n4.nabble.com/interpreting-the-output-of-a-glm-with-an-ordered-categorical-predictor-tp4440383p4440383.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com