Dear R users, I have been having problems getting believable estimates from anova on a model fit from lmer. I get the impression that F is being greatly underestimated, as can be seen by running the example I have given below. First an explanation of what I'm trying to do. I am trying to fit a glmm with binomial errors to some data. The experiment involves 10 shadehouses, divided between 2 light treatments (high, low). Within each shadehouse there are 12 seedlings of each of 2 species (hn & sl). 3 damage treatments (0, 0.1, 0.25 leaf area removal) were applied to the seedlings (at random) so that there are 4 seedlings of each species*damage treatment in each shadehouse. There maybe a shadehouse effect, so I need to include it as a random effect. Light is applied to a shadehouse, so it is outer to shadehouse. The other 2 factors are inner to shadehouse. We want to assess if light, damage and species affect survival of seedlings. To test this I fitted a binomial mixed effects model with lmer (actually with quasibinomial errors). THe summary function suggests a large effect of both light and species (which agrees with graphical analysis). However, anova produces F values close to 0 and p values close to 1 (see example below). Is this a bug, or am I doing something fundamentally wrong? If anova doesn't work with lmer is there a way to perform hypothesis tests on fixed effects in an lmer model? I was going to just delete terms and then do liklihood ratio tests, but according to Pinheiro & Bates (p. 87) that's very untrustworthy. Any suggestions? I'm using R 2.1.1 on windows XP and lme4 0.98-1 Any help will be much appreciated. many thanks Robert ############################### The data are somewhat like this #setting up the dataframe bm.surv<-data.frame( house=rep(1:10, each=6), light=rep(c("h", "l"), each=6, 5), species=rep(c("sl", "hn"), each=3, 10), damage=rep(c(0,.1,.25), 20) ) bm.surv$survival<-ifelse(bm.surv$light=="h", rbinom(60, 4, .9), rbinom(60, 4, .6)) # difference in probablility should ensure a light effect bm.surv$death<-4-bm.surv$survival # fitting the model m1<-lmer(cbind(survival, death)~light+species+damage+(1|house), data=bm.surv, family="quasibinomial") summary(m1) # suggests that light is very significant Generalized linear mixed model fit using PQL Formula: cbind(survival, death) ~ light + species + damage + (1 | table) Data: bm.surv Family: quasibinomial(logit link) AIC BIC logLik deviance 227.0558 239.6218 -107.5279 215.0558 Random effects: Groups Name Variance Std.Dev. table (Intercept) 1.8158e-09 4.2613e-05 Residual 3.6317e+00 1.9057e+00 # of obs: 60, groups: table, 10 Fixed effects: Estimate Std. Error DF t value Pr(>|t|) (Intercept) 2.35140 0.36832 56 6.3841 3.581e-08 *** lightl -1.71517 0.33281 56 -5.1535 3.447e-06 *** speciessl -0.57418 0.30085 56 -1.9085 0.06145 . damage 1.49963 1.46596 56 1.0230 0.31072 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Correlation of Fixed Effects: (Intr) lightl spcssl lightl -0.665 speciessl -0.494 0.070 damage -0.407 -0.038 -0.017 anova(m1) # very low F value for light, corresponding to p values approaching 1 Analysis of Variance Table Df Sum Sq Mean Sq Denom F value Pr(>F) light 1 0.014 0.014 56.000 0.0018 0.9661 species 1 0.002 0.002 56.000 0.0002 0.9887 damage 1 0.011 0.011 56.000 0.0014 0.9704 -- Robert Bagchi Animal & Plant Science Alfred Denny Building University of Sheffield Western Bank Sheffield S10 2TN UK t: +44 (0)114 2220062 e: r.bagchi at sheffield.ac.uk bagchi.r at gmail.com http://www.shef.ac.uk/aps/apsrtp/bagchi-r
I agree: Something looks strange to me in this example also; I have therefore copied Douglas Bates and Deepayan Sarkar. You've provided a nice simulation. If either of them have time to look at this, I think they could tell us what is happening here. If you need an answer to your particular problem, you could run that simulation 1000 or 1,000 times. That would tell you whether to believe the summary or the anova, or neither. If you want to understand the algorithm, you could walk through the code. However, "lmer" is a generic, and I don't have time now to figure out how to find the source. A response from Brian Ripley to a question from me a couple of days ago provides a nice summary of how to do that, but I don't have time to check that now. Sorry I couldn't help more. spencer graves Robert Bagchi wrote:> Dear R users, > > I have been having problems getting believable estimates from anova on a > model fit from lmer. I get the impression that F is being greatly > underestimated, as can be seen by running the example I have given below. > > First an explanation of what I'm trying to do. I am trying to fit a glmm > with binomial errors to some data. The experiment involves 10 > shadehouses, divided between 2 light treatments (high, low). Within each > shadehouse there are 12 seedlings of each of 2 species (hn & sl). 3 > damage treatments (0, 0.1, 0.25 leaf area removal) were applied to the > seedlings (at random) so that there are 4 seedlings of each > species*damage treatment in each shadehouse. There maybe a shadehouse > effect, so I need to include it as a random effect. Light is applied to > a shadehouse, so it is outer to shadehouse. The other 2 factors are > inner to shadehouse. > > We want to assess if light, damage and species affect survival of > seedlings. To test this I fitted a binomial mixed effects model with > lmer (actually with quasibinomial errors). THe summary function suggests > a large effect of both light and species (which agrees with graphical > analysis). However, anova produces F values close to 0 and p values > close to 1 (see example below). > > Is this a bug, or am I doing something fundamentally wrong? If anova > doesn't work with lmer is there a way to perform hypothesis tests on > fixed effects in an lmer model? I was going to just delete terms and > then do liklihood ratio tests, but according to Pinheiro & Bates (p. 87) > that's very untrustworthy. Any suggestions? > > I'm using R 2.1.1 on windows XP and lme4 0.98-1 > > Any help will be much appreciated. > > many thanks > Robert > > ############################### > The data are somewhat like this > > #setting up the dataframe > > bm.surv<-data.frame( > house=rep(1:10, each=6), > light=rep(c("h", "l"), each=6, 5), > species=rep(c("sl", "hn"), each=3, 10), > damage=rep(c(0,.1,.25), 20) > ) > > bm.surv$survival<-ifelse(bm.surv$light=="h", rbinom(60, 4, .9), > rbinom(60, 4, .6)) # difference in probablility should ensure a > light effect > bm.surv$death<-4-bm.surv$survival > > # fitting the model > m1<-lmer(cbind(survival, death)~light+species+damage+(1|house), > data=bm.surv, family="quasibinomial") > > summary(m1) # suggests that light is very significant > Generalized linear mixed model fit using PQL > Formula: cbind(survival, death) ~ light + species + damage + (1 | table) > Data: bm.surv > Family: quasibinomial(logit link) > AIC BIC logLik deviance > 227.0558 239.6218 -107.5279 215.0558 > Random effects: > Groups Name Variance Std.Dev. > table (Intercept) 1.8158e-09 4.2613e-05 > Residual 3.6317e+00 1.9057e+00 > # of obs: 60, groups: table, 10 > > Fixed effects: > Estimate Std. Error DF t value Pr(>|t|) > (Intercept) 2.35140 0.36832 56 6.3841 3.581e-08 *** > lightl -1.71517 0.33281 56 -5.1535 3.447e-06 *** > speciessl -0.57418 0.30085 56 -1.9085 0.06145 . > damage 1.49963 1.46596 56 1.0230 0.31072 > --- > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > > Correlation of Fixed Effects: > (Intr) lightl spcssl > lightl -0.665 > speciessl -0.494 0.070 > damage -0.407 -0.038 -0.017 > > > anova(m1) # very low F value for light, corresponding to p > values approaching 1 > > Analysis of Variance Table > Df Sum Sq Mean Sq Denom F value Pr(>F) > light 1 0.014 0.014 56.000 0.0018 0.9661 > species 1 0.002 0.002 56.000 0.0002 0.9887 > damage 1 0.011 0.011 56.000 0.0014 0.9704 > >-- Spencer Graves, PhD Senior Development Engineer PDF Solutions, Inc. 333 West San Carlos Street Suite 700 San Jose, CA 95110, USA spencer.graves at pdf.com www.pdf.com <http://www.pdf.com> Tel: 408-938-4420 Fax: 408-280-7915
Hi Patrick thanks for your advice. I have now tried glmmPQL, and it worked fine - I'm getting consistent results between plots and models fitted by glmmPQL. Plus it allows predict() and resid() which is another advantage over lmer at present. quick question though: why does one need to use PQL for binomial models? Is there a good reference for this? A few of my colleagues have also had similar problems, so I'm copying this message on to R-help as it might be useful there. Many thanks Robert Patrick A. Jansen wrote:> > Hi dr Bacghi, > > I ran into exactly the same problem with lmer models that had an > rbind() response variable, as you posted to the R-list. > > The sums of squares produced by anova() seem wrong. They are almost > identical to the mean squares, and hence F-values approach 1. > > Since you need PQL for binomial models anyway, you might want to use > GlmmPQL instead. Seems to work fine with anova(). > > Best regards, > Patrick Jansen > > > *dr Patrick A. Jansen* > University of Groningen > Community and Conservation Ecology group > E P.A.Jansen at .rug.nl > W _www.rug.nl/fwn/onderzoek/programmas/biologie/cocon_ > <http://www.rug.nl/fwn/onderzoek/programmas/biologie/cocon> > > c/o: > Instituto Smithsonian de Investigaciones Tropicales > Att. Patrick A. Jansen Gamboa > Apartado 0843-03092, Balboa, Ancn, Panam, Repblica de Panam > or: > Smithsonian Tropical Research Insititute > Att: Patrick A. Jansen Gamboa > Unit 0948, APO AA, 34002-0948, U.S.A. > > T +507-212-8904 (office) / +507-6516-2008 (cell) > >
On Wed, 28 Sep 2005, Robert Bagchi wrote:>>Hi Patrick >> >>thanks for your advice. I have now tried glmmPQL, and it worked fine - >>I'm getting consistent results between plots and models fitted by >>glmmPQL. Plus it allows predict() and resid() which is another advantage >>over lmer at present. >> >>quick question though: why does one need to use PQL for binomial models? >>Is there a good reference for this? >You don't have to use PQL for binomial models, but you can't use least-squares. PQL is an approximate solution. Laplace and Adaptive Gaussian Quadrature options in lmer are better approximations. So lmer would likely become the better option as it progresses in its development (though the current issues you've found with the F ratios certainly sound like maybe lmer isn't better for you in its current incarnation). alan -- Alan B. Cobo-Lewis, Ph.D. (207) 581-3840 tel Department of Psychology (207) 581-6128 fax University of Maine Orono, ME 04469-5742 alanc at maine.edu http://www.umaine.edu/visualperception