Martin Henry H. Stevens
2006-Jun-14 12:38 UTC
[R] lmer binomial model overestimating data?
Hi folks, Warning: I don't know if the result I am getting makes sense, so this may be a statistics question. The fitted values from my binomial lmer mixed model seem to consistently overestimate the cell means, and I don't know why. I assume I am doing something stupid. Below I include code, and a binary image of the data is available at this link: http://www.cas.muohio.edu/~stevenmh/tf.RdataBin This was done with `Matrix' version 0.995-10 and `lme4' version 0.995-2. and R v. 2.3.1 on a Mac, OS 10.4.6. The binomial model below ("mod") was reduced from a more complex one by first using AIC, BIC and LRT for "random" effects, and then relying on Helmert contrasts and AIC, BIC, and LRT to simplify fixed effects. Maybe this was wrong? > load("tf.RdataBin") > library(lme4) > options(contrasts=c("contr.helmert", "contr.poly")) > mod <- lmer(tfb ~ reg+nutrient+amd +reg:nutrient+ + (1|rack) + (1|popu) + (1|gen), data=dat.tb2, family=binomial, method="Laplace") > summary(mod) Generalized linear mixed model fit using Laplace Formula: tfb ~ reg + nutrient + amd + reg:nutrient + (1 | rack) + (1 | popu) + (1 | gen) Data: dat.tb2 Family: binomial(logit link) AIC BIC logLik deviance 402.53 446.64 -191.26 382.53 Random effects: Groups Name Variance Std.Dev. gen (Intercept) 0.385 0.621 popu (Intercept) 0.548 0.741 rack (Intercept) 0.401 0.633 number of obs: 609, groups: gen, 24; popu, 9; rack, 2 Estimated scale (compare to 1) 0.80656 Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.391 0.574 4.17 3.1e-05 reg1 0.842 0.452 1.86 0.06252 reg2 0.800 0.241 3.32 0.00091 nutrient1 0.788 0.197 4.00 6.3e-05 amd1 -0.540 0.139 -3.88 0.00010 reg1:nutrient1 0.500 0.227 2.21 0.02734 reg2:nutrient1 -0.176 0.146 -1.21 0.22794 Correlation of Fixed Effects: (Intr) reg1 reg2 ntrnt1 amd1 rg1:n1 reg1 0.169 reg2 -0.066 -0.191 nutrient1 0.178 0.231 -0.034 amd1 -0.074 -0.044 -0.052 -0.078 reg1:ntrnt1 0.157 0.307 -0.180 0.562 -0.002 reg2:ntrnt1 -0.028 -0.154 0.236 0.141 0.033 -0.378 > X <- mod @ X > fitted <- X %*% fixef(mod) > > unlogitH <- function(x) {( 1 + exp(-x) )^-1} > (result <- data.frame(Raw.Data=with(dat.tb2, + tapply(tfb, list(reg:nutrient:amd), + mean ) ), + Fitted.Estimates=with(dat.tb2, + tapply(fitted, list(reg:nutrient:amd), + function(x) unlogitH(mean(x)) ) ) )) Raw.Data Fitted.Estimates SW:1:unclipped 0.50877 0.69520 SW:1:clipped 0.41304 0.43669 SW:8:unclipped 0.67273 0.85231 SW:8:clipped 0.52830 0.66233 NL:1:unclipped 0.88889 0.81887 NL:1:clipped 0.53571 0.60578 NL:8:unclipped 0.96552 0.98830 NL:8:clipped 0.96154 0.96635 SP:1:unclipped 0.98649 0.98361 SP:1:clipped 0.92537 0.95328 SP:8:unclipped 1.00000 0.99308 SP:8:clipped 0.95890 0.97992 > ### Perhaps the cell SP:8:clipped = 1.0 is messing up the fit? > pdf("RawAndFitted.pdf") > par(mar=c(8,3,2,2), las=2) > barplot(t(result), beside=TRUE ) > box(); title("Fractions of Plants Producing Fruits") > legend("topleft", c("Raw Data", "Fitted Values"), + fill=gray.colors(2), bty="n" ) > dev.off() quartz 2 > _ platform powerpc-apple-darwin8.6.0 arch powerpc os darwin8.6.0 system powerpc, darwin8.6.0 status major 2 minor 3.1 year 2006 month 06 day 01 svn rev 38247 language R version.string Version 2.3.1 (2006-06-01) > Dr. M. Hank H. Stevens, Assistant Professor 338 Pearson Hall Botany Department Miami University Oxford, OH 45056 Office: (513) 529-4206 Lab: (513) 529-4262 FAX: (513) 529-4243 http://www.cas.muohio.edu/~stevenmh/ http://www.muohio.edu/ecology/ http://www.muohio.edu/botany/ "E Pluribus Unum"
<see inline> Martin Henry H. Stevens wrote:> Hi folks, > Warning: I don't know if the result I am getting makes sense, so this > may be a statistics question. > > The fitted values from my binomial lmer mixed model seem to > consistently overestimate the cell means, and I don't know why. I > assume I am doing something stupid.I copied your "Raw.Data" and "Fitted.Estimates" into two columns of a data.frame OE and then did a t test as follows: > with(OE, t.test(Raw.Data-Fitted.Estimates)) One Sample t-test data: Raw.Data - Fitted.Estimates t = -2.1662, df = 11, p-value = 0.05313 alternative hypothesis: true mean is not equal to 0 95 percent confidence interval: -0.0991997752 0.0007897752 sample estimates: mean of x -0.049205 I also tried the same thing with a modification of an example in the "lmer" help file. In that case, lmer seemed to consistently UNDERestimate the cell means. However, the difference again was not statistically significant. I suggest you consider this a disguised Rorschach ink blot test and worry about other things. Hope this helps. Spencer Graves> > Below I include code, and a binary image of the data is available at > this link: > http://www.cas.muohio.edu/~stevenmh/tf.RdataBin > > This was done with `Matrix' version 0.995-10 and `lme4' version > 0.995-2. and R v. 2.3.1 on a Mac, OS 10.4.6. > > The binomial model below ("mod") was reduced from a more complex one > by first using AIC, BIC and LRT for "random" effects, and then > relying on Helmert contrasts and AIC, BIC, and LRT to simplify fixed > effects. Maybe this was wrong? > > > load("tf.RdataBin") > > library(lme4) > > > options(contrasts=c("contr.helmert", "contr.poly")) > > mod <- lmer(tfb ~ reg+nutrient+amd +reg:nutrient+ > + (1|rack) + (1|popu) + (1|gen), data=dat.tb2, family=binomial, > method="Laplace") > > summary(mod) > Generalized linear mixed model fit using Laplace > Formula: tfb ~ reg + nutrient + amd + reg:nutrient + (1 | rack) + (1 > | popu) + (1 | gen) > Data: dat.tb2 > Family: binomial(logit link) > AIC BIC logLik deviance > 402.53 446.64 -191.26 382.53 > Random effects: > Groups Name Variance Std.Dev. > gen (Intercept) 0.385 0.621 > popu (Intercept) 0.548 0.741 > rack (Intercept) 0.401 0.633 > number of obs: 609, groups: gen, 24; popu, 9; rack, 2 > > Estimated scale (compare to 1) 0.80656 > > Fixed effects: > Estimate Std. Error z value Pr(>|z|) > (Intercept) 2.391 0.574 4.17 3.1e-05 > reg1 0.842 0.452 1.86 0.06252 > reg2 0.800 0.241 3.32 0.00091 > nutrient1 0.788 0.197 4.00 6.3e-05 > amd1 -0.540 0.139 -3.88 0.00010 > reg1:nutrient1 0.500 0.227 2.21 0.02734 > reg2:nutrient1 -0.176 0.146 -1.21 0.22794 > > Correlation of Fixed Effects: > (Intr) reg1 reg2 ntrnt1 amd1 rg1:n1 > reg1 0.169 > reg2 -0.066 -0.191 > nutrient1 0.178 0.231 -0.034 > amd1 -0.074 -0.044 -0.052 -0.078 > reg1:ntrnt1 0.157 0.307 -0.180 0.562 -0.002 > reg2:ntrnt1 -0.028 -0.154 0.236 0.141 0.033 -0.378 > > X <- mod @ X > > fitted <- X %*% fixef(mod) > > > > unlogitH <- function(x) {( 1 + exp(-x) )^-1} > > (result <- data.frame(Raw.Data=with(dat.tb2, > + tapply(tfb, list(reg:nutrient:amd), > + mean ) ), > + Fitted.Estimates=with(dat.tb2, > + tapply(fitted, list(reg:nutrient:amd), > + function(x) unlogitH(mean(x)) ) ) )) > Raw.Data Fitted.Estimates > SW:1:unclipped 0.50877 0.69520 > SW:1:clipped 0.41304 0.43669 > SW:8:unclipped 0.67273 0.85231 > SW:8:clipped 0.52830 0.66233 > NL:1:unclipped 0.88889 0.81887 > NL:1:clipped 0.53571 0.60578 > NL:8:unclipped 0.96552 0.98830 > NL:8:clipped 0.96154 0.96635 > SP:1:unclipped 0.98649 0.98361 > SP:1:clipped 0.92537 0.95328 > SP:8:unclipped 1.00000 0.99308 > SP:8:clipped 0.95890 0.97992 > > ### Perhaps the cell SP:8:clipped = 1.0 is messing up the fit? > > pdf("RawAndFitted.pdf") > > par(mar=c(8,3,2,2), las=2) > > barplot(t(result), beside=TRUE ) > > box(); title("Fractions of Plants Producing Fruits") > > legend("topleft", c("Raw Data", "Fitted Values"), > + fill=gray.colors(2), bty="n" ) > > dev.off() > quartz > 2 > > > > _ > platform powerpc-apple-darwin8.6.0 > arch powerpc > os darwin8.6.0 > system powerpc, darwin8.6.0 > status > major 2 > minor 3.1 > year 2006 > month 06 > day 01 > svn rev 38247 > language R > version.string Version 2.3.1 (2006-06-01) > > > > Dr. M. Hank H. Stevens, Assistant Professor > 338 Pearson Hall > Botany Department > Miami University > Oxford, OH 45056 > > Office: (513) 529-4206 > Lab: (513) 529-4262 > FAX: (513) 529-4243 > http://www.cas.muohio.edu/~stevenmh/ > http://www.muohio.edu/ecology/ > http://www.muohio.edu/botany/ > "E Pluribus Unum" > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
On Wed, 14 Jun 2006, Martin Henry H. Stevens wrote:> Hi folks, > Warning: I don't know if the result I am getting makes sense, so this > may be a statistics question. > > The fitted values from my binomial lmer mixed model seem to > consistently overestimate the cell means, and I don't know why. I > assume I am doing something stupid.Not really, there is something subtle going on. The model says that logit E[Y|x, random effects] = x*beta+random effects Now, when you compute the observed values you are averaging over the random effects to get E[E[Y|x, random effects]]= E[ invlogit(x*beta +random effects)] where invlogit is the inverse of logit. When you compute the fitted values you are also averaging, but on the linear predictor scale to get E[logit(E[Y|x, random effects])]= invlogit(x*beta) The logit/unlogit operation is not linear, so these are not the same. In fact, invlogit(x*beta) is always further from 1/2 than E[Y|X]. With linear regression it is useful and fairly standard to think of the random effects part of a mixed model as giving a model for the covariance of Y, seperate from the fixed-effects model for the mean of Y. With generalized linear models these can no longer be separated: adding random effects changes the values and the meaning of the fixed effects parameters. -thomas