thr3ads.net - R help - [R] lmer binomial model overestimating data? [Jun 2006]

If this information is useful, please help other people find it:
Share via:

Martin Henry H. Stevens

2006-Jun-14 12:38 UTC

[R] lmer binomial model overestimating data?

Hi folks,
Warning: I don't know if the result I am getting makes sense, so this  
may be a statistics question.

The fitted values from my binomial lmer mixed model seem to  
consistently overestimate the cell means, and I don't know why. I  
assume I am doing something stupid.

Below I include code, and a binary image of the data is available at  
this link:
http://www.cas.muohio.edu/~stevenmh/tf.RdataBin

This was done with `Matrix' version 0.995-10 and `lme4' version  
0.995-2. and R v. 2.3.1 on a Mac, OS 10.4.6.

The binomial model below ("mod") was reduced from a more complex one  
by first using AIC, BIC and LRT for "random" effects, and then  
relying on Helmert contrasts and AIC, BIC, and LRT to simplify fixed  
effects. Maybe this was wrong?

 > load("tf.RdataBin")
 > library(lme4)

 > options(contrasts=c("contr.helmert", "contr.poly"))
 > mod <- lmer(tfb ~ reg+nutrient+amd +reg:nutrient+
+     (1|rack) + (1|popu) +  (1|gen), data=dat.tb2, family=binomial,  
method="Laplace")
 > summary(mod)
Generalized linear mixed model fit using Laplace
Formula: tfb ~ reg + nutrient + amd + reg:nutrient + (1 | rack) + (1  
|      popu) + (1 | gen)
	  Data: dat.tb2
Family: binomial(logit link)
     AIC    BIC  logLik deviance
402.53 446.64 -191.26   382.53
Random effects:
Groups Name        Variance Std.Dev.
gen    (Intercept) 0.385    0.621
popu   (Intercept) 0.548    0.741
rack   (Intercept) 0.401    0.633
number of obs: 609, groups: gen, 24; popu, 9; rack, 2

Estimated scale (compare to 1)  0.80656

Fixed effects:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)       2.391      0.574    4.17  3.1e-05
reg1              0.842      0.452    1.86  0.06252
reg2              0.800      0.241    3.32  0.00091
nutrient1         0.788      0.197    4.00  6.3e-05
amd1             -0.540      0.139   -3.88  0.00010
reg1:nutrient1    0.500      0.227    2.21  0.02734
reg2:nutrient1   -0.176      0.146   -1.21  0.22794

Correlation of Fixed Effects:
             (Intr) reg1   reg2   ntrnt1 amd1   rg1:n1
reg1         0.169
reg2        -0.066 -0.191
nutrient1    0.178  0.231 -0.034
amd1        -0.074 -0.044 -0.052 -0.078
reg1:ntrnt1  0.157  0.307 -0.180  0.562 -0.002
reg2:ntrnt1 -0.028 -0.154  0.236  0.141  0.033 -0.378
 > X <- mod @ X
 > fitted <- X %*% fixef(mod)
 >
 > unlogitH <- function(x) {( 1 + exp(-x) )^-1}
 > (result <- data.frame(Raw.Data=with(dat.tb2,
+                          tapply(tfb, list(reg:nutrient:amd),
+                          mean ) ),
+         Fitted.Estimates=with(dat.tb2,
+                          tapply(fitted, list(reg:nutrient:amd),
+                          function(x) unlogitH(mean(x))  ) )  ))
                Raw.Data Fitted.Estimates
SW:1:unclipped  0.50877          0.69520
SW:1:clipped    0.41304          0.43669
SW:8:unclipped  0.67273          0.85231
SW:8:clipped    0.52830          0.66233
NL:1:unclipped  0.88889          0.81887
NL:1:clipped    0.53571          0.60578
NL:8:unclipped  0.96552          0.98830
NL:8:clipped    0.96154          0.96635
SP:1:unclipped  0.98649          0.98361
SP:1:clipped    0.92537          0.95328
SP:8:unclipped  1.00000          0.99308
SP:8:clipped    0.95890          0.97992
 > ### Perhaps the cell SP:8:clipped = 1.0 is messing up the fit?
 > pdf("RawAndFitted.pdf")
 > par(mar=c(8,3,2,2), las=2)
 > barplot(t(result), beside=TRUE )
 > box(); title("Fractions of Plants Producing Fruits")
 > legend("topleft", c("Raw Data", "Fitted
Values"),
+        fill=gray.colors(2), bty="n" )
 > dev.off()
quartz
      2
 >

                _
platform       powerpc-apple-darwin8.6.0
arch           powerpc
os             darwin8.6.0
system         powerpc, darwin8.6.0
status
major          2
minor          3.1
year           2006
month          06
day            01
svn rev        38247
language       R
version.string Version 2.3.1 (2006-06-01)
 >

Dr. M. Hank H. Stevens, Assistant Professor
338 Pearson Hall
Botany Department
Miami University
Oxford, OH 45056

Office: (513) 529-4206
Lab: (513) 529-4262
FAX: (513) 529-4243
http://www.cas.muohio.edu/~stevenmh/
http://www.muohio.edu/ecology/
http://www.muohio.edu/botany/
"E Pluribus Unum"

Spencer Graves

2006-Jun-17 17:09 UTC

head link

[R] lmer binomial model overestimating data?

<see inline>	

Martin Henry H. Stevens wrote:> Hi folks,
> Warning: I don't know if the result I am getting makes sense, so this  
> may be a statistics question.
> 
> The fitted values from my binomial lmer mixed model seem to  
> consistently overestimate the cell means, and I don't know why. I  
> assume I am doing something stupid.
	  I copied your "Raw.Data" and "Fitted.Estimates" into two
columns of a
data.frame OE and then did a t test as follows:
 > with(OE, t.test(Raw.Data-Fitted.Estimates))

	One Sample t-test

data:  Raw.Data - Fitted.Estimates
t = -2.1662, df = 11, p-value = 0.05313
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
  -0.0991997752  0.0007897752
sample estimates:
mean of x
-0.049205

	  I also tried the same thing with a modification of an example in the 
"lmer" help file.  In that case, lmer seemed to consistently 
UNDERestimate the cell means.  However, the difference again was not 
statistically significant.

	  I suggest you consider this a disguised Rorschach ink blot test and 
worry about other things.

	  Hope this helps.
	  Spencer Graves	
> 
> Below I include code, and a binary image of the data is available at  
> this link:
> http://www.cas.muohio.edu/~stevenmh/tf.RdataBin
> 
> This was done with `Matrix' version 0.995-10 and `lme4' version  
> 0.995-2. and R v. 2.3.1 on a Mac, OS 10.4.6.
> 
> The binomial model below ("mod") was reduced from a more complex
one
> by first using AIC, BIC and LRT for "random" effects, and then  
> relying on Helmert contrasts and AIC, BIC, and LRT to simplify fixed  
> effects. Maybe this was wrong?
> 
>  > load("tf.RdataBin")
>  > library(lme4)
> 
>  > options(contrasts=c("contr.helmert",
"contr.poly"))
>  > mod <- lmer(tfb ~ reg+nutrient+amd +reg:nutrient+
> +     (1|rack) + (1|popu) +  (1|gen), data=dat.tb2, family=binomial,  
> method="Laplace")
>  > summary(mod)
> Generalized linear mixed model fit using Laplace
> Formula: tfb ~ reg + nutrient + amd + reg:nutrient + (1 | rack) + (1  
> |      popu) + (1 | gen)
> 	  Data: dat.tb2
> Family: binomial(logit link)
>      AIC    BIC  logLik deviance
> 402.53 446.64 -191.26   382.53
> Random effects:
> Groups Name        Variance Std.Dev.
> gen    (Intercept) 0.385    0.621
> popu   (Intercept) 0.548    0.741
> rack   (Intercept) 0.401    0.633
> number of obs: 609, groups: gen, 24; popu, 9; rack, 2
> 
> Estimated scale (compare to 1)  0.80656
> 
> Fixed effects:
>                 Estimate Std. Error z value Pr(>|z|)
> (Intercept)       2.391      0.574    4.17  3.1e-05
> reg1              0.842      0.452    1.86  0.06252
> reg2              0.800      0.241    3.32  0.00091
> nutrient1         0.788      0.197    4.00  6.3e-05
> amd1             -0.540      0.139   -3.88  0.00010
> reg1:nutrient1    0.500      0.227    2.21  0.02734
> reg2:nutrient1   -0.176      0.146   -1.21  0.22794
> 
> Correlation of Fixed Effects:
>              (Intr) reg1   reg2   ntrnt1 amd1   rg1:n1
> reg1         0.169
> reg2        -0.066 -0.191
> nutrient1    0.178  0.231 -0.034
> amd1        -0.074 -0.044 -0.052 -0.078
> reg1:ntrnt1  0.157  0.307 -0.180  0.562 -0.002
> reg2:ntrnt1 -0.028 -0.154  0.236  0.141  0.033 -0.378
>  > X <- mod @ X
>  > fitted <- X %*% fixef(mod)
>  >
>  > unlogitH <- function(x) {( 1 + exp(-x) )^-1}
>  > (result <- data.frame(Raw.Data=with(dat.tb2,
> +                          tapply(tfb, list(reg:nutrient:amd),
> +                          mean ) ),
> +         Fitted.Estimates=with(dat.tb2,
> +                          tapply(fitted, list(reg:nutrient:amd),
> +                          function(x) unlogitH(mean(x))  ) )  ))
>                 Raw.Data Fitted.Estimates
> SW:1:unclipped  0.50877          0.69520
> SW:1:clipped    0.41304          0.43669
> SW:8:unclipped  0.67273          0.85231
> SW:8:clipped    0.52830          0.66233
> NL:1:unclipped  0.88889          0.81887
> NL:1:clipped    0.53571          0.60578
> NL:8:unclipped  0.96552          0.98830
> NL:8:clipped    0.96154          0.96635
> SP:1:unclipped  0.98649          0.98361
> SP:1:clipped    0.92537          0.95328
> SP:8:unclipped  1.00000          0.99308
> SP:8:clipped    0.95890          0.97992
>  > ### Perhaps the cell SP:8:clipped = 1.0 is messing up the fit?
>  > pdf("RawAndFitted.pdf")
>  > par(mar=c(8,3,2,2), las=2)
>  > barplot(t(result), beside=TRUE )
>  > box(); title("Fractions of Plants Producing Fruits")
>  > legend("topleft", c("Raw Data", "Fitted
Values"),
> +        fill=gray.colors(2), bty="n" )
>  > dev.off()
> quartz
>       2
>  >
> 
>                 _
> platform       powerpc-apple-darwin8.6.0
> arch           powerpc
> os             darwin8.6.0
> system         powerpc, darwin8.6.0
> status
> major          2
> minor          3.1
> year           2006
> month          06
> day            01
> svn rev        38247
> language       R
> version.string Version 2.3.1 (2006-06-01)
>  >
> 
> Dr. M. Hank H. Stevens, Assistant Professor
> 338 Pearson Hall
> Botany Department
> Miami University
> Oxford, OH 45056
> 
> Office: (513) 529-4206
> Lab: (513) 529-4262
> FAX: (513) 529-4243
> http://www.cas.muohio.edu/~stevenmh/
> http://www.muohio.edu/ecology/
> http://www.muohio.edu/botany/
> "E Pluribus Unum"
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

Thomas Lumley

2006-Jun-19 18:33 UTC

head link

[R] lmer binomial model overestimating data?

On Wed, 14 Jun 2006, Martin Henry H. Stevens wrote:
> Hi folks,
> Warning: I don't know if the result I am getting makes sense, so this
> may be a statistics question.
>
> The fitted values from my binomial lmer mixed model seem to
> consistently overestimate the cell means, and I don't know why. I
> assume I am doing something stupid.
Not really, there is something subtle going on.  The model says that

  logit E[Y|x, random effects] = x*beta+random effects

Now, when you compute the observed values you are averaging over the 
random effects to get

    E[E[Y|x, random effects]]= E[ invlogit(x*beta +random effects)]

where invlogit is the inverse of logit.

When you compute the fitted values you are also averaging, but on the 
linear predictor scale to get
    E[logit(E[Y|x, random effects])]= invlogit(x*beta)

The logit/unlogit operation is not linear, so these are not the same. In 
fact, invlogit(x*beta) is always further from 1/2 than E[Y|X].

With linear regression it is useful and fairly standard to think of the 
random effects part of a mixed model as giving a model for the covariance 
of Y, seperate from the fixed-effects model for the mean of Y.  With 
generalized linear models these can no longer be separated: adding random 
effects changes the values and the meaning of the fixed effects 
parameters.

 	-thomas

Possibly Parallel Threads

Search for more apparently analagous threads

R help - Jun 2006 - lmer binomial model overestimating data?

[R] lmer binomial model overestimating data?

[R] lmer binomial model overestimating data?

[R] lmer binomial model overestimating data?

Possibly Parallel Threads