Daniel Ezra Johnson
2006-Dec-31 09:28 UTC
[R] zero random effect sizes with binomial lmer [sorry, ignore previous]
I am fitting models to the responses to a questionnaire that has
seven yes/no questions (Item). For each combination of Subject and
Item, the variable Response is coded as 0 or 1.
I want to include random effects for both Subject and Item. While I
understand that the datasets are fairly small, and there are a lot of
invariant subjects, I do not understand something that is happening
here, which I have also seen when comparing other subsets of the data.
In the data below, which has been adjusted to show this phenomenon
clearly, the Subject random effect variance is comparable for A
(1.63) and B (1.712), but the Item random effect variance comes out
as 0.109 for B and essentially zero for A (5.00e-10).
Note that the only difference between data set A and data set B
occurs in row 19, where a single Response value (for Item 6) is changed.
Item   avg. in A (%)   avg. in B (%)
  1          9               9
  2          9               9
  3          9               9
  4         17              17
  5          4               4
  6         22              26    <- the one difference
  7         17              17
Why does the Item random effect variance sometimes "crash" to zero so
sensitively?
Surely there is some more reasonable estimate of the Item effect here than
zero. The items still have clearly different Response behavior.
If one compares the random effect estimates, in fact, one sees that
they are in the correct proportion, with the expected signs. They are
just approximately eight orders of magnitude too small. Is this a bug?
More broadly, is it hopeless to analyze these data in this manner, or,
if not, what should I try doing differently? It would be very useful to
be able to have reliable estimates of random effect sizes, even when
they are rather small.
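One way to probe whether the zero estimate is meaningful (a sketch, not
something from the original post; it reuses the A.fit model defined in
the code below, and assumes your lme4 version provides an anova()
method for comparing fitted models):

```r
## Sketch: test the Item variance component with a likelihood-ratio
## comparison against a model that omits it.  Because the null value
## (variance = 0) lies on the boundary of the parameter space, the
## nominal chi-square p-value is conservative.
A.noitem <- lmer(Response ~ (1 | Subject), data = A, family = binomial)
anova(A.noitem, A.fit)  # LR test for the Item variance component
```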
I've included reproducible code below; sorry that I did not know how to
make it more compact!
a1 <- c(0,0,0,0,0,0,0)
a2 <- c(0,0,0,0,0,0,0)
a3 <- c(0,0,0,0,0,0,0)
a4 <- c(0,0,0,0,0,0,0)
a5 <- c(0,0,0,0,0,0,0)
a6 <- c(0,0,0,0,0,0,0)
a7 <- c(0,0,0,0,0,0,0)
a8 <- c(0,0,0,0,0,0,0)
a9 <- c(0,0,0,0,0,0,0)
a10 <- c(0,0,0,0,0,0,0)
a11 <- c(0,0,0,0,0,0,0)
a12 <- c(0,0,0,0,0,0,0)
a13 <- c(0,0,0,0,0,0,1)
a14 <- c(0,0,0,0,0,0,1)
a15 <- c(0,0,0,0,0,1,0)
a16 <- c(0,0,0,0,1,0,0)
a17 <- c(0,0,0,1,0,0,0)
a18 <- c(0,0,1,0,0,0,0)
a19 <- c(0,1,0,0,0,0,0)
a20 <- c(0,1,0,0,0,1,0)
a21 <- c(0,0,0,1,0,1,1)
a22 <- c(1,0,0,1,0,1,1)
a23 <- c(1,0,1,1,0,1,0)
aa <- rbind(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,
            a17,a18,a19,a20,a21,a22,a23)
b1 <- c(0,0,0,0,0,0,0)
b2 <- c(0,0,0,0,0,0,0)
b3 <- c(0,0,0,0,0,0,0)
b4 <- c(0,0,0,0,0,0,0)
b5 <- c(0,0,0,0,0,0,0)
b6 <- c(0,0,0,0,0,0,0)
b7 <- c(0,0,0,0,0,0,0)
b8 <- c(0,0,0,0,0,0,0)
b9 <- c(0,0,0,0,0,0,0)
b10 <- c(0,0,0,0,0,0,0)
b11 <- c(0,0,0,0,0,0,0)
b12 <- c(0,0,0,0,0,0,0)
b13 <- c(0,0,0,0,0,0,1)
b14 <- c(0,0,0,0,0,0,1)
b15 <- c(0,0,0,0,0,1,0)
b16 <- c(0,0,0,0,1,0,0)
b17 <- c(0,0,0,1,0,0,0)
b18 <- c(0,0,1,0,0,0,0)
b19 <- c(0,1,0,0,0,1,0)
b20 <- c(0,1,0,0,0,1,0)
b21 <- c(0,0,0,1,0,1,1)
b22 <- c(1,0,0,1,0,1,1)
b23 <- c(1,0,1,1,0,1,0)
bb <- rbind(b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16,
            b17,b18,b19,b20,b21,b22,b23)
a <- array(0, c(161, 3),   # 23 subjects x 7 items = 161 rows
           list(NULL, c("Subject","Item","Response")))
for (s in c(1:23))
for (i in c(1:7))
a[7*(s-1)+i,] <- c(s,i,aa[s,i])
A <- data.frame(a)
b <- array(0, c(161, 3),   # 23 subjects x 7 items = 161 rows
           list(NULL, c("Subject","Item","Response")))
for (s in c(1:23))
for (i in c(1:7))
b[7*(s-1)+i,] <- c(s,i,bb[s,i])
B <- data.frame(b)
A.fit <- lmer(Response ~ (1 | Subject) + (1 | Item), data = A, family = binomial)
B.fit <- lmer(Response ~ (1 | Subject) + (1 | Item), data = B, family = binomial)
A.fit
B.fit
ranef(A.fit)$Item
ranef(B.fit)$Item
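For what it's worth, the data frames can be built more compactly (a
sketch, reusing the aa and bb matrices above; it relies on
expand.grid() varying its first factor fastest, which reproduces the
row order 7*(s-1)+i of the loops):

```r
## Compact alternative to the array-filling loops above (a sketch).
## Item varies fastest within each Subject, as in the loops.
A2 <- expand.grid(Item = 1:7, Subject = 1:23)
A2$Response <- as.vector(t(aa))  # t(aa) read column-wise = subject by subject
B2 <- expand.grid(Item = 1:7, Subject = 1:23)
B2$Response <- as.vector(t(bb))
```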
Generalized linear mixed model fit using Laplace
Formula: Response ~ (1 | Subject) + (1 | Item)
Data: A
Family: binomial(logit link)
AIC BIC logLik deviance
120 129 -56.8 114
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 1.63e+00 1.28e+00
Item (Intercept) 5.00e-10 2.24e-05
number of obs: 161, groups: Subject, 23; Item, 7
Estimated scale (compare to 1 ) 0.83326
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.517 0.395 -6.38 1.8e-10 ***
> B.fit
Generalized linear mixed model fit using Laplace
Formula: Response ~ (1 | Subject) + (1 | Item)
Data: B
Family: binomial(logit link)
AIC BIC logLik deviance
123 133 -58.6 117
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 1.712 1.308
Item (Intercept) 0.109 0.330
number of obs: 161, groups: Subject, 23; Item, 7
Estimated scale (compare to 1 ) 0.81445
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.498 0.415 -6.02 1.8e-09 ***
> ranef(A.fit)$Item
(Intercept)
1 -2.8011e-10
2 -2.8011e-10
3 -2.8011e-10
4 7.1989e-10
5 -7.8011e-10
6 1.2199e-09
7 7.1989e-10
> ranef(B.fit)$Item
(Intercept)
1 -0.056937
2 -0.056937
3 -0.056937
4 0.120293
5 -0.146925
6 0.293893
7 0.120293
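To see the "correct proportion, wrong scale" point numerically (a
sketch using the two fits above):

```r
## Element-wise ratio of the B-fit to A-fit item effects.  If the A
## estimates are just the B pattern collapsed toward zero, the ratios
## should all be of the same order -- here around 2e8, i.e. roughly
## eight orders of magnitude.
ranef(B.fit)$Item[, 1] / ranef(A.fit)$Item[, 1]
```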
Dieter Menne
2007-Jan-01 18:09 UTC

Daniel Ezra Johnson <johnson4 <at> babel.ling.upenn.edu> writes:

> [...]
> In the data below, which has been adjusted to show this phenomenon
> clearly, the Subject random effect variance is comparable for A
> (1.63) and B (1.712), but the Item random effect variance comes out
> as 0.109 for B and essentially zero for A (5.00e-10).
> ...

Check the list archives for quite a few postings of Professor Brian
Ripley on the subject of Hauck-Donner.

Dieter
Andrew Robinson

According to pp. 197-198 of MASS 4, the Hauck-Donner phenomenon refers
to cases where the Wald approximation and the likelihood ratio test
give different p-values because the former underestimates the change in
log-likelihood on setting \beta_i = 0. This seems to me quite different
from the phenomenon that Daniel is reporting, and that I believe I saw
previously (post: http://tolstoy.newcastle.edu.au/R/e2/help/06/12/6903.html ).

I tried an earlier version of R, on a different platform, and got quite
different results. Sadly, the *earlier* results are the ones that make
sense. E.g.:

> sessionInfo()
R version 2.4.1 Patched (2006-12-30 r40330)
i386-unknown-freebsd6.1

locale:
C

attached base packages:
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
[7] "base"

other attached packages:
       lme4      Matrix     lattice
"0.9975-10"  "0.9975-6"   "0.14-16"

> ranef(lmer(Reaction ~ Days + (1|Subject), sleepstudy,
+            family=Gamma(link="log")))
An object of class "ranef.lmer"
[[1]]
      (Intercept)
308  6.817268e-10
309 -1.369242e-09
310 -1.122033e-09
330  1.164825e-10
331  2.096848e-10
332  1.494418e-10
333  3.042078e-10
334 -6.276876e-11
335 -7.556428e-10
337  1.263863e-09
349 -3.984973e-10
350  2.107439e-10
351 -1.230185e-10
352  6.409427e-10
369  1.224258e-10
370 -1.528146e-10
371 -5.310404e-11
372  3.228682e-10

> sessionInfo()
Version 2.3.1 (2006-06-01)
i386-pc-mingw32

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
[7] "base"

other attached packages:
      lme4     Matrix    lattice
 "0.995-2" "0.995-20"   "0.13-8"

> ranef(lmer(Reaction ~ Days + (1|Subject), sleepstudy,
+            family=Gamma(link="log")))
An object of class "ranef.lmer"
[[1]]
     (Intercept)
308  0.128473227
309 -0.294234827
310 -0.232009186
330  0.029091372
331  0.046196655
332  0.035400265
333  0.063273674
334 -0.004238362
335 -0.147285458
337  0.220381662
349 -0.070565390
350  0.046822487
351 -0.015994000
352  0.121461879
369  0.030457370
370 -0.021387277
371 -0.002494534
372  0.066650443

Cheers,
Andrew

On Mon, Jan 01, 2007 at 06:09:12PM +0000, Dieter Menne wrote:
> [...]
> Check the list archives for quite a few postings of Professor Brian
> Ripley on the subject of Hauck-Donner.

-- 
Andrew Robinson
Department of Mathematics and Statistics         Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia      Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/