Thank you for the thorough report.
On 8/8/06, Daniel Farewell <farewelld at cardiff.ac.uk> wrote:
> I'm fitting a GLMM to some questionnaire data. The structure is J
> individuals, nested within I areas, all of whom answer the same K
> (ordinal) questions. The model I'm using is based on so-called
> continuation ratios, so that it can be fitted using the lme4 package.
>
> The lmer function fits the model just fine, but using mcmcsamp to judge the
> variability of the parameter estimates produces some strange results. The
> posterior sample is constant for the fixed effects, and the estimates of
> the variance components are way out in the tails of their posterior
> samples.
>
> The model I'm using says (for l = 1, ..., L - 1)
>
> logit P(X[ijk] = l | X[ijk] >= l, U[i], V[j], W[k]) = U[i] + V[j] + W[k] + a[l]
>
> where X[ijk] is the ordinal response to question k for individual j in
> area i. The U, V, and W are random effects and the a's are fixed
> effects. Here's a function to simulate data which mimics this setup
> (with a sequence of binary responses Y[ijkl] = 1 iff X[ijk] = l):
>
> sim <- function(n = c(10, 10, 10), sd = c(0.5, 2, 0.5), a = seq(-5, 1, 2)) {
>
>     u <- rnorm(n[1], 0, sd[1])
>     v <- rnorm(prod(n[1:2]), 0, sd[2])
>     w <- rnorm(n[3], 0, sd[3])
>
>     i <- factor(rep(1:n[1], each = prod(n[2:3])))
>     j <- factor(rep(1:prod(n[1:2]), each = n[3]))
>     k <- factor(rep(1:n[3], prod(n[1:2])))
>
>     df <- NULL
>
>     for(l in 1:length(a)) {
>
>         y <- rbinom(length(i), 1, plogis(u[i] + v[j] + w[k] + a[l]))
>         df <- rbind(df, data.frame(i, j, k, l, y))
>
>         i <- i[!y]
>         j <- j[!y]
>         k <- k[!y]
>
>     }
>
>     df$l <- factor(df$l)
>     return(df)
>
> }
>
> And here's a function which shows the difficulties I've been having:
>
> test <- function(seed = 10, ...) {
>
>     require(lme4)
>     set.seed(seed)
>
>     df <- sim(...)
>     df.lmer <- lmer(y ~ 0 + l + (1 | i) + (1 | j) + (1 | k),
>                     family = binomial, data = df)
>     df.mcmc <- mcmcsamp(df.lmer, 1000, trans = FALSE)
>
>     print(summary(df.lmer))
>     print(summary(df.mcmc))
>     densityplot(~ as.numeric(df.mcmc) | factor(rep(colnames(df.mcmc), each = 1000)),
>                 scales = list(relation = "free"))
>
> }
>
> Running
>
> test()
>
> gives the following:
>
> Generalized linear mixed model fit using PQL
> Formula: y ~ 0 + l + (1 | i) + (1 | j) + (1 | k)
> Data: df
> Family: binomial(logit link)
> AIC BIC logLik deviance
> 2316.133 2359.166 -1151.066 2302.133
> Random effects:
> Groups Name Variance Std.Dev.
> j (Intercept) 4.15914 2.03940
> k (Intercept) 0.25587 0.50584
> i (Intercept) 0.56962 0.75473
> number of obs: 3455, groups: j, 100; k, 10; i, 10
>
> Estimated scale (compare to 1) 0.8747598
>
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> l1 -4.50234 0.40697 -11.0632 < 2.2e-16 ***
> l2 -3.27643 0.38177 -8.5821 < 2.2e-16 ***
> l3 -1.05277 0.36566 -2.8791 0.003988 **
> l4 0.76538 0.36832 2.0780 0.037706 *
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Correlation of Fixed Effects:
> l1 l2 l3
> l2 0.843
> l3 0.841 0.900
> l4 0.805 0.865 0.921
>
> l1 l2 l3 l4
> Min. :-4.502 Min. :-3.276 Min. :-1.053 Min. :0.7654
> 1st Qu.:-4.502 1st Qu.:-3.276 1st Qu.:-1.053 1st Qu.:0.7654
> Median :-4.502 Median :-3.276 Median :-1.053 Median :0.7654
> Mean :-4.502 Mean :-3.276 Mean :-1.053 Mean :0.7654
> 3rd Qu.:-4.502 3rd Qu.:-3.276 3rd Qu.:-1.053 3rd Qu.:0.7654
> Max. :-4.502 Max. :-3.276 Max. :-1.053 Max. :0.7654
> j.(In) k.(In) i.(In) deviance
> Min. :1.911 Min. :0.0509 Min. :0.06223 Min. :2011
> 1st Qu.:2.549 1st Qu.:0.1310 1st Qu.:0.19550 1st Qu.:2044
> Median :2.789 Median :0.1756 Median :0.25581 Median :2054
> Mean :2.824 Mean :0.2085 Mean :0.29948 Mean :2054
> 3rd Qu.:3.070 3rd Qu.:0.2463 3rd Qu.:0.34640 3rd Qu.:2064
> Max. :4.615 Max. :0.8804 Max. :3.62168 Max. :2107
>
> As you can see, the posterior samples for the fixed effects are constant
> (at the initial estimates), and the estimates of the variance components
> aren't within the interquartile ranges of their posterior samples.
>
> I understand from various posts that mcmcsamp is still a work in progress,
> and may not work on every model. Is this one of those cases? I'm using
> R 2.3.1 and lme4 0.995-2 on Windows XP.
In recent versions of the lme4/Matrix packages, setting verbose = TRUE
in a call to mcmcsamp for a generalized linear mixed model causes the
Metropolis-Hastings ratio for the proposed change in the fixed effects
and random effects to be printed at each iteration. When I ran your
example those ratios were always 0, which is either extremely bad luck
or a bug. My guess is that it's a bug.
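
If it helps to reproduce what I saw, the only change needed to your
test() function is the sampler call (a sketch, assuming a current
lme4/Matrix in which mcmcsamp accepts a verbose argument, and treating
the returned sample as a matrix, as your densityplot call already does):

    ## as in test(), but asking mcmcsamp to print the Metropolis-Hastings
    ## acceptance ratios at each iteration
    df.mcmc <- mcmcsamp(df.lmer, 1000, trans = FALSE, verbose = TRUE)

    ## the constancy of the fixed-effects columns can also be checked
    ## numerically: if the chain never moves, their column-wise standard
    ## deviations should come out as (essentially) zero
    apply(as.matrix(df.mcmc), 2, sd)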
Thanks for the report with the reproducible example.