Emily H. DuVal
2007-Oct-01 16:09 UTC
[R] Interpretation of residual variance components and scale parameters in GLMMs
Dear R-listers, I am working with generalized linear mixed models to quantify the variance due to two nested random factors, but have hit a snag in the interpretation of variance components. Despite my best efforts with Venables & Ripley 2002, Fahrmeir & Tutz 2001, R-help archives, Google, and other eminent sources (i.e. local R gurus), I have not been able to find a definitive answer explaining when the value reported for the residual error represents a scale parameter, and when it represents the actual variance (or standard error) and can be interpreted directly. As I understand it, the residual variance of a glm or glmm with logarithm link is defined as (pi^2)/3, and so most GL(M)M programs and modules do not report this value, but just report the scale parameter (if estimated, otherwise it is 1). For example, this is the case in lmer, where the output statement gives 'Estimated scale (compare to 1)'. This reported scale parameter (but not the variance components of the random terms) then needs to be multiplied by (pi^2)/3 to give the actual residual variance. However, it remains unclear to me whether this is necessary in glmmPQL, or whether the reported 'residual standard error' is (similarly to the standard error attributed to the other random terms in the model) the exact value of sigma, and should simply be squared to get the actual residual variance. In other words, and I am unsure if glmmPQL actually estimates the residual variance component instead of just scaling the expected value. Here's a simplified example dataset (d) to explain what I mean: d = data.frame( "maleID" = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4), "successes" = c(5,8,4,6,0,0,4,2,0,0,8,10,0,0,2,0), "status" c('a','a','a','a','b','b','a','a','b','b','a','a','b','b','b','b') ) d$maleID = factor(d$maleID) Essentially: males are found in different status categories; there is variance in success within each status group; and the variance in success between status groups is considerably greater than that within status groups.> model1 = lmer(successes~1 + (1|status) + (1|status:maleID),family = poisson, data = d, method = "PQL") > summary(model1)Generalized linear mixed model fit using PQL Formula: successes ~ 1 + (1 | status) + (1 | status:maleID) Data: d Family: poisson(log link) AIC BIC logLik deviance 29.97 32.29 -11.99 23.97 Random effects: Groups Name Variance Std.Dev. status:maleID (Intercept) 0.15423 0.39273 status (Intercept) 2.12881 1.45904 number of obs: 16, groups: status:maleID, 6; status, 2 Estimated scale (compare to 1 ) 1.010882 Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) 0.295 1.089 0.2709 0.786 Question 1: So, in lmer, am I right to multiply the "estimated scale"(here, 1.010483) by the expected variance of a glm with logarithmic link function (1.010483*((3.14159^2)/3) to get the actual residual variance? And is it also correct that no change would be made to the variance components of the random terms, so that variance due to status = 2.12881, and variance of individuals within status categories = 0.15423? Question 2: At risk of sparking comments about the problems of comparing glmmPQL and lmer (NOT the point of this question): running the same data in glmmPQL gives essentially the same values for random variance components, with 1.010882 listed as the "residual StDev" (model2 glmmPQL(successes~1, random =(~1|status/maleID),family = poisson, data = d). Does that mean that the residual variance is actually 1.010882^2? Or should it be interpreted as a scale parameter, as in lmer? Thanks for any insight you can offer! Emily DuVal Postdoctoral Fellow, Max Planck Institute for Ornithology