Paul Johnson
2007-Jun-14 17:50 UTC
[R] random effects in logistic regression (lmer)-- identification question
Hello R users!

I've been experimenting with lmer to estimate a mixed model with a dichotomous dependent variable. The goal is to fit a hierarchical model in which we compare the effects of individual-level and city-level variables. I've run up against a conceptual problem that I expect one of you can clear up for me.

The question is about random effects in the context of a model fit with a binomial family and logit link. Unlike an ordinary linear regression, there is no way to estimate an individual-level random error in the linear predictor

    z_i = a + b*x_i + e_i

because the variance of e_i is unidentified. The standard deviation of the logistic distribution is pi*s/sqrt(3), and we assume s=1, so the standard deviation is assumed to be pi/sqrt(3) (about 1.81, compared with 1 for the Standard Normal). The logistic fitting process fixes the variance of the error, and the parameters a and b are "rescaled" accordingly. As a result, there is an implicit individual-level random effect within a logistic model. There is a good explanation of this issue in Tony Lancaster's textbook An Introduction to Modern Bayesian Econometrics.

So we usually end up thinking about the linear predictor in a logistic regression like so:

    z_i = a + b*x_i

Random effects can be estimated for "groups" or "clusters" of observations. If j is a grouping variable, then we estimate

    z_i = a + b*x_i + u_j

The variance component here is, as far as I understand, measured on the same scale as the logistic distribution's standard deviation.

Currently, I'm working on a project in which there are observations collected in many cities, represented by a variable PLACE. We are comparing the effect of several variables on a response for each of the values of a RACE variable. RACE is dichotomized into "White" and "Nonwhite" by the people who collect the data. For Nonwhites only, we can estimate the effect of individual-level predictors (x) on the output (y).
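To make the setup concrete, here is a minimal sketch in base R of data with the layout described above (y, x, Race, PLACE), generated from a latent variable with a city-level intercept u_j and an individual-level logistic error e_i. Every numeric value in it (sample size, number of cities, coefficients, the standard deviation of u_j) is an assumption chosen for illustration, not something taken from the real data. It also checks the scale of the implicit error numerically.

```r
# Hypothetical simulation: all parameter values below are made up.
set.seed(1)
n_city   <- 33                      # number of cities (arbitrary)
n        <- 6000                    # number of respondents (arbitrary)
place_id <- sample(seq_len(n_city), n, replace = TRUE)
PLACE    <- factor(place_id)
Race     <- factor(sample(c("White", "Nonwhite"), n, replace = TRUE),
                   levels = c("White", "Nonwhite"))
x <- rnorm(n)                       # individual-level predictor
u <- rnorm(n_city, sd = 0.3)        # city-level random intercepts u_j
e <- rlogis(n)                      # individual-level error, scale s = 1

z <- -0.5 + 0.8 * x + u[place_id] + e   # latent variable
y <- as.integer(z > 0)                  # observed dichotomous response
dat <- data.frame(y, x, Race, PLACE)

sd(e)          # sample value, should be near pi / sqrt(3)
pi / sqrt(3)   # about 1.814
```

Because only the sign of z is observed, multiplying a, b, u_j and e_i by a common constant leaves y unchanged, which is why the scale of e_i has to be fixed rather than estimated.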
Here is the Nonwhite-only fit:

    fm1 <- lmer(y ~ x + (1 | PLACE), data = dat, family = binomial,
                subset = Race %in% c("Nonwhite"))

The random effect in this model indicates the variance in the response attributable to a Nonwhite respondent's location.

    Random effects:
     Groups Name        Variance Std.Dev.
     PLACE  (Intercept) 0.047326 0.21754

Suppose I estimate models for the 2 races in a combined model like this:

    fm1 <- lmer(y ~ -1 + Race / (x) + (-1 + Race | PLACE), data = dat, family = binomial)

This gives fixed effects estimates that are pretty easy for nonstatisticians to understand. One can look and see the effect of a variable x on people of different races.

But the random effect is a bit hard to understand. Since the Race variable is dichotomous, my aim was to see whether the variance of the random effect differs between the 2 racial categories. Here are the estimates:

    Random effects:
     Groups Name             Variance  Std.Dev. Corr
     PLACE  RACE_ALLWhite    0.0095429 0.097688
            RACE_ALLNonwhite 0.1286597 0.358692 1.000

I can't quite grasp what the correlation means. I BELIEVE the variance values indicate that the experiences of Whites are homogeneous across cities, because the variance is negligible for them, while the experiences of Nonwhites are much more city-dependent.

What does it mean when the correlation is 1.0? The correlation takes on that value when there are no city-level variables in this model, so I GUESS that it means that all city-level variation is attributed to the random effect. What do you think?

If I put in some city-level predictors, then the estimates of the variance components change (they essentially shrink to their minimum values) and the correlation is not 1.0 anymore.

    Random effects:
     Groups Name         Variance Std.Dev.   Corr
     PLACE  RACEWhite    5e-10    2.2361e-05
            RACENonwhite 5e-10    2.2361e-05 0.001

This indicates that, after adding in the city-level variables, the unaccounted-for city-level variation is very small. Correct?
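One way to look at the correlation of 1.000 reported above, offered as a sketch rather than an answer: taken at face value, it implies that the 2 x 2 covariance matrix of the PLACE-level effects is singular, so the two city effects are exact multiples of one another. The code below rebuilds that matrix from the printed standard deviations and correlation. The object names are hypothetical, and the printed values are rounded, so the fitted matrix may differ slightly.

```r
# Standard deviations and correlation copied from the printed output.
sd_white    <- 0.097688
sd_nonwhite <- 0.358692
rho         <- 1.000

# Implied covariance matrix: V = S %*% R %*% S, with S = diag(sd).
S <- diag(c(sd_white, sd_nonwhite))
R <- matrix(c(1, rho, rho, 1), nrow = 2)
V <- S %*% R %*% S
dimnames(V) <- list(c("White", "Nonwhite"), c("White", "Nonwhite"))

V                 # diagonal reproduces the printed variances
det(V)            # zero up to rounding: the matrix has rank 1
eigen(V)$values   # one eigenvalue is numerically zero
```

Whether this reflects a real feature of the cities or an estimate sitting on the boundary of the parameter space cannot be settled from the printed output alone.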
Now, back to the idea that there is always an implicit individual-level random effect in a logistic regression. Is it meaningful to ask "is that individual-level random effect different for people of different races?" If so, how can I estimate that?

If e_i is the implicit random error, can I ask for another random effect for Nonwhites only, say u_iN, in a model like so:

    z_i = a + b*x_i + e_i + u_iN

Suppose the unique respondent number is ID and we create a new variable

    NonwhiteID = 0   for Whites
               = ID  for Nonwhites

Here's my idea about how to check whether the individual-level variance component for Nonwhites is different from the "baseline" of Whites, by fitting this:

    fm1 <- lmer(y ~ -1 + Race / (x) + (-1 + Race | PLACE) + (1 | NonwhiteID), data = dat, family = binomial)

Here, again, I leave out the city-level variables.

    Random effects:
     Groups     Name         Variance   Std.Dev.   Corr
     NonwhiteID (Intercept)  5.0000e-10 2.2361e-05
     PLACE      RACEWhite    1.0575e-02 1.0283e-01
                RACENonwhite 1.2880e-01 3.5889e-01 1.000
    number of obs: 6201, groups: NonwhiteID, 1736; PLACE, 33

The variance component estimated for NonwhiteID means that the variance observed among Nonwhite respondents is not substantially different from the implicit, unestimated individual-level random error. However, it still appears that there is a substantial place effect, for Nonwhites only.

Do I understand that right?

Well, thanks in advance, as usual.

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas