Andrew Gelman
2006-Jan-10 19:29 UTC
[R] lmer(): nested and non-nested factors in logistic regression
Thanks to some help by Doug Bates (and the updated version of the Matrix package), I've refined my question about fitting nested and non-nested factors in lmer(). I can get it to work in linear regression but it crashes in logistic regression. Here's my example: # set up the predictors n.age <- 4 n.edu <- 4 n.rep <- 100 n.state <- 50 n <- n.age*n.edu*n.rep age.id <- rep (1:n.age, each=n.edu*n.rep) edu.id <- rep (1:n.edu, n.age, each=n.rep) age.edu.id <- n.edu*(age.id - 1) + edu.id state.id <- sample (1:n.state, n, replace=TRUE) # simulate the varying parameters a.age <- rnorm (n.age, 1, 2) a.edu <- rnorm (n.edu, 3, 4) a.age.edu <- rnorm (n.age*n.edu, 0, 5) a.state <- rnorm (n.state, 0, 6) # simulate the data and print to check that i did it right y.hat <- a.age[age.id] + a.edu[edu.id] + a.age.edu[age.edu.id] + a.state[state.id] y <- rnorm (n, y.hat, 1) print (cbind (age.id, edu.id, age.edu.id, state.id, y.hat, y)) # this model (and simpler versions) work fine: fit.1 <- lmer (y ~ 1 + (1 | age.id) + (1 | edu.id) + (1 | age.edu.id) + (1 | state.id)) # now go to logistic regression ypos <- ifelse (y > mean(y), 1, 0) # these work fine: fit.2 <- lmer (ypos ~ 1 + (1 | age.id) + (1 | edu.id) + (1 | age.edu.id), family=binomial(link="logit")) fit.3 <- lmer (ypos ~ 1 + (1 | age.id) + (1 | edu.id) + (1 | state.id), family=binomial(link="logit")) # this one causes R to crash!!!!!!! fit.4 <- lmer (ypos ~ 1 + (1 | age.id) + (1 | edu.id) + (1 | age.edu.id) + (1 | state.id), family=binomial(link="logit")) -- All help appreciated. This is for our book on regression and multilevel models, and it would be great if people could get started fitting these models in R before having to do the more elaborate modeling in Bugs. Andrew -- Andrew Gelman Professor, Department of Statistics Professor, Department of Political Science gelman at stat.columbia.edu www.stat.columbia.edu/~gelman Tues, Wed, Thurs: Social Work Bldg (Amsterdam Ave at 122 St), Room 1016 212-851-2142 Mon, Fri: International Affairs Bldg (Amsterdam Ave at 118 St), Room 711 212-854-7075 Mailing address: 1255 Amsterdam Ave, Room 1016 Columbia University New York, NY 10027-5904 212-851-2142 (fax) 212-851-2164
Douglas Bates
2006-Jan-11 18:36 UTC
[R] lmer(): nested and non-nested factors in logistic regression
The version of lmer based on the supernodal Cholesky factorization, which we will release "real soon", does not crash on this example. It does give very large estimates of the variances in that model fit, at least for the simulation that I ran. It is best if you use set.seed(123454321) (or whatever seed appeals to you) before you simulate data if you are going to post the results. That way we can be sure we are running on the same data you did. On 1/10/06, Andrew Gelman <gelman at stat.columbia.edu> wrote:> Thanks to some help by Doug Bates (and the updated version of the Matrix > package), I've refined my question about fitting nested and non-nested > factors in lmer(). I can get it to work in linear regression but it > crashes in logistic regression. Here's my example: > > # set up the predictors > > n.age <- 4 > n.edu <- 4 > n.rep <- 100 > n.state <- 50 > n <- n.age*n.edu*n.rep > age.id <- rep (1:n.age, each=n.edu*n.rep) > edu.id <- rep (1:n.edu, n.age, each=n.rep) > age.edu.id <- n.edu*(age.id - 1) + edu.id > state.id <- sample (1:n.state, n, replace=TRUE) > > # simulate the varying parameters > > a.age <- rnorm (n.age, 1, 2) > a.edu <- rnorm (n.edu, 3, 4) > a.age.edu <- rnorm (n.age*n.edu, 0, 5) > a.state <- rnorm (n.state, 0, 6) > > # simulate the data and print to check that i did it right > > y.hat <- a.age[age.id] + a.edu[edu.id] + a.age.edu[age.edu.id] + > a.state[state.id] > y <- rnorm (n, y.hat, 1) > print (cbind (age.id, edu.id, age.edu.id, state.id, y.hat, y)) > > # this model (and simpler versions) work fine: > > fit.1 <- lmer (y ~ 1 + (1 | age.id) + (1 | edu.id) + (1 | age.edu.id) + > (1 | state.id)) > > # now go to logistic regression > > ypos <- ifelse (y > mean(y), 1, 0) > > # these work fine: > > fit.2 <- lmer (ypos ~ 1 + (1 | age.id) + (1 | edu.id) + (1 | > age.edu.id), family=binomial(link="logit")) > fit.3 <- lmer (ypos ~ 1 + (1 | age.id) + (1 | edu.id) + (1 | state.id), > family=binomial(link="logit")) > > # this one causes R to crash!!!!!!! > > fit.4 <- lmer (ypos ~ 1 + (1 | age.id) + (1 | edu.id) + (1 | age.edu.id) > + (1 | state.id), family=binomial(link="logit")) > > -- > > All help appreciated. This is for our book on regression and multilevel > models, and it would be great if people could get started fitting these > models in R before having to do the more elaborate modeling in Bugs. > > Andrew > > -- > Andrew Gelman > Professor, Department of Statistics > Professor, Department of Political Science > gelman at stat.columbia.edu > www.stat.columbia.edu/~gelman > > Tues, Wed, Thurs: > Social Work Bldg (Amsterdam Ave at 122 St), Room 1016 > 212-851-2142 > Mon, Fri: > International Affairs Bldg (Amsterdam Ave at 118 St), Room 711 > 212-854-7075 > > Mailing address: > 1255 Amsterdam Ave, Room 1016 > Columbia University > New York, NY 10027-5904 > 212-851-2142 > (fax) 212-851-2164 > >
I am working with lmer() in the latest release of Matrix, doing various things including writing a function called mcsamp() that acts as a wrapper for mcmcsamp() and automatically runs multiple chains, diagnoses convergence, and stores the result as a bugs object so it can be plotted. I recognize that at this point, mcmcsamp() is somewhat of a placeholder (since it doesn't work on a lot of models) but I'm sure it will continue to be improved so I'd like to be able to work with it, as a starting point if necessary. Anyway, I couldn't get mcmcsamp() to work with the saveb=TRUE option. Here's a simple example: y <- 1:10 group <- rep (c(1,2), c(5,5)) M1 <- lmer (y ~ 1 + (1 | group)) # works fine mcmcsamp (M1) # works fine mcmcsamp (M1, saveb=TRUE) This last gives an error message: Error in "colnames<-"(`*tmp*`, value = c("(Intercept)", "log(sigma^2)", : length of 'dimnames' [2] not equal to array extent Thanks for your help. Andrew -- Andrew Gelman Professor, Department of Statistics Professor, Department of Political Science gelman at stat.columbia.edu www.stat.columbia.edu/~gelman Tues, Wed, Thurs: Social Work Bldg (Amsterdam Ave at 122 St), Room 1016 212-851-2142 Mon, Fri: International Affairs Bldg (Amsterdam Ave at 118 St), Room 711 212-854-7075 Mailing address: 1255 Amsterdam Ave, Room 1016 Columbia University New York, NY 10027-5904 212-851-2142 (fax) 212-851-2164