willow1980
2009-Aug-16 14:33 UTC
[R] How to deal with multicollinearity in mixed models (with lmer)?
Dear R users,

I have a problem with multicollinearity in a mixed model that I am fitting with lmer from the lme4 package. In a previous mailing list reply ("http://www.mail-archive.com/r-help at stat.math.ethz.ch/msg38537.html") I read that multicollinearity does not matter much if the model is used only for prediction and not for interpretation. However, I am using the mixed model for interpretation, so I am wondering whether there is a suitable method to deal with this problem in lmer. My model is:

model2<-lmer(sur_prop~(kidc+I(kidc^2)+I(kidc^3))*(byear_c+I(byear_c^2)+I(byear_c^3)+I(byear_c^4))+(byear_c|Studyparish),family=binomial)

This is the maximal model, and I have not yet begun to simplify it. The model is meant to show how a mother's cohort year and total number of children affect the average survival rate of her children. kidc and byear_c have been centered, so the correlation between the linear term and the polynomial terms (quadratic, cubic, etc.) has been reduced to some degree. A problem that remains serious is that the number of children is correlated with cohort year, since the number of children declines over time. So, could you please suggest a way to deal with the collinearity between kids and byear?

Thank you very much for helping!

Best regards,

--
View this message in context: http://www.nabble.com/How-to-deal-with-multicollinearity-in-mixed-models-%28with-lmer%29--tp24994095p24994095.html
Sent from the R help mailing list archive at Nabble.com.
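A small illustration of the centering point, as a sketch under stated assumptions rather than anything taken from the poster's data: centering removes much of the correlation between a variable and its square, but odd powers stay correlated with each other, and so do even powers. The variable x below is a hypothetical, simulated stand-in for byear_c. poly() from base R is shown only as one possible alternative to raw powers; it was not proposed in the thread.

```r
# Hypothetical centered predictor standing in for byear_c (simulated, not the poster's data)
set.seed(1)
x <- runif(200, min = -20, max = 20)
x <- x - mean(x)

# Raw powers of a centered variable: x and x^3 remain strongly correlated,
# and so do x^2 and x^4
raw_cor <- cor(cbind(x1 = x, x2 = x^2, x3 = x^3, x4 = x^4))
print(round(raw_cor, 2))

# Orthogonal polynomials of the same degree span the same space as the raw
# powers, but their columns are uncorrelated by construction
orth <- poly(x, degree = 4)
orth_cor <- cor(orth)
print(round(orth_cor, 2))
```

In a model formula, poly(byear_c, 4) could in principle replace the four raw power terms. The fitted values should be the same because both sets of terms span the same space, but the individual coefficients no longer correspond to raw powers, which matters when the goal is interpretation.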
Daniel Malter
2009-Aug-16 17:46 UTC
[R] How to deal with multicollinearity in mixed models (with lmer)?
Hi,

more generally, you might be overfitting your model by interacting all of the kidc polynomials with all of the year polynomials. Have a look at the following example:

year=rep(1:4,each=25)
year=year-mean(year)
kids=c(sample(c(0:1),25,replace=TRUE),sample(c(0:2),25,replace=TRUE),sample(c(0:3),25,replace=TRUE),sample(c(0:4),25,replace=TRUE))
#kids and year are correlated
cor(year,kids)
#simulate error term
e=rnorm(100)
#compute an arbitrary dependent variable
y=kids+year+e
#true model
reg1=lm(y~kids+year)
summary(reg1)
#dummies for year
reg2=lm(y~kids+factor(year))
summary(reg2)
#"your" model with all sorts of interactions
reg3=lm(y~(kids+I(kids^2))*(year+I(year^2)))
summary(reg3)
#assess variance inflation
library(car)
vif(reg1)
vif(reg2)
vif(reg3)

Note first that in the first two models the correlation between kids and year is basically not an issue, even though the correlation is about 0.5. However, note how you inflate the variance by including the interactions, polynomials, and interacted polynomials between the correlated variables in model reg3 (the first and third order polynomials and the second and fourth order polynomials are, by necessity, always highly correlated). The estimates in reg3 for the true effects are still pretty good, though. However, it may easily happen that you find some effects that are not part of the "true" model significant due to overfitting, and/or that you find true effects insignificant due to variance inflation. Thus, try a simpler model. Do you really need all the interactions, and what for?

(If your previous post relates to the same data, collinearity should be a minor issue, as the correlation is moderate at -0.25. The vif you computed there also indicates that. But again, creating and interacting all the higher order polynomials makes things worse.)

Further, is it reasonable to assume a "functional" relationship between mortality and years?
If not, you should fit year effects using dummy variables (year fixed effects) or a random effect. The random effects model will only be unbiased if the random effects are uncorrelated with the Xs, which is unlikely here because of the correlation between kidc and year. The nice thing about the year fixed effects model is that it is unbiased in your case and spares you from including polynomials for the year.

HTH,
Daniel

PS: If you want to model survival, you may want to think about using hazard models instead.

-------------------------
cuncta stricte discussurus
-------------------------
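To make the variance inflation argument above more concrete, here is a minimal sketch that computes variance inflation factors by hand in base R, using the textbook definition VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j on all other predictors. It reuses the simulation setup from the example in this reply. The seed, the helper name vif_by_hand, and the column names are assumptions introduced for illustration; this is not the car package's implementation.

```r
set.seed(42)  # assumption: the original example did not set a seed
year <- rep(1:4, each = 25)
year <- year - mean(year)
kids <- c(sample(0:1, 25, replace = TRUE), sample(0:2, 25, replace = TRUE),
          sample(0:3, 25, replace = TRUE), sample(0:4, 25, replace = TRUE))

# Hypothetical helper: regress each column on all remaining columns and
# convert the resulting R^2 into a variance inflation factor
vif_by_hand <- function(X) {
  X <- as.matrix(X)
  out <- vapply(seq_len(ncol(X)), function(j) {
    r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
    1 / (1 - r2)
  }, numeric(1))
  setNames(out, colnames(X))
}

# Predictors of the simple model (kids + year)
X_simple <- cbind(kids = kids, year = year)

# Predictors of the model with squares and all their interactions
X_poly <- cbind(kids = kids, kids2 = kids^2, year = year, year2 = year^2,
                kids_year = kids * year, kids_year2 = kids * year^2,
                kids2_year = kids^2 * year, kids2_year2 = kids^2 * year^2)

vif_simple <- vif_by_hand(X_simple)
vif_poly <- vif_by_hand(X_poly)
print(round(vif_simple, 2))
print(round(vif_poly, 2))
```

With only two predictors, both factors equal 1 / (1 - r^2), where r is the correlation between kids and year, so a correlation of about 0.5 gives a factor of roughly 1.3. The factors for the model with squares and interactions cannot be smaller for kids and year, since each is regressed on a larger set of predictors.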