Wilbert Heeringa
2016-Feb-19 13:01 UTC
[R] mixed-effects models with (g)lmer in R and model selection
Dear all,

Mixed-effects models are wonderful for analyzing data, but it is always a hassle to find the best model, i.e. the model with the lowest AIC, especially when the number of predictor variables is large.

At present, when trying to find the right model, I perform the following steps:

1. Start with a model containing all predictors. Assuming dependent variable X and predictors A, B, C, D, E, I start with: X~A+B+C+D+E

2. lmer warns that it has dropped columns/coefficients. These are variables that are *perfectly* correlated with one of the other variables or with a combination of variables. summary() shows which columns have been dropped. Assuming predictor D has been dropped, I continue with this model: X~A+B+C+E

3. Next I need to check whether there are variables (or groups of variables) that *strongly* correlate with each other. I included the function vif.mer (developed by Austin F. Frank and available at: https://raw.github.com/aufrank/R-hacks/master/mer-utils.R) in my script, and when I apply this function to my reduced model, I get a VIF value for each of the variables. When VIF > 5 for a predictor, it should probably be removed. If multiple variables have VIF > 5, I first remove the predictor with the highest VIF, then re-run lmer and vif.mer. I again remove the predictor with the highest VIF (if one or more predictors still have VIF > 5), and I repeat this until none of the remaining predictors has VIF > 5. If I got a "Model failed to converge" warning in the larger model(s), this warning no longer appears in the 'cleaned' model.

4. Assume the following predictors have survived: A, B and E. Now I want to find the combination of predictors that gives the smallest AIC. For three predictors it is easy to try all combinations, but with 10 predictors, manually trying all combinations would be time-consuming. So I used the function fitLMER.fnc from the LMERConvenienceFunctions package. This function back-fits fixed effects, forward-fits random effects, and re-back-fits fixed effects. I consider the model given by fitLMER.fnc to be the right one.

I am not an expert in mixed-effects models and have struggled with model selection. I found that the procedure I described works, but I would really appreciate hearing whether it is sound, or whether there are better alternatives.

Best,

Wilbert
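Steps 2 and 3 above can be sketched in base R. This is only an illustration under stated assumptions, not the vif.mer code itself: the data are simulated, the helper names vif_from_vcov and prune_by_vif are hypothetical, and ordinary lm() stands in for lmer so that the sketch runs without extra packages. The VIFs are computed as the diagonal of the inverse correlation matrix of the coefficient estimates (intercept excluded), which should also be applicable to the fixed-effects covariance matrix of a fitted mixed model.

```r
# Simulated data: all variable names and values here are illustrative.
set.seed(1)
n <- 200
d <- data.frame(A = rnorm(n), B = rnorm(n), E = rnorm(n))
d$C <- d$A + d$B + rnorm(n, sd = 0.1)  # strongly, but not perfectly, collinear
d$D <- 2 * d$A                          # perfectly collinear with A
d$X <- 1 + d$A - d$B + 0.5 * d$E + rnorm(n)

# Step 2: lm() reports coefficients it cannot estimate as NA.
full <- lm(X ~ A + B + C + D + E, data = d)
dropped <- names(coef(full))[is.na(coef(full))]

# VIFs from a coefficient covariance matrix: drop the intercept, convert
# to a correlation matrix, and take the diagonal of its inverse.
vif_from_vcov <- function(v) {
  v <- as.matrix(v)
  keep <- rownames(v) != "(Intercept)"
  v <- v[keep, keep, drop = FALSE]
  vif <- diag(solve(cov2cor(v)))
  names(vif) <- rownames(v)
  vif
}

# Step 3: refit, then remove the predictor with the highest VIF until
# no VIF exceeds the threshold. Assumes numeric predictors, so that
# coefficient names equal predictor names.
prune_by_vif <- function(response, predictors, data, threshold = 5) {
  repeat {
    fit <- lm(reformulate(predictors, response = response), data = data)
    vif <- vif_from_vcov(vcov(fit))
    if (length(predictors) < 2 || max(vif) <= threshold) break
    predictors <- setdiff(predictors, names(which.max(vif)))
  }
  list(predictors = predictors, vif = vif)
}

res <- prune_by_vif("X", setdiff(c("A", "B", "C", "D", "E"), dropped), d)
print(dropped)
print(res$predictors)
print(round(res$vif, 2))
```

In this simulated setting D is flagged as inestimable and C is pruned for its high VIF, leaving A, B and E, which mirrors the example in the post. Note that the pruning ignores the response entirely, so it says nothing about which predictors matter for X.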
Don McKenzie
2016-Feb-19 19:42 UTC
[R] mixed-effects models with (g)lmer in R and model selection
This is a complicated and subtle statistical issue, not an R question, the latter being the purpose of this list. There are people on the list who could give you literate answers, to be sure, but a statistically oriented list would be a better match, e.g., http://stats.stackexchange.com/

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Jianling Fan
2016-Feb-19 23:30 UTC
[R] mixed-effects models with (g)lmer in R and model selection
Hello Wilbert,

You gave a good procedure for lme model selection, thanks! I learned something from it. I have also been working on a similar problem recently. Maybe you can take a look at the "glmmLasso" package, which allows model selection in generalized linear mixed-effects models using the LASSO shrinkage method.

Regards,

Jianling
Bert Gunter
2016-Feb-20 01:59 UTC
[R] mixed-effects models with (g)lmer in R and model selection
Absolutely! Even more, consult a local expert in applying mixed-effects models. The OP's strategy sounded to me like a prescription to produce irreproducible results (due to overfitting).

Cheers,
Bert

On Friday, February 19, 2016, Don McKenzie <dmck at u.washington.edu> wrote:
> This is a complicated and subtle statistical issue, not an R question, the latter being the purpose of this list. There are people on the list who could give you literate answers, to be sure, but a statistically oriented list would be a better match, e.g., http://stats.stackexchange.com/

--
Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
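The overfitting concern can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not an analysis of the original poster's data: the response is pure noise, the six candidate predictors P1 to P6 are hypothetical, and ordinary lm() replaces lmer to keep the example in base R. For each simulated data set it fits every subset of predictors and keeps the one with the lowest AIC, which is the "try all combinations" idea from step 4 of the original post.

```r
# Pure-noise simulation: X is unrelated to every candidate predictor.
set.seed(42)
n_obs <- 50
n_pred <- 6
n_sim <- 100
pred_names <- paste0("P", seq_len(n_pred))

# All 2^n_pred subsets, one per row of a logical inclusion matrix.
subsets <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), n_pred)))

# Simulate one data set, fit every subset, and return the number of
# predictors in the lowest-AIC model.
best_subset_size <- function() {
  d <- as.data.frame(matrix(rnorm(n_obs * n_pred), n_obs, n_pred,
                            dimnames = list(NULL, pred_names)))
  d$X <- rnorm(n_obs)
  aic <- apply(subsets, 1, function(keep) {
    rhs <- if (any(keep)) pred_names[keep] else "1"
    AIC(lm(reformulate(rhs, response = "X"), data = d))
  })
  sum(subsets[which.min(aic), ])
}

sizes <- replicate(n_sim, best_subset_size())
frac_nonnull <- mean(sizes > 0)  # share of runs keeping >= 1 noise predictor
print(table(sizes))
print(frac_nonnull)
```

Because none of the predictors is related to X, every predictor retained here is a false positive. In a setup like this the lowest-AIC model should keep at least one noise predictor in a clear majority of runs, and which predictors it keeps changes from data set to data set. That instability is one way to read the "irreproducible results" warning. With mixed models the same enumeration would presumably need ML rather than REML fits for the AIC values to be comparable across different fixed-effects structures.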