Dear All,

Suppose that you are trying to create a binary logistic model by trying different combinations of predictors. Has R got an automatic way of doing this, i.e., is there some way of automatically generating different tentative models and checking their corresponding AIC value? If so, could you please direct me to an example?

Thanks in advance,

Paul
On 4 August 2011 12:23, Paul Smith <phhs80@gmail.com> wrote:
> Has R got an automatic way of doing this, i.e., is there some way of
> automatically generating different tentative models and checking their
> corresponding AIC value? If so, could you please direct me to an example?
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Sounds like you want best subsets regression. The bestglm() function, found in the bestglm package, will do the trick.

Jeremy
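To make the idea of best subsets selection by AIC concrete, here is a minimal sketch that uses only base R rather than bestglm itself. It fits a logistic regression for every non-empty subset of the candidate predictors and ranks the fits by AIC. The data are simulated, and all names (dat, x1 to x4, y, fit_aic, results) are hypothetical and not taken from the thread.

```r
# Simulated data: all names here (dat, x1..x4, y) are hypothetical.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * dat$x1 - 0.8 * dat$x2))

predictors <- c("x1", "x2", "x3", "x4")

# Every non-empty subset of the candidate predictors (2^4 - 1 = 15 of them).
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit one logistic regression per subset and return its AIC.
fit_aic <- function(vars) {
  fit <- glm(reformulate(vars, response = "y"), family = binomial, data = dat)
  AIC(fit)
}

results <- data.frame(
  model = vapply(subsets, paste, character(1), collapse = " + "),
  aic   = vapply(subsets, fit_aic, numeric(1))
)

# Models ordered from lowest (best) to highest AIC.
results <- results[order(results$aic), ]
head(results)
```

The number of candidate models doubles with every added predictor, so a brute-force loop like this is only practical for a handful of variables. bestglm automates the same kind of search; its exact interface should be checked in the package documentation.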
On Thu, Aug 4, 2011 at 8:28 PM, Jeremy Miles <jeremy.miles at gmail.com> wrote:
> Sounds like you want best subsets regression. The bestglm() function,
> found in the bestglm package, will do the trick.

Wonderful! Thanks, Jeremy. Is bestglm() also able to try nonlinear transformations of the variables, say log(X1) for instance?

Paul
On Aug 4, 2011, at 2:23 PM, Paul Smith wrote:
> Has R got an automatic way of doing this, i.e., is there some way of
> automatically generating different tentative models and checking their
> corresponding AIC value? If so, could you please direct me to an example?

Hi Paul,

If it were not for JSS going on at the moment, you would likely get a reply from Frank Harrell telling you why this approach is not a good idea. It is tantamount to a stepwise approach, with variables going in and out of the model based upon either AIC or perhaps Wald p-values. If you search the R list archives using rseek.org with keywords such as "stepwise regression Harrell", you will see a plethora of discussions on this over the years.

You might want to obtain a copy of Frank's book, Regression Modeling Strategies, along with Ewout Steyerberg's book, Clinical Prediction Models. Both cover this topic and offer alternative approaches to model development. These generally include pre-specifying the full model, considering how many covariate degrees of freedom you can reasonably include in the model, and applying shrinkage/penalization.

If you need to engage in data reduction, you might want to consider using the LASSO, as implemented in the glmnet package on CRAN. More information on this method is available at:

http://www-stat.stanford.edu/~tibs/lasso.html

An alternative might be backward elimination, which Frank touches on in the supplement to his course:

http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/rms.pdf

Automated creation of models ignores the expertise of both the statistician and subject matter experts, to the detriment of inference.

Regards,

Marc Schwartz
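As a rough illustration of the LASSO route mentioned above, the following sketch fits a penalized logistic regression with glmnet and picks the penalty by cross-validation. The simulated data, the object names (x, y, cvfit), the number of folds, and the choice of the "lambda.1se" penalty are assumptions made for the example, not recommendations from the thread.

```r
library(glmnet)  # CRAN package named in the post

# Simulated data; x, y and the true coefficients are hypothetical.
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 6), ncol = 6, dimnames = list(NULL, paste0("x", 1:6)))
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x[, "x1"] - 0.8 * x[, "x2"]))

# alpha = 1 selects the lasso penalty; lambda is chosen by 10-fold
# cross-validation.
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 10)

# Coefficients at the more conservative "one standard error" lambda.
# Coefficients shrunk exactly to zero are printed as ".".
coef(cvfit, s = "lambda.1se")
```

Unlike subset search, the penalty shrinks all coefficients together and sets some of them to zero, so the set of candidate predictors passed in as x still has to be chosen with subject matter input.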