Kevin Shaney
2013-Aug-09 13:44 UTC
[R] glmnet inclusion / exclusion of categorical variables
Hello,

I have been using glmnet in the following form to predict multinomial logistic / class dependent variables:

    mglmnet = glmnet(xxb, yb, alpha=ty, dfmax=dfm, family="multinomial", standardize=FALSE)

I am using both continuous and categorical variables as predictors, and am using sparse.model.matrix to code my x's into a matrix. This recodes a categorical variable such as V1, with values "1", "2" or "3", into two indicator variables, V12 and V13, each taking the value 1 or 0.

As I cycle through different penalties, I would like both recoded variables to be either included or excluded together, never just one of them, and I can't figure out how to make that work. I tried changing the "type.multinomial" option, since it looks like it should do what I want, but I can't get it to work (maybe the difference in recoded variable names is driving this).

To summarize: for categorical variables, I would like to hierarchically constrain inclusion / exclusion of the recoded variables in the model. Either all of the recoded variables from the same original categorical variable are in, or all are out.

Thanks!
Kevin
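To make the coding described above concrete, here is a minimal sketch on a hypothetical toy data frame (the names `d` and `z` are assumptions, not from the post). It uses base R's `model.matrix` rather than `sparse.model.matrix` so that it runs without extra packages; with the default treatment contrasts, the three-level factor V1 becomes the two indicator columns V12 and V13. The "assign" attribute records which original term each column came from, which is the grouping the poster would like the penalty to respect.

```r
# Hypothetical toy data; V1 mirrors the three-level factor from the post.
d <- data.frame(
  V1 = factor(c("1", "2", "3", "2", "1", "3")),
  z  = c(0.5, -1.2, 0.3, 2.1, -0.7, 1.4)
)

# Default treatment contrasts: level "1" is the baseline, so V1 expands
# into one indicator column per remaining level.
mm <- model.matrix(~ V1 + z, data = d)
print(colnames(mm))   # "(Intercept)" "V12" "V13" "z"

# Term index for every column: 0 = intercept, 1 = V1, 2 = z.
grp <- attr(mm, "assign")
print(grp)            # 0 1 1 2
```

Columns that share a value in `grp` come from the same original variable, so this vector is a natural starting point for any method that needs a group index.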
Steve Lianoglou
2013-Aug-09 19:02 UTC
[R] glmnet inclusion / exclusion of categorical variables
Hi,

On Fri, Aug 9, 2013 at 6:44 AM, Kevin Shaney <kevin.shaney at rosetta.com> wrote:
>
> Hello -
>
> I have been using GLMNET of the following form to predict multinomial logistic / class dependent variables:
>
> mglmnet=glmnet(xxb,yb ,alpha=ty,dfmax=dfm,
> family="multinomial",standardize=FALSE)
>
> I am using both continuous and categorical variables as predictors, and am using sparse.model.matrix to code my x's into a matrix. This is changing an example categorical variable whose original name / values is {V1 = "1" or "2" or "3"} into two recoded variables {V12= "1" or "0" and V13 = "1" or "0"}.
>
> As i am cycling through different penalties, i would like to either have both recoded variables included or both excluded, but not one included - and
> can't figure out how to make that work. I tried changing the
> "type.multinomial" option, as that looks like this option should do what i want, but can't get it to work (maybe the difference in recoded variable names is driving this).
>
> To summarize, for categorical variables, i would like to hierarchically constrain inclusion / exclusion of recoded variables in the model - either all of the recoded variables from the same original categorical variable are in, or all are out.

Pretty sure that you'll need the "grouped lasso" for that. Quick googling over CRAN suggests:

  grplasso: http://cran.r-project.org/web/packages/grplasso/index.html
  standGL: http://cran.r-project.org/web/packages/standGL/index.html
  gglasso: http://code.google.com/p/gglasso/

Unfortunately it doesn't look like any of them support the equivalent of family="multinomial", only 2-class classification.

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
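As a rough illustration of the group-lasso idea for the two-class case only, the sketch below uses gglasso. It assumes the package is installed and that its `gglasso(x, y, group, loss = "logit")` interface expects a -1/1 response and a group vector of consecutive integers; check the documentation of your installed version. The data are simulated and entirely hypothetical (`d`, `z`, `eta` are illustration names, not from the thread).

```r
library(gglasso)  # assumption: installed from CRAN

set.seed(1)
n <- 200
d <- data.frame(
  V1 = factor(sample(c("1", "2", "3"), n, replace = TRUE)),
  z  = rnorm(n)
)

mm  <- model.matrix(~ V1 + z, data = d)
x   <- mm[, -1]                    # drop the intercept column; gglasso fits its own
grp <- attr(mm, "assign")[-1]      # V12 and V13 share group 1, z is group 2

# Noisy two-class response coded as -1/1 (assumed requirement for loss = "logit").
eta <- 1.5 * (d$V1 == "3") + d$z
y   <- ifelse(runif(n) < plogis(eta), 1, -1)

fit <- gglasso(x, y, group = grp, loss = "logit")

# For each lambda on the path, check whether V12 and V13 (rows 1 and 2 of
# the coefficient matrix) are active together or inactive together.
active <- as.matrix(fit$beta) != 0
together <- active[1, ] == active[2, ]
print(all(together))
```

The group index here is taken directly from the "assign" attribute of the model matrix, so every dummy column of a factor lands in the same group without hand-coding.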
David Winsemius
2013-Aug-09 19:12 UTC
[R] glmnet inclusion / exclusion of categorical variables
On Aug 9, 2013, at 6:44 AM, Kevin Shaney wrote:
>
> Hello -
>
> I have been using GLMNET of the following form to predict multinomial logistic / class dependent variables:
>
> mglmnet=glmnet(xxb,yb ,alpha=ty,dfmax=dfm,
> family="multinomial",standardize=FALSE)
>
> I am using both continuous and categorical variables as predictors, and am using sparse.model.matrix to code my x's into a matrix. This is changing an example categorical variable whose original name / values is {V1 = "1" or "2" or "3"} into two recoded variables {V12= "1" or "0" and V13 = "1" or "0"}.

You could set their penalty factors to 0 to at least observe the case where inclusion is performed. And setting the penalty factor for both to be small would allow you to "honestly" use 0 as the estimated coefficient in such cases where one was estimated and the other not.

>
> As i am cycling through different penalties, i would like to either have both recoded variables included or both excluded, but not one included - and
> can't figure out how to make that work. I tried changing the
> "type.multinomial" option, as that looks like this option should do what i want, but can't get it to work (maybe the difference in recoded variable names is driving this).

Doesn't the 'family' argument, used to set what I think you are calling 'type', just refer to the y argument rather than the predictors? You may want:

    mglmnet=glmnet(xxb,yb ,alpha=ty,dfmax=dfm, type.multinomial="grouped", family="multinomial",standardize=FALSE)

>
> To summarize, for categorical variables, i would like to hierarchically constrain inclusion / exclusion of recoded variables in the model - either all of the recoded variables from the same original categorical variable are in, or all are out.

I do understand that I am possibly not directly answering your question, but in some respect I wonder if it deserves an answer. I think it is meaningful if some factor levels are "penalized-out" of models.

--
David Winsemius
Alameda, CA, USA
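The two glmnet suggestions above can be sketched together as follows, assuming glmnet is installed and using simulated, hypothetical data (`d`, `z1`, `z2`, `pf` are illustration names). A penalty factor of 0 leaves a column unpenalized, so V12 and V13 stay in the model at every lambda. Note that, per the glmnet documentation, `type.multinomial = "grouped"` groups the coefficients of a single column across the response classes; it does not by itself tie V12 and V13 to each other.

```r
library(glmnet)  # assumption: installed from CRAN

set.seed(1)
n <- 300
d <- data.frame(
  V1 = factor(sample(c("1", "2", "3"), n, replace = TRUE)),
  z1 = rnorm(n),
  z2 = rnorm(n)
)
x <- model.matrix(~ V1 + z1 + z2, data = d)[, -1]   # drop the intercept column

# Hypothetical three-class response that depends loosely on z1.
y <- cut(d$z1 + rnorm(n), breaks = c(-Inf, -0.5, 0.5, Inf),
         labels = c("a", "b", "c"))

# Penalty factor 0 for the dummy columns of V1, 1 for everything else.
pf <- ifelse(colnames(x) %in% c("V12", "V13"), 0, 1)

fit <- glmnet(x, y,
              family = "multinomial",
              type.multinomial = "grouped",   # group each column across classes
              penalty.factor = pf,
              standardize = FALSE)

# coef() returns one sparse coefficient matrix per response class.
cf <- coef(fit)
print(names(cf))
print(sapply(cf, function(m) m[c("V12", "V13"), ncol(m)]))
```

Forcing both dummies in via `penalty.factor` gives the "all in" case only; an "all in or all out" constraint across the dummy columns still needs a group penalty over predictors.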