thr3ads.net - R help - [R] glmnet inclusion / exclusion of categorical variables [Aug 2013]


Kevin Shaney

2013-Aug-09 13:44 UTC

[R] glmnet inclusion / exclusion of categorical variables

Hello -

I have been using glmnet calls of the following form to fit multinomial
logistic models for class-valued dependent variables:

mglmnet = glmnet(xxb, yb, alpha=ty, dfmax=dfm,
                 family="multinomial", standardize=FALSE)

I am using both continuous and categorical variables as predictors, and am using
sparse.model.matrix to code my x's into a matrix.  This turns a categorical
variable such as V1, with values "1", "2" or "3", into two recoded 0/1
variables, V12 and V13.

As I cycle through different penalties, I would like to have either both
recoded variables included or both excluded, but never just one of them, and I
can't figure out how to make that work.  I tried changing the
"type.multinomial" option, as it looks like this option should do what I
want, but I can't get it to work (maybe the difference in recoded variable
names is driving this).

To summarize, for categorical variables, I would like to hierarchically
constrain inclusion / exclusion of recoded variables in the model: either all
of the recoded variables from the same original categorical variable are in, or
all are out.
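
To make the recoding described above concrete, here is a small sketch on made-up data (the data frame `d` and the column `z` are hypothetical, not the poster's xxb). It shows the V12 / V13 columns that the default treatment contrasts produce, and uses the "assign" attribute of base R's model.matrix to map each recoded column back to the variable it came from. That mapping is the kind of group index a grouped penalty would need.

```r
library(Matrix)  # provides sparse.model.matrix

# Hypothetical toy data: one 3-level factor and one continuous predictor.
d <- data.frame(V1 = factor(c("1", "2", "3", "2", "1", "3")),
                z  = c(0.5, -1.2, 0.3, 2.0, -0.7, 1.1))

# Sparse design matrix; level "1" is the baseline, so V1 becomes V12 and V13.
xs <- sparse.model.matrix(~ V1 + z, data = d)
print(colnames(xs))

# The dense equivalent carries an "assign" attribute: one integer per column
# giving the index of the originating term (0 = intercept).
xd  <- model.matrix(~ V1 + z, data = d)
grp <- attr(xd, "assign")
names(grp) <- colnames(xd)
print(grp)
```

Columns that share the same value in `grp` come from the same original variable, so V12 and V13 land in one group.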

Thanks!
Kevin


Steve Lianoglou

2013-Aug-09 19:02 UTC


[R] glmnet inclusion / exclusion of categorical variables

Hi,

On Fri, Aug 9, 2013 at 6:44 AM, Kevin Shaney <kevin.shaney at rosetta.com> wrote:
> Hello -
>
> I have been using GLMNET of the following form to predict multinomial
> logistic / class dependent variables:
>
> mglmnet=glmnet(xxb,yb ,alpha=ty,dfmax=dfm,
> family="multinomial",standardize=FALSE)
>
> I am using both continuous and categorical variables as predictors, and am
> using sparse.model.matrix to code my x's into a matrix.  This is changing an
> example categorical variable whose original name / values is {V1 = "1"
> or "2" or "3"} into two recoded variables {V12=
> "1" or "0" and V13 = "1" or "0"}.
>
> As i am cycling through different penalties, i would like to either have
> both recoded variables included or both excluded, but not one included - and
> can't figure out how to make that work.   I tried changing the
> "type.multinomial" option, as that looks like this option should
> do what i want, but can't get it to work (maybe the difference in recoded
> variable names is driving this).
>
> To summarize, for categorical variables, i would like to hierarchically
> constrain inclusion / exclusion of recoded variables in the model - either all
> of the recoded variables from the same original categorical  variable are in,
> or all are out.

Pretty sure that you'll need the "grouped lasso" for that. Quick
googling over CRAN suggests:

grplasso: http://cran.r-project.org/web/packages/grplasso/index.html
standGL: http://cran.r-project.org/web/packages/standGL/index.html
gglasso: http://code.google.com/p/gglasso/

Unfortunately it doesn't look like any of them support the equivalent
of family="multinomial", only 2-class classification.
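
For intuition on why a group lasso gives the all-in or all-out behaviour, here is a minimal base-R sketch of the block soft-thresholding step that group-lasso solvers apply to each group of coefficients. The function name `group_soft_threshold` and the example numbers are made up for illustration; this is not code from any of the packages above, and it leaves out the group-size weights that real implementations usually apply.

```r
# Block soft-thresholding: shrink each group of coefficients by its
# Euclidean norm. A group is either zeroed as a whole or kept as a whole.
group_soft_threshold <- function(beta, group, lambda) {
  out <- numeric(length(beta))
  for (g in unique(group)) {
    idx <- which(group == g)
    nrm <- sqrt(sum(beta[idx]^2))
    # The scale factor is 0 when the group norm is at or below the threshold.
    shrink <- if (nrm > 0) max(0, 1 - lambda / nrm) else 0
    out[idx] <- shrink * beta[idx]
  }
  out
}

# Hypothetical coefficients: V12 and V13 form group 1, a continuous z is group 2.
beta  <- c(V12 = 0.9, V13 = 0.1, z = 0.4)
group <- c(1, 1, 2)

print(group_soft_threshold(beta, group, lambda = 0.5))  # group 1 kept, z dropped
print(group_soft_threshold(beta, group, lambda = 1.0))  # everything dropped
```

Note that V13 survives at lambda = 0.5 even though its own coefficient is small, because the decision is made on the norm of the whole group.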

HTH,
-steve

-- 
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech

David Winsemius

2013-Aug-09 19:12 UTC


[R] glmnet inclusion / exclusion of categorical variables

On Aug 9, 2013, at 6:44 AM, Kevin Shaney wrote:
>
> Hello -
>
> I have been using GLMNET of the following form to predict multinomial
> logistic / class dependent variables:
>
> mglmnet=glmnet(xxb,yb ,alpha=ty,dfmax=dfm,
> family="multinomial",standardize=FALSE)
>
> I am using both continuous and categorical variables as predictors, and am
> using sparse.model.matrix to code my x's into a matrix.  This is changing an
> example categorical variable whose original name / values is {V1 = "1"
> or "2" or "3"} into two recoded variables {V12=
> "1" or "0" and V13 = "1" or "0"}.
You could set their penalty factors to 0 to at least observe the case where
both are included. And setting the penalty factor for both to be small
would allow you to "honestly" use 0 as the estimated coefficient in
cases where one was estimated and the other was not.
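
A rough sketch of the penalty-factor idea, using glmnet's penalty.factor argument (one multiplier per column of x; a value of 0 means that column is not penalized and so always stays in the model). The data are simulated and the object names are placeholders, not the poster's xxb and yb. The fit itself only runs if glmnet is installed.

```r
set.seed(1)
n <- 150
# Simulated stand-ins for the real data.
d <- data.frame(V1 = factor(sample(c("1", "2", "3"), n, replace = TRUE)),
                z1 = rnorm(n), z2 = rnorm(n))
y <- factor(sample(c("a", "b", "c"), n, replace = TRUE))

# Drop the intercept column; glmnet fits its own intercept.
x <- model.matrix(~ V1 + z1 + z2, data = d)[, -1]

# Penalty multiplier per column: 0 for the V1 dummies, 1 for everything else.
pf <- ifelse(colnames(x) %in% c("V12", "V13"), 0, 1)
names(pf) <- colnames(x)
print(pf)

if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(x, y, family = "multinomial",
                        penalty.factor = pf, standardize = FALSE)
  # V12 and V13 are unpenalized, so they are in the model even at the
  # largest lambda on the path; shown here for the first class.
  print(coef(fit, s = max(fit$lambda))[[1]])
}
```

Replacing the 0 with a small positive value gives the second variant mentioned above: both dummies are lightly penalized, so either one can still be shrunk to exactly 0.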
>
> As i am cycling through different penalties, i would like to either have
> both recoded variables included or both excluded, but not one included - and
> can't figure out how to make that work.   I tried changing the
> "type.multinomial" option, as that looks like this option should
> do what i want, but can't get it to work (maybe the difference in recoded
> variable names is driving this).
Doesn't the 'family' argument, used to set what I think you are
calling 'type', just refer to the y argument rather than the
predictors? You may want:

   mglmnet=glmnet(xxb,yb ,alpha=ty,dfmax=dfm,
                  type.multinomial="grouped",
                  family="multinomial",standardize=FALSE)
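
For reference, as far as the glmnet documentation goes, type.multinomial="grouped" puts a grouped-lasso penalty on the class coefficients of each single column of x, so a column enters or leaves for all classes at once. It does not tie different columns, such as V12 and V13, to each other. A small simulated check follows; all data and names here are made up, `class_pattern` is a hypothetical helper, and the fit only runs if glmnet is installed.

```r
# For each column, report whether it is nonzero in any class and in every
# class. Under type.multinomial = "grouped" the two should agree per column.
class_pattern <- function(coef_list) {
  # coef_list: one coefficient vector/matrix per class (intercept first)
  nz <- sapply(coef_list, function(b) as.vector(as.matrix(b))[-1] != 0)
  data.frame(any_class = apply(nz, 1, any), all_classes = apply(nz, 1, all))
}

if (requireNamespace("glmnet", quietly = TRUE)) {
  library(Matrix)  # so as.matrix() works on the sparse coefficient matrices
  set.seed(2)
  n <- 200
  d <- data.frame(V1 = factor(sample(c("1", "2", "3"), n, replace = TRUE)),
                  z1 = rnorm(n), z2 = rnorm(n))
  x <- model.matrix(~ V1 + z1 + z2, data = d)[, -1]
  # Class depends on z1 only; tercile cut keeps the three classes balanced.
  v <- d$z1 + rnorm(n)
  y <- cut(v, quantile(v, 0:3 / 3), include.lowest = TRUE,
           labels = c("a", "b", "c"))

  fit <- glmnet::glmnet(x, y, family = "multinomial",
                        type.multinomial = "grouped", standardize = FALSE)
  # Pick a lambda from the middle of the path and inspect the pattern.
  s   <- fit$lambda[ceiling(length(fit$lambda) / 2)]
  pat <- class_pattern(coef(fit, s = s))
  rownames(pat) <- colnames(x)
  print(pat)  # V12 and V13 may still differ from each other
}
```

If the rows for V12 and V13 disagree at some lambda, that is the behaviour the original question describes: the grouping is across classes, not across the dummies of one factor.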
>
> To summarize, for categorical variables, i would like to hierarchically
> constrain inclusion / exclusion of recoded variables in the model - either all
> of the recoded variables from the same original categorical  variable are in,
> or all are out.
I do understand that I am possibly not directly answering your question, but in
some respect I wonder if it deserves an answer. I think it is meaningful if some
factor levels are "penalized-out" of models.

-- 
David Winsemius
Alameda, CA, USA
