Hi all,

I am using the glmnet R package to run LASSO with binary logistic regression. I have over 290 samples with outcome data (0 for alive, 1 for dead) and over 230 predictor variables. I am currently using LASSO to reduce the number of predictor variables.

I am using the cv.glmnet function to do 10-fold cross-validation on a sequence of lambda values which I let glmnet determine. I then take the optimal lambda value (lambda.1se), which I use to predict on an independent cohort. What I am finding is that this optimal lambda value fluctuates every time I run glmnet with LASSO. It deviates enough that each time I generate an ROC curve for my validation cohort, the AUC values also vary. Does anyone know why there is such a fluctuation in the selection of an optimal lambda? I am wondering whether it is due to the 10-fold cross-validation step: perhaps the training set is not being split so that each fold has enough alive and dead cases? Thoughts?
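[For reference, a minimal sketch of the workflow described above, using glmnet's standard interface; x_train, y_train, x_valid are placeholder objects, not from the original post.]

library(glmnet)

set.seed(1)                               # makes the random fold assignment reproducible for this run
cvfit <- cv.glmnet(x_train, y_train,      # y coded 0 = alive, 1 = dead
                   family = "binomial",
                   nfolds = 10)

cvfit$lambda.1se                          # the "optimal" lambda that fluctuates between runs
coef(cvfit, s = "lambda.1se")             # which predictors survive the LASSO penalty

## predicted probabilities on the independent cohort
p_valid <- predict(cvfit, newx = x_valid,
                   s = "lambda.1se", type = "response")

Without a fixed seed (or fixed folds), the 10 folds are re-drawn at random on every call, which is why lambda.1se moves around.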
On 07/22/2011 07:51 PM, fongchun wrote:
> I am using the glmnet R package to run LASSO with binary logistic
> regression.
> ...
> What I am finding is that this optimal lambda value fluctuates
> every time I run glmnet with LASSO.
> ...
> Does anyone know why there is such a fluctuation in the
> selection of an optimal lambda?

Cross-validation is a random procedure, and the results will vary every time. This reflects the underlying uncertainty regarding the optimal lambda.

Or are you saying that you've used glmnet many times, but this time the fluctuations in lambda are much larger than usual? If so, and you suspect a problem with the way that glmnet is partitioning the data set into cross-validation folds, you can specify the folds yourself with the 'foldid' option.

--
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky
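[A rough sketch of the 'foldid' suggestion; x and y are placeholders, and the class-stratified fold assignment is an assumption added here to address the concern about alive/dead balance, not something specified in the thread.]

library(glmnet)

set.seed(42)
nfolds <- 10
foldid <- integer(length(y))
## assign folds separately within each outcome class so every fold
## contains both alive and dead cases
for (cls in unique(y)) {
  idx <- which(y == cls)
  foldid[idx] <- sample(rep(seq_len(nfolds), length.out = length(idx)))
}

## with foldid supplied, every call to cv.glmnet uses the same partitions
cvfit <- cv.glmnet(x, y, family = "binomial", foldid = foldid)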
Hi Patrick,

Thanks for the reply. I am referring to using the cv.glmnet() function with 10-fold cross-validation and letting glmnet determine the lambda sequence. The optimal lambda that it returns fluctuates between different runs of cv.glmnet. Sometimes the model that is returned varies from including anywhere from 3 to 25 predictor variables (I am doing LASSO and I originally had 235 predictor variables). I will try the foldid option.

I was also thinking of a bootstrapping-style approach where I would run cv.glmnet, say, 100 times and then take the mean/median lambda across all the cv.glmnet runs. This way I could also generate a confidence interval for the optimal lambda I would use in the end.

Another question: I am currently using glmnet to fit a two-class predictor (binary logistic regression). The cv.glmnet() function has a type.measure parameter which can be set to "auc". If I understand correctly, for each lambda it does 10-fold cross-validation and at each fold it calculates an AUC. Is the cross-validation score for that lambda therefore the average AUC across all folds? Or does it pool the predicted response values from each fold and then generate one ROC curve on all the predicted values?

Thanks,

Fong
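[For what it's worth, a sketch of the repeated-cv.glmnet idea described above; x and y are placeholders, and whether averaging lambda across runs is statistically justified is a separate question.]

library(glmnet)

set.seed(1)
lams <- replicate(100, {
  cv.glmnet(x, y, family = "binomial", nfolds = 10)$lambda.1se
})
summary(lams)                      # spread of the selected lambda across runs
median(lams)                       # one candidate "consensus" lambda
quantile(lams, c(0.025, 0.975))    # rough interval for the selected lambda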
10-fold CV has high variation compared to other methods. Use repeated CV or the bootstrap instead (both of which can be used with glmnet by way of the train() function in the caret package).

Max

On Jul 23, 2011, at 11:43 AM, fongchun <fongchunchan at gmail.com> wrote:
> Thanks for the reply. I am referring to using the cv.glmnet() function with
> 10-fold cross-validation and letting glmnet determine the lambda sequence.
> The optimal lambda that it returns fluctuates between different runs of
> cv.glmnet.
> ...
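[A sketch of what Max suggests, assuming caret's standard interface; x is a predictor matrix/data frame, y the 0/1 outcome, and the tuning grid values are illustrative only.]

library(caret)
library(glmnet)

## y must be a factor with valid R level names when classProbs = TRUE
y_fac <- factor(ifelse(y == 1, "dead", "alive"), levels = c("alive", "dead"))

ctrl <- trainControl(method = "repeatedcv",      # or method = "boot" for the bootstrap
                     number = 10, repeats = 5,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

set.seed(1)
fit <- train(x, y_fac,
             method = "glmnet",
             metric = "ROC",                     # tune on cross-validated AUC
             tuneGrid = expand.grid(alpha = 1,   # alpha = 1 is the LASSO
                                    lambda = 10^seq(-4, 0, length.out = 50)),
             trControl = ctrl)

fit$bestTune                                     # lambda chosen over the repeated resamples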
On 07/23/2011 11:43 AM, fongchun wrote:
> I was also thinking of a bootstrapping-style approach where I would run
> cv.glmnet, say, 100 times and then take the mean/median lambda across all
> the cv.glmnet runs. This way I could also generate a confidence interval
> for the optimal lambda I would use in the end.

A simpler approach is to increase the number of folds. If you set the number of folds equal to n ("leave-one-out" cross-validation), the outcome will no longer be random, as there is only one way of choosing the fold partitions. The main reason people settle for 10-fold CV is computational convenience when n is large, which is not an issue in your case.

--
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky
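[A minimal sketch of this leave-one-out suggestion; x and y are placeholders. Note that an AUC cannot be computed within a single-observation fold, so the default deviance measure is used here rather than type.measure = "auc".]

library(glmnet)

## leave-one-out CV: one fold per observation, so the fold assignment
## (and hence lambda.1se) is no longer random
cvfit_loo <- cv.glmnet(x, y, family = "binomial",
                       nfolds = nrow(x),
                       type.measure = "deviance")   # AUC would need larger folds
cvfit_loo$lambda.1se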