thr3ads.net - R help - [R] glmnet on Autopilot [Jul 2013]

Axel Urbiz

2013-Jul-17 22:26 UTC

[R] glmnet on Autopilot

Dear List,

I'm running simulations using the glmnet package. I need to use an
'automated' method for model selection at each iteration of the
simulation.
The cv.glmnet function in the same package is handy for that purpose.
However, in my simulation I have p >> N, and in some cases the selected
model from cv.glmnet is essentially shrinking all coefficients to zero. In
this case, the prediction for each instance equals the average of the
response variable. A reproducible example is shown below.

Is there a reasonable way to prevent this from happening in a simulation
setting with glmnet? That is, I'd like the selected model to give me some
useful predictions.

I've tested using alternative loss measures (type.measure argument), but
none is satisfactory in all cases.

This question is not necessarily R related (so sorry for that): when
comparing glmnet with other models in terms of predictive accuracy, is it
fair to make the comparison including those cases in which the `best'
cv.glmnet can do in an automated setting is pred = avg(response)?

library(glmnet)
set.seed(1010)
n=100;p=3000
nzc=trunc(p/10)
x=matrix(rnorm(n*p),n,p)
beta=rnorm(nzc)
fx= x[,seq(nzc)] %*% beta
eps=rnorm(n)*5
y=drop(fx+eps)
px=exp(fx)
px=px/(1+px)
ly=rbinom(n=length(px),prob=px,size=1)

fit.net <- cv.glmnet(x,
                     ly,
                     family = "binomial",
                     alpha = 1, # lasso penalty
                     type.measure = "deviance",
                     standardize = FALSE,
                     intercept = FALSE,
                     nfolds = 10,
                     keep = FALSE)

plot(fit.net)
log(fit.net$lambda.1se)
pred <- predict(fit.net, x,
                type = "response", s = "lambda.1se")
all(coef(fit.net) == 0)
all(pred ==0.5)

Thanks in advance for your thoughts.

Regards,
Lars.

	[[alternative HTML version deleted]]

David Winsemius

2013-Jul-18 02:44 UTC

[R] glmnet on Autopilot

On Jul 17, 2013, at 5:26 PM, Axel Urbiz wrote:
> Dear List,
>
> I'm running simulations using the glmnet package. I need to use an
> 'automated' method for model selection at each iteration of the
> simulation.
> The cv.glmnet function in the same package is handy for that purpose.
> However, in my simulation I have p >> N, and in some cases the selected
> model from cv.glmnet is essentially shrinking all coefficients to zero. In
> this case, the prediction for each instance equals the average of the
> response variable. A reproducible example is shown below.
>
> Is there a reasonable way to prevent this from happening in a simulation
> setting with glmnet? That is, I'd like the selected model to give me some
> useful predictions.
I'd like to expose the premise of the request to criticism. Reporting
the sample mean in cases where no predictors meet the criteria for
significance under penalization IS an informative response under
conditions of simulation. The simulated result is telling you that in
some data situations of modest size, assessment under a penalized
process will not deliver a "significant" result. Why does this bother
you? The number of such messages would seem to be one measure of the
power of the method, although other departures from the "true" result
would also be subtracted from the count of runs.
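
One way to obtain that count is sketched below. It is only an
illustration under stated assumptions: it reuses the data-generating
setup and cv.glmnet arguments from the original post, the names
`one_run`, `n_sim` and `is_null` are hypothetical, and the number of
replications is kept small purely so that it runs quickly.

```r
library(glmnet)

# Hypothetical helper: simulate one data set as in the original post and
# report whether the model chosen at lambda.1se has no nonzero coefficients.
one_run <- function(n = 100, p = 3000) {
  nzc  <- trunc(p / 10)                      # number of truly active predictors
  x    <- matrix(rnorm(n * p), n, p)
  beta <- rnorm(nzc)
  fx   <- drop(x[, seq(nzc)] %*% beta)
  ly   <- rbinom(n, size = 1, prob = plogis(fx))  # plogis(fx) = exp(fx) / (1 + exp(fx))
  fit  <- cv.glmnet(x, ly,
                    family = "binomial",
                    alpha = 1,               # lasso penalty
                    type.measure = "deviance",
                    standardize = FALSE,
                    intercept = FALSE,
                    nfolds = 10)
  # With intercept = FALSE, an all-zero fit predicts 0.5 for every case.
  all(as.matrix(coef(fit, s = "lambda.1se")) == 0)
}

set.seed(1010)
n_sim   <- 10                                # assumption: small, for speed only
is_null <- replicate(n_sim, one_run())
mean(is_null)                                # proportion of runs with an all-zero fit
```

The proportion in the last line is the quantity to report alongside
any accuracy comparison, rather than something to filter out.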

If you choose to ignore the "evidence", then I "predict" that you
are also one who chooses to throw out outliers. Both would have a
similar effect of inflating measures of significance at the expense of
fidelity to the data. If you want to vary the parameter, then vary
the penalization and determine the effect of that hyper-parameter.
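
As a rough sketch of what varying the penalization could look like,
the cross-validation results can be tabulated over the whole lambda
path instead of reading off a single automatic choice. The setup
mirrors the original post (it need not reproduce the same data
exactly), and the data frame `path` and its column names are
hypothetical.

```r
library(glmnet)

set.seed(1010)
n <- 100; p <- 3000
nzc <- trunc(p / 10)
x   <- matrix(rnorm(n * p), n, p)
fx  <- drop(x[, seq(nzc)] %*% rnorm(nzc))
ly  <- rbinom(n, size = 1, prob = plogis(fx))

fit <- cv.glmnet(x, ly,
                 family = "binomial",
                 alpha = 1,                  # lasso penalty
                 type.measure = "deviance",
                 standardize = FALSE,
                 intercept = FALSE,
                 nfolds = 10)

# One row per lambda on the path: mean CV deviance, its standard error,
# and the number of nonzero coefficients at that lambda.
path <- data.frame(lambda      = fit$lambda,
                   cv_deviance = fit$cvm,
                   cv_se       = fit$cvsd,
                   n_nonzero   = fit$nzero)
head(path)

# The two automatic choices; lambda.1se is never smaller than lambda.min,
# so it is the more heavily penalized of the two.
path[match(c(fit$lambda.min, fit$lambda.1se), path$lambda), ]
```

Looking at `n_nonzero` against `cv_deviance` shows how quickly the
model empties out as the penalty grows, which is the effect of the
hyper-parameter rather than a failure of any one run.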

David Winsemius
>
> 	[[alternative HTML version deleted]]
No problems with this posting for my mail client, but you should learn
to use the facilities in gmail to send plain text. They are easy to find.

-- 
David Winsemius, MD
Alameda, CA, USA
