thr3ads.net - R help - [R] Elasticnet - Cross validation problem [Mar 2013]


Noah Silverman

2013-Mar-14 18:36 UTC

[R] Elasticnet - Cross validation problem

Hello,

I am attempting to use elasticnet to classify a number of documents.

The features are words.  The data is coded into a matrix with each document as a
row and each word as a column.  The data is binary, with {0,1} indicating the
presence of a word.

I want to use the cross validation function of elasticnet (cv.enet).  However,
when the code selects a random subset of the data for a given run, some of the
word columns may be all 0.  (A given word simply isn't present in the subset
of data sampled.)  This causes the function to return an error about
variance of 0.
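To make the failure mode concrete, here is a minimal Python sketch (Python rather than R, purely as an illustration; `standardize_columns` is a hypothetical helper, not part of elasticnet). It assumes, as the error message suggests, that the fitter scales each column by its standard deviation on the training subset, so a word absent from every sampled document yields a divide-by-zero.

```python
import statistics


def standardize_columns(X):
    """Center each column and scale it to unit variance.

    Illustrative stand-in for the scaling step a penalized regression
    fitter may perform; raises if any column is constant (variance 0).
    """
    scaled_cols = []
    for j, col in enumerate(zip(*X)):
        sd = statistics.pstdev(col)
        if sd == 0:
            # A word that never (or always) appears in this subset.
            raise ValueError(f"column {j} has variance 0 in this subset")
        mu = statistics.mean(col)
        scaled_cols.append([(v - mu) / sd for v in col])
    # Transpose back to one list per document (row).
    return [list(row) for row in zip(*scaled_cols)]
```

With the toy matrix `X = [[1, 0], [0, 1], [1, 0], [0, 0], [1, 1]]`, the full matrix scales fine, but the training subset made of rows 0, 2 and 3 has no occurrence of word 1, so `standardize_columns` raises on column 1.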

Any suggestions on how to mitigate this issue, given that I want to use 5-fold
cross validation to determine the optimal tuning?


Thanks!


--
Noah Silverman, M.S.
UCLA Department of Statistics
8117 Math Sciences Building
Los Angeles, CA 90095

Steve Lianoglou

2013-Mar-14 19:54 UTC


Hi,

On Thu, Mar 14, 2013 at 2:36 PM, Noah Silverman <noahsilverman at
ucla.edu> wrote:
> Hello,
>
> I am attempting to use elasticnet to classify a number of documents.
>
> The features are words.  The data is coded into a matrix with each document
> as a row and each word as a column.  The data is binary, with {0,1} indicating
> the presence of a word.
>
> I want to use the cross validation function of elasticnet (cv.enet).
> However, when the code selects a random subset of the data for a given run,
> some of the word columns may be all 0.  (A given word simply isn't present in
> the subset of data sampled.)  This causes the function to return an error
> about variance of 0.
>
> Any suggestions on how to mitigate this issue, given that I want to use 5-fold
> cross validation to determine the optimal tuning?

It looks like you can rig up your own splits for cross validation by
using the `foldid` parameter to `cv.glmnet` (from the glmnet package),
so one option is to construct your own splits that make sure the
scenario that's tripping you up doesn't happen.
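A minimal Python sketch of the first option, under stated assumptions: `make_foldid` and `bad_columns` are hypothetical helpers, and the approach (random balanced assignment, retried until it passes) is just one simple way to build such splits. The key observation is that a word drops out of fold k's training set exactly when every document containing it lands in fold k, so each retained word must appear in at least two documents spread over at least two folds. The result is a list of integers from 1 to `nfolds`, the shape `cv.glmnet` expects for `foldid`.

```python
import random


def bad_columns(X, foldid, nfolds):
    """Return (fold, column) pairs whose training set (all rows NOT in
    that fold) has the column entirely zero."""
    bad = []
    for k in range(1, nfolds + 1):
        train = [row for row, f in zip(X, foldid) if f != k]
        for j in range(len(X[0])):
            if all(row[j] == 0 for row in train):
                bad.append((k, j))
    return bad


def make_foldid(X, nfolds=5, max_tries=1000, seed=None):
    """Assign rows to folds 1..nfolds (balanced, random), retrying until
    every training set has at least one nonzero in every column."""
    rng = random.Random(seed)
    # A word seen in fewer than 2 documents can never pass the check,
    # so it should be dropped from the matrix before cross validation.
    for j in range(len(X[0])):
        if sum(1 for row in X if row[j] != 0) < 2:
            raise ValueError(f"column {j} has fewer than 2 nonzero rows")
    for _ in range(max_tries):
        order = list(range(len(X)))
        rng.shuffle(order)
        foldid = [0] * len(X)
        for pos, i in enumerate(order):
            foldid[i] = pos % nfolds + 1  # round-robin keeps folds balanced
        if not bad_columns(X, foldid, nfolds):
            return foldid
    raise RuntimeError("no valid split found; try fewer folds or drop rare words")
```

Rejection sampling keeps the folds random and balanced; for very sparse vocabularies it may need many tries, in which case removing rare words first is the practical fix.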

Or, you can create a modified version of the cv function that still
picks samples randomly, but handles all-0 columns as a special case:
reduce your feature matrix for that fold, run the fit, then drop the
coefs back into the original "columns" they'd belong to, as if you had
run the training on the full feature matrix.
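The second option can be sketched in Python as follows. This is a hedged illustration: `fit_with_reinsertion` and `nonconstant_columns` are hypothetical names, and `fit` stands in for whatever elastic net call you use, assumed here to return one coefficient per column it is given (intercept handled separately). Columns that are constant in the training fold get a coefficient of 0, since they carry no information there. Checking for constant columns, not just all-zero ones, also catches a word present in every training document, which has variance 0 as well.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def nonconstant_columns(X: Matrix) -> List[int]:
    """Indices of columns whose values are not identical in every row."""
    return [j for j in range(len(X[0])) if any(row[j] != X[0][j] for row in X)]


def fit_with_reinsertion(
    X_train: Matrix,
    y_train: Sequence[float],
    fit: Callable[[Matrix, Sequence[float]], List[float]],
) -> List[float]:
    """Fit on the non-constant columns only, then map the coefficients
    back to their original column positions (0.0 for dropped columns)."""
    keep = nonconstant_columns(X_train)
    X_reduced = [[row[j] for j in keep] for row in X_train]
    reduced_coefs = fit(X_reduced, y_train)
    if len(reduced_coefs) != len(keep):
        raise ValueError("fit must return one coefficient per kept column")
    coefs = [0.0] * len(X_train[0])
    for j, b in zip(keep, reduced_coefs):
        coefs[j] = b
    return coefs
```

For example, with `X_train = [[1, 0, 1], [0, 0, 1], [1, 0, 0]]` and a toy `fit` that simply returns column sums, column 1 is dropped and the result is `[2.0, 0.0, 2.0]`; the full-length vector can then be used to predict on the held-out fold with its original columns.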

Know what I mean?

HTH,
-steve

-- 
Steve Lianoglou
Defender of The Thesis
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
