thr3ads.net - R help - [R] Training with very few positives [Feb 2013]


James Jong

2013-Feb-10 22:36 UTC

[R] Training with very few positives

I have a binary classification problem where the fraction of positives is
very low, e.g. 20 positives in 10,000 examples (0.2%).

What is an appropriate cross-validation scheme for training a classifier
with very few positives?

I currently have the following setup:
=======================================
    library(caret)
    # Three stratified 90/10 splits; each held-out 10% contains only
    # about 2 of the 20 positives
    tmp <- createDataPartition(Y, p = 9/10, times = 3, list = TRUE)
    # With 'index' supplied, these splits replace the default bootstrap samples
    myCtrl <- trainControl(method = "boot", index = tmp, timingSamps = 2,
                           classProbs = TRUE, summaryFunction = twoClassSummary)

    # Tune each model on the resampled area under the ROC curve
    RFmodel  <- train(X, Y, method = 'rf', trControl = myCtrl,
                      tuneLength = 1, metric = "ROC")
    SVMmodel <- train(X, Y, method = 'svmRadial', trControl = myCtrl,
                      tuneLength = 3, metric = "ROC")
    KNNmodel <- train(X, Y, method = 'knn', trControl = myCtrl,
                      tuneLength = 10, metric = "ROC")
    NNmodel  <- train(X, Y, method = 'nnet', trControl = myCtrl,
                      tuneLength = 3, trace = FALSE, metric = "ROC")
=======================================
but I am not getting good performance (my resampled ROC (AUC) values are
below 0.7 for all the classifiers above). Any thoughts?

Thanks,

James
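Editor's note on the splitting scheme: with `createDataPartition(Y, p = 9/10)`, each held-out 10% contains only about 20 x 0.1 = 2 positives, so each resampled ROC value rests on two positive cases. The following is a minimal base-R sketch (not caret's own implementation) of class-stratified k-fold assignment, assuming a hypothetical stand-in label vector `Y` with 20 positives in 10,000 rows; `stratified_folds()` is a hypothetical helper. With k = 5, every fold holds out exactly 4 positives, and every positive is held out exactly once across the folds.

```r
# Hypothetical stand-in for the poster's labels: 20 positives in 10,000 rows
set.seed(1)
Y <- factor(c(rep("pos", 20), rep("neg", 9980)), levels = c("pos", "neg"))

# Hypothetical helper: assign fold ids separately within each class,
# so every fold receives the same share of the rare positives
stratified_folds <- function(y, k) {
  folds <- integer(length(y))
  for (lev in levels(y)) {
    idx <- which(y == lev)
    # deal fold labels 1..k round-robin over this class, then shuffle them
    folds[idx] <- sample(rep_len(seq_len(k), length(idx)))
  }
  folds
}

folds <- stratified_folds(Y, k = 5)

# Number of positives and negatives held out in each fold
pos_per_fold <- tapply(Y == "pos", folds, sum)
neg_per_fold <- tapply(Y == "neg", folds, sum)
print(pos_per_fold)
```

caret's `createFolds()` also stratifies on a factor outcome; with `returnTrain = TRUE` it returns a list of training-row indices, which is the form `trainControl(index = ...)` expects.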


Ben Bolker

2013-Feb-11 14:19 UTC


James Jong <ribonucleico <at> gmail.com> writes:
> 
> I have a binary classification problem where the fraction of positives is
> very low, e.g. 20 positives in 10,000 examples (0.2%)
> 
> What is an appropriate cross validation scheme for training a classifier
> with very few positives?
  [snip]
> but I am not getting good performance (my ROC values are < 0.7 for all the
> classifiers above). Any thoughts?
> 
  My thought is that there probably just isn't any way to get
good performance from this data set.  The effective size of your
data set is 20 (the number of positives), which is very small, so
you may simply have reached the limits of your predictive power.

  Ben Bolker
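Editor's note: one way to quantify the "effective size of 20" point is to look at how noisy an AUC estimate is when it rests on very few positives. The sketch below uses the Hanley and McNeil (1982) approximation to the standard error of an AUC, whose closed forms for Q1 and Q2 rest on a distributional assumption, so treat the numbers as rough. The AUC of 0.7 is an illustrative value taken from the poster's reported range, and `auc_se()` is a hypothetical helper, not a function from any package.

```r
# Hanley and McNeil (1982) approximation to the standard error of an AUC
# estimated from n_pos positive and n_neg negative cases
auc_se <- function(auc, n_pos, n_neg) {
  q1 <- auc / (2 - auc)        # approx. P(two random positives both outrank a negative)
  q2 <- 2 * auc^2 / (1 + auc)  # approx. P(a positive outranks two random negatives)
  sqrt((auc * (1 - auc) +
          (n_pos - 1) * (q1 - auc^2) +
          (n_neg - 1) * (q2 - auc^2)) / (n_pos * n_neg))
}

# A 10% hold-out set of the poster's data: about 2 positives, 998 negatives
se_holdout <- auc_se(0.7, n_pos = 2, n_neg = 998)
# All of the poster's data: 20 positives, 9,980 negatives
se_full <- auc_se(0.7, n_pos = 20, n_neg = 9980)

print(round(c(holdout = se_holdout, full = se_full), 3))
```

Under this approximation, an AUC of 0.7 measured on a 2-positive hold-out set has a standard error of roughly 0.21, and even with all 20 positives it is still about 0.066; the thousands of negatives barely help, because the uncertainty is dominated by the handful of positives.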
