thr3ads.net - R help - [R] p-values for classification [Jul 2005]

Arne.Muller@sanofi-aventis.com

2005-Jul-01 10:14 UTC

[R] p-values for classification

Dear All,

I'm classifying some data with various methods (binary classification).
I'm interpreting the results via a confusion matrix, from which I calculate
the sensitivity and the false discovery rate (FDR). The classifiers are trained
on 575 data points and my test set has 50 data points.

For each classifier I'd like to calculate p-values for obtaining an FDR at
least as low, and a sensitivity at least as high, as the observed values. I was
thinking about shuffling (or bootstrapping) the labels of the test set,
classifying them, and calculating the p-value from the resulting random FDR and
sensitivity values, assuming these are normally distributed.

The problem is that this is rather slow when running many rounds of
shuffling/classification (I'd like to do this for many classifiers and
parameter combinations). In addition, classifying the 50 test data points
with shuffled labels realistically produces only a very limited number of
possible FDRs and sensitivities, and I'm wondering whether I can really
believe these values to be normally distributed.

Basically I'm looking for a way to calculate the p-values analytically.
I'd be grateful for any suggestions, web addresses or references.

	kind regards,

	Arne
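
One analytic route can be sketched under an assumption that the post does not state: the trained classifier is kept fixed and only the test labels are shuffled. The predictions then do not change, and the number of true positives (TP) follows a hypergeometric distribution, which is the null distribution behind a one-sided Fisher exact test. Because the numbers of actual and predicted positives are both fixed, "sensitivity >= observed" and "FDR <= observed" are the same event (TP >= observed TP), so one exact tail probability covers both. The function name and the counts below are hypothetical, chosen only for illustration.

```python
from math import comb

def hypergeom_upper_tail(tp, n_pos, n_pred_pos, n_total):
    """Exact P(TP >= tp) when labels are shuffled and predictions stay fixed.

    n_pos      : number of true positives-class cases in the test set
    n_pred_pos : number of cases the classifier predicts as positive
    n_total    : size of the test set
    """
    n_neg = n_total - n_pos
    # TP can be no smaller than the predicted positives that cannot all be
    # negatives, and no larger than either margin.
    lo = max(tp, n_pred_pos - n_neg, 0)
    hi = min(n_pos, n_pred_pos)
    total = comb(n_total, n_pred_pos)
    favourable = sum(
        comb(n_pos, k) * comb(n_neg, n_pred_pos - k) for k in range(lo, hi + 1)
    )
    return favourable / total

# Hypothetical confusion matrix for a 50-point test set (made-up counts):
# 20 actual positives, 18 predicted positives, 15 of them correct.
n_total, n_pos, n_pred_pos, tp = 50, 20, 18, 15
sensitivity = tp / n_pos                    # TP / (TP + FN)
fdr = (n_pred_pos - tp) / n_pred_pos        # FP / (TP + FP)
p_value = hypergeom_upper_tail(tp, n_pos, n_pred_pos, n_total)
print(f"sensitivity={sensitivity:.2f} fdr={fdr:.3f} p={p_value:.3g}")
```

The tail is a sum over a handful of integer TP values, which also reflects the concern about the limited number of attainable FDRs and sensitivities: the null distribution is discrete, so no normal approximation is needed.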

Prof Brian Ripley

2005-Jul-01 12:01 UTC

[R] p-values for classification

Not really an R question.

Most classifiers will produce predicted probabilities, and you can check 
their accuracy.  There are lots of details in my PRNN book, and some 
examples in MASS4.
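
As a rough illustration of checking the accuracy of predicted probabilities, the sketch below computes two common summaries, the Brier score and the mean log loss. These particular measures are an assumption here; the reply does not say which checks are meant, and the probabilities and labels are made up.

```python
from math import log

def brier_score(probs, labels):
    """Mean squared difference between predicted P(class 1) and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def log_loss(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the labels under the predicted probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= log(p) if y == 1 else log(1 - p)
    return total / len(labels)

# Made-up predicted probabilities of class 1 and the true labels.
probs = [0.9, 0.8, 0.3, 0.6, 0.1]
labels = [1, 1, 0, 0, 0]
print(brier_score(probs, labels), log_loss(probs, labels))
```

Lower is better for both; a classifier that always predicts 0.5 scores 0.25 on the Brier score, which gives a simple baseline to compare against.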

I suggest you adjust your training and test sets to be more nearly equal, 
or use cross-validation.

I don't see how shuffling the labels will help: you want to know how well 
a classifier does when there is a real relationship between the 
explanatory variables and the class.  To take a simple example, suppose 
the classes are clearly linearly separable.  Then a logistic discriminant 
will have nigh-perfect performance on the actual data, but very poor 
performance on permuted labels.  You would do a lot better to simulate 
from a good fitted model, the so-called parametric bootstrapping.
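
A simplified sketch of the parametric bootstrap idea follows. It assumes a fitted model has already supplied a probability of class 1 for each test point, draws new labels from those probabilities, and recomputes the sensitivity of the fixed thresholded predictions each time. A fuller version would also refit the classifier to each simulated data set; that step is omitted. The function names, threshold and probabilities are hypothetical.

```python
import random

def sensitivity(pred, truth):
    """TP / (TP + FN) for 0/1 predictions and 0/1 true labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    pos = sum(truth)
    return tp / pos if pos else float("nan")

def parametric_bootstrap_sensitivity(probs, threshold=0.5, n_sim=2000, seed=1):
    """Simulate labels from the fitted probabilities; return sorted sensitivities."""
    rng = random.Random(seed)
    pred = [1 if p >= threshold else 0 for p in probs]
    sims = []
    for _ in range(n_sim):
        # Draw each label as Bernoulli(p) from the fitted model.
        truth = [1 if rng.random() < p else 0 for p in probs]
        if sum(truth) > 0:  # sensitivity is undefined without positives
            sims.append(sensitivity(pred, truth))
    return sorted(sims)

# Made-up fitted probabilities for a 50-point test set.
probs = [0.9] * 15 + [0.7] * 10 + [0.3] * 10 + [0.1] * 15
sims = parametric_bootstrap_sensitivity(probs)
lower = sims[int(0.025 * len(sims))]
upper = sims[int(0.975 * len(sims)) - 1]
print(f"approximate 95% interval for sensitivity: [{lower:.2f}, {upper:.2f}]")
```

Unlike label shuffling, the simulated data keep the relationship between the explanatory variables and the class, so the spread of the simulated sensitivities describes how variable the measure is for a classifier that actually works.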

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
