thr3ads.net - R help - [R] ROC optimal threshold [Mar 2006]

Anadon Herrera, Jose Daniel

2006-Mar-31 09:58 UTC

[R] ROC optimal threshold

Hello,

I am using the ROC package to evaluate predictive models.
I have successfully plotted the ROC curve. However,

is there any way to obtain the operating point, i.e. the optimal threshold
value (the point on the curve nearest to the top-left corner of the
axes)?

thank you very much,


jose daniel anadon
area de ecologia
universidad miguel hernandez

España

Tim Howard

2006-Mar-31 13:01 UTC


[R] ROC optimal threshold

Jose - 

I've struggled a bit with the same question, said another way: "how do
you find the value in a ROC curve that minimizes false positives while
maximizing true positives"?

Here's something I've come up with. I'd be curious to hear from the
list whether anyone thinks this code might get stuck in local minima, or if it
does find the global minimum each time. (I think it's ok).
From your ROC object you need to grab the sensitivity (= true positive rate)
and specificity (= 1 - false positive rate) and the cutoff levels. Then find the
value that minimizes abs(sensitivity - specificity), or
sqrt((1-sens)^2 + (1-spec)^2), as follows:

absMin <- extract[which.min(abs(extract$sens - extract$spec)), ]
sqrtMin <- extract[which.min(sqrt((1 - extract$sens)^2 + (1 - extract$spec)^2)), ]

In this example, 'extract' is a dataframe containing three columns:
extract$sens = sensitivity values, extract$spec = specificity values,
extract$votes = cutoff values. The command subsets the dataframe to a single row
containing the desired cutoff and the sens and spec values that are associated
with it.
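
A minimal, self-contained sketch of the two criteria described above. The numbers in 'extract' are made up for illustration and do not come from any real model; only the column names (sens, spec, votes) follow the description in this post.

```r
# Hypothetical data frame with the three columns described above.
# The values are invented for illustration only.
extract <- data.frame(
  votes = c(0.1, 0.3, 0.5, 0.7, 0.9),      # cutoff values
  sens  = c(0.98, 0.90, 0.80, 0.55, 0.20), # sensitivity at each cutoff
  spec  = c(0.30, 0.60, 0.85, 0.95, 0.99)  # specificity at each cutoff
)

# Criterion 1: the cutoff where sensitivity and specificity are closest.
absDiff <- abs(extract$sens - extract$spec)
absMin  <- extract[which.min(absDiff), ]

# Criterion 2: the cutoff whose ROC point is nearest (Euclidean distance)
# to the top-left corner, i.e. to sens = 1 and spec = 1.
distCorner <- sqrt((1 - extract$sens)^2 + (1 - extract$spec)^2)
sqrtMin    <- extract[which.min(distCorner), ]

print(absMin)
print(sqrtMin)
```

Note that which.min() evaluates every row and returns the index of the first minimum, so over the tabulated cutoffs it is an exhaustive search rather than an iterative one; with ties, only the first tied row is returned.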

Most of the time these two answers (abs or sqrt) are the same, sometimes they
differ quite a bit.

I do not see this application of ROC curves very often. A question for those
much more knowledgeable than I.... is there a problem with using ROC curves in
this manner?

Tim Howard





Tim Howard

2006-Mar-31 18:32 UTC


[R] ROC optimal threshold

Dr. Harrell, 
Thank you for your response. I had noted, and appreciate, your perspective on
ROC in past listserv entries and am glad to have an opportunity to delve a
little deeper.

I (and, I think, Jose Daniel Anadon, the original poster of this question) have
a predictive model for the presence of, say, animal_X. This is a spatial model
that can be represented on maps and is based on known locations where animal_X
is present and (usually) known locations where animal_X is absent. Output of the
analysis (using any number of analytic routines, including logit, randomForest,
maximum entropy, mahalanobis distance...) is a full map where every spot on the
map has a probability that that particular location has the appropriate habitat
for animal_X.

This output can be visualized by just using a color scale (perhaps blue for low
probability to red for high probability), BUT, there are times when we want to
apply a cutoff to this probability output and create a product where we can say
either "yes, animal_X habitat is predicted here" or "no, animal_X
habitat is not predicted here."

Note this is the final analytic step. There are no later analysis steps and so
(possibly) adjustments for multiple comparisons do not come into play.

Indeed, it seems that using a standard process to find a threshold reduces the
arbitrariness of the probability color scale (at what probability do we set
'red'? at what probability do we set 'blue'?).

Are there alternative approaches that reduce the drawbacks you allude to? 

How would you turn a surface of probabilities into a binary surface of yes-no?

Thank you for your time.
Sincerely,
Tim Howard

Ecologist
New York Natural Heritage Program

>>> Frank E Harrell Jr <f.harrell at vanderbilt.edu> 03/31/06 11:20 AM >>>
Choosing cutoffs is fraught with difficulties, arbitrariness, 
inefficiency, and the necessity to use a complex adjustment for multiple 
comparisons in later analysis steps unless the dataset used to generate 
the cutoff was so large as could be considered infinite.

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

Frank E Harrell Jr

2006-Mar-31 21:01 UTC


[R] ROC optimal threshold

Tim,

I think that 'animal_X habitat is predicted here' would hide a lot of 
useful information, especially "gray zones" or uncertain areas.   I 
think that a continuous mapping of probabilities to a gray scale or to 
the heat spectrum would work best.  Bill Cleveland also has another idea 
of using 5 saturation levels on each of 2 hues to get 10 levels with 
easier human discrimination.  You might also consider thermometer plots 
which give some of the most accurate human perception of a continuous 
variable.  For the first 2 ideas you may have to round probabilities to 
give just 10 intervals (or use deciles).
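
As a rough sketch of the rounding step mentioned above, the following bins probabilities into 10 classes, either by equal-width intervals or by deciles, and assigns one gray level per class. The vector 'p' is simulated and only stands in for the predicted probabilities of map cells; the variable names are illustrative assumptions.

```r
# Hypothetical predicted probabilities, one per map cell (simulated).
set.seed(1)
p <- runif(200)

# Option 1: ten equal-width intervals on [0, 1].
equalWidth <- cut(p, breaks = seq(0, 1, length.out = 11),
                  include.lowest = TRUE)

# Option 2: deciles, i.e. ten classes holding roughly equal numbers of cells.
decileBreaks <- quantile(p, probs = seq(0, 1, length.out = 11))
deciles <- cut(p, breaks = decileBreaks, include.lowest = TRUE,
               labels = FALSE)  # integer class codes 1..10

# One gray level per class: dark for low probability, light for high.
grays     <- gray(seq(0.1, 0.9, length.out = 10))
cellColor <- grays[deciles]

table(equalWidth)
table(deciles)
```

Equal-width intervals keep the probability scale readable from the legend, while deciles spread the cells evenly across the 10 levels; which is preferable depends on the map.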

If you choose cutpoints from the data, there is uncertainty from the 
cutpoint that may have to be taken into account.  See for example 
http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/fehbib.html#roy06dic

Frank

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
