On 05/24/2010 02:14 AM, Claudia Beleites wrote:
> Dear Changbin,
>
>> I want to know how to select the optimal decision threshold from the
>> ROC curve?
> Depends on what optimal means. I think there are a bunch of different
> criteria used:
>
> - point closest to the ideal model
> - point furthest from the "guessing" model
> - either criterion may also include costs, i.e. a FP/FN cost ratio != 1
> - ...
>
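A minimal sketch of the first two criteria, in plain Python rather than ROCR, on made-up scores and labels. The function names and the `fp_weight` knob are hypothetical; `fp_weight` is only one simple way to express a FP/FN cost ratio != 1, not a standard definition.

```python
import math

def roc_points(scores, labels):
    """Return (threshold, fpr, tpr) tuples, one per distinct score.

    A case is predicted positive when its score is >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / n_neg, tp / n_pos))
    return points

def closest_to_ideal(points):
    # Euclidean distance to the ideal corner (fpr = 0, tpr = 1).
    return min(points, key=lambda p: math.hypot(p[1], 1.0 - p[2]))

def furthest_from_guessing(points, fp_weight=1.0):
    # Height above the diagonal, tpr - fpr (Youden's J when fp_weight == 1).
    # fp_weight is a hypothetical cost weight on false positives.
    return max(points, key=lambda p: p[2] - fp_weight * p[1])

# Toy data, invented for illustration only.
scores = [0.1, 0.2, 0.35, 0.4, 0.55, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

points = roc_points(scores, labels)
print(closest_to_ideal(points))
print(furthest_from_guessing(points))
print(furthest_from_guessing(points, fp_weight=0.25))
```

On this toy data both unweighted criteria pick the threshold 0.6, while making false positives cheap (`fp_weight=0.25`) moves the choice down to 0.35. On real data the two criteria need not agree.
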
> More practical:
> If you use ROCR: the help page for the performance class explains the
> slots of the object. There you will find the data of the curve,
> including the thresholds.
>
>> What threshold will give the highest accuracy?
> To find out, optimize the accuracy as a function of the threshold.
>
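As a rough illustration of optimizing accuracy over the threshold, here is a brute-force sketch in plain Python on made-up data. The helper names are hypothetical, and it simply tries every observed score as a cutoff.

```python
def accuracy_at(threshold, scores, labels):
    """Fraction of cases classified correctly when score >= threshold means positive."""
    correct = sum(1 for s, y in zip(scores, labels)
                  if (s >= threshold) == (y == 1))
    return correct / len(labels)

def best_accuracy_threshold(scores, labels):
    """Try each distinct score as a cutoff and return the most accurate one."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: accuracy_at(t, scores, labels))

# Toy data, invented for illustration only.
scores = [0.1, 0.2, 0.35, 0.4, 0.55, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

best = best_accuracy_threshold(scores, labels)
print(best, accuracy_at(best, scores, labels))
```

The accuracy reported for the chosen cutoff is computed on the same data that selected it, so it is an optimistic estimate.
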
> Remember: finding the optimal threshold from a ROC curve is a
> data-driven optimization. You need to validate the resulting model with
> independent test data afterwards.
That point is excellent. In addition, such decision analysis assumes
that (1) a forced yes/no decision is acceptable, i.e., a predicted
probability in the middle is forced to be categorized as "low" or "high"
as opposed to "no decision; get more data", and (2) the
utility/cost/loss function is identical across subjects (which it almost
never is).
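
To make both assumptions concrete, here is a small sketch in plain Python. It assumes calibrated predicted probabilities and zero cost for correct decisions, in which case the cost-minimizing cutoff is cost_fp / (cost_fp + cost_fn). The function names and the `grey_halfwidth` parameter are hypothetical, chosen only for illustration.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability cutoff that minimizes expected cost.

    Calling "high" costs (1 - p) * cost_fp in expectation, calling "low"
    costs p * cost_fn, so "high" is cheaper once p exceeds this cutoff.
    Assumes calibrated probabilities and no cost for correct decisions.
    """
    return cost_fp / (cost_fp + cost_fn)

def decide(prob, cost_fp, cost_fn, grey_halfwidth=0.0):
    """Return "low", "high", or defer when prob is too close to the cutoff.

    grey_halfwidth is a hypothetical width of the no-decision zone.
    """
    cut = cost_threshold(cost_fp, cost_fn)
    if abs(prob - cut) < grey_halfwidth:
        return "no decision; get more data"
    return "high" if prob >= cut else "low"

# Same predicted probability, two subjects with different cost ratios.
print(decide(0.3, cost_fp=1, cost_fn=1))
print(decide(0.3, cost_fp=1, cost_fn=4))
# A probability near the cutoff, with a grey zone allowed.
print(decide(0.45, cost_fp=1, cost_fn=1, grey_halfwidth=0.1))
```

With equal costs, a probability of 0.3 is called "low". For a subject whose false negative is four times as costly, the cutoff drops to 0.2 and the same probability is called "high". A single threshold read off one ROC curve cannot reflect that difference.
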
Frank
--
Frank E Harrell Jr Professor and Chairman School of Medicine
Department of Biostatistics Vanderbilt University