thr3ads.net - R help - [R] Concordance Index


K F Pearce

2008-Dec-12 10:06 UTC

[R] Concordance Index - interpretation

Hello everyone.
 
This is a question regarding generation of the concordance index (c
index) in R using the function rcorr.cens, in particular about the
interpretation of its direction and the form of the 'predictor'.
 
One of the arguments is a "numeric predictor variable" (presumably
this is just a *single* predictor variable). Say this variable takes
numeric values... Am I correct in thinking that if the c index is >
0.5 (with Somers' D positive), then this tells us that the higher the
numeric values of the 'predictor', the greater the survival probability,
and similarly, if the c index is < 0.5 (with Somers' D negative), then
this tells us that the higher the numeric values of the 'predictor', the
lower the survival probability?
 
The c index estimates the "probability of concordance between predicted
and observed responses"... Harrell et al. (1996) say "in predicting time
until death, concordance is calculated by considering all possible pairs
of patients, at least one of whom has died. If the *predicted* survival
time (probability) is larger for the patient who (actually) lived
longer, the predictions for that pair are said to be concordant with the
(actual) outcomes." I have read that "the c index is defined by the
proportion of all usable patients in which the predictions and outcomes
are concordant".
 
Now, secondly, I'd like to ask what form the predictor can take.
Presumably, if the predictor was a continuous variable, e.g. 'age',
then the predicted survival probability (calculated internally via Cox
regression?) would be compared with the actual survival time for each
specific age to get the c index? Now, if the predictor was an *ordinal
categorical variable* where 1 = worst group and 5 = best group, I presume
that the c index would be calculated similarly, but this time there would
be many ties in the predictor (as regards predicted survival
probability). Hence, if I wanted to count all ties in such a case, I
would keep the default argument outx=FALSE?

Does anyone have a clear reference which gives the formula used to
generate the concordance index (with worked examples)? 
 
Many thanks for your help on these interpretations
Kind Regards,
Kim

Gad Abraham

2008-Dec-13 12:19 UTC


[R] Concordance Index - interpretation

K F Pearce wrote:
> Hello everyone.
>  
> This is a question regarding generation of the concordance index (c
> index) in R using the function rcorr.cens.  In particular about
> interpretation of its direction and form of the 'predictor'.
Since Frank Harrell hasn't replied, I'll contribute my 2 cents.
>  
> One of the arguments is a "numeric predictor variable" (presumably
> this is just a *single* predictor variable). Say this variable takes
> numeric values... Am I correct in thinking that if the c index is >
> 0.5 (with Somers' D positive), then this tells us that the higher the
> numeric values of the 'predictor', the greater the survival probability,
> and similarly, if the c index is < 0.5 (with Somers' D negative), then
> this tells us that the higher the numeric values of the 'predictor', the
> lower the survival probability?
The c-index is a generalisation of the area under the ROC curve (AUC),
so it measures how well your model discriminates between different
responses, i.e., whether your predicted response is low for low observed
responses and high for high observed responses. So C > 0.5 implies some
predictive ability (C = 1 would be perfect discrimination), C = 0.5
implies no predictive ability (no better than random guessing), and
C < 0.5 implies "good" anti-prediction (worse than random, but if you
flip the prediction direction it becomes a good prediction).
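A small sketch of the direction point, assuming no censoring and using a compact, hypothetical cindex() helper (this is not Hmisc's rcorr.cens): negating the predictor turns C into 1 - C, while any increasing transformation of the predictor leaves C unchanged, because only the ordering of the predictions matters. The data are made up.

```r
# Hypothetical helper: c-index over all pairs with distinct observed y,
# counting ties in the prediction as 1/2 (no censoring handled).
cindex <- function(yhat, y) {
  valid <- outer(y, y, ">")        # pair (i, j) usable if y[i] > y[j]
  conc  <- outer(yhat, yhat, ">")  # prediction ordered the same way as y
  tied  <- outer(yhat, yhat, "==") # prediction tied
  sum((conc + 0.5 * tied)[valid]) / sum(valid)
}

y    <- c(2, 5, 1, 8, 4, 7)        # made-up observed responses
yhat <- c(1, 3, 2, 6, 3, 5)        # made-up predictions (one tie)

c_orig <- cindex(yhat, y)
c_neg  <- cindex(-yhat, y)         # flipped prediction direction
c_exp  <- cindex(exp(yhat), y)     # increasing transform of the predictor

c(original = c_orig, negated = c_neg, exp = c_exp)
```

With these numbers 13 of the 15 pairs are concordant, one is discordant and one is tied on yhat, so C = 13.5/15 = 0.9 for the original and exp() predictions and 0.1 for the negated ones.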
>  
> The c index estimates the "probability of concordance between predicted
> and observed responses"... Harrell et al. (1996) say "in predicting time
> until death, concordance is calculated by considering all possible pairs
> of patients, at least one of whom has died. If the *predicted* survival
> time (probability) is larger for the patient who (actually) lived
> longer, the predictions for that pair are said to be concordant with the
> (actual) outcomes." I have read that "the c index is defined by the
> proportion of all usable patients in which the predictions and outcomes
> are concordant".
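The quoted definition can be turned into a sketch that handles right-censoring. This is only my reading of the usual Harrell-style rule, not the code inside rcorr.cens: a pair is usable when the patient with the shorter follow-up time actually died (status 1), and it is concordant when the patient who lived longer also has the larger predicted survival. As simplifying assumptions, pairs with tied follow-up times are skipped and tied predictions count 1/2; the function name harrell_c() and the data are hypothetical.

```r
# Hypothetical sketch of Harrell's C with right-censoring.
# time:   observed follow-up time
# status: 1 = died at 'time', 0 = censored (still alive at 'time')
# pred:   predicted survival (larger = expected to live longer)
harrell_c <- function(pred, time, status) {
  s <- 0  # concordant pairs (1) plus tied predictions (1/2)
  n <- 0  # usable pairs
  for (i in seq_along(time)) {
    for (j in seq_along(time)) {
      # usable: i died strictly before j's observed time, so j lived longer
      # (tied times are skipped here as a simplification)
      if (status[i] == 1 && time[i] < time[j]) {
        n <- n + 1
        s <- s + (pred[j] > pred[i]) + 0.5 * (pred[j] == pred[i])
      }
    }
  }
  s / n
}

time   <- c(5, 8, 3, 10, 6)
status <- c(1, 0, 1, 1, 0)
pred   <- c(0.4, 0.7, 0.2, 0.3, 0.5)  # made-up predicted survival probabilities
harrell_c(pred, time, status)
```

Here 7 pairs are usable (a patient censored early cannot be compared with one who died later) and 6 of them are concordant, giving 6/7.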
>  
> Now, secondly, I'd like to ask what form the predictor can take.
> Presumably, if the predictor was a continuous variable, e.g. 'age',
> then the predicted survival probability (calculated internally via Cox
> regression?) would be compared with the actual survival time for each
> specific age to get the c index? Now, if the predictor was an *ordinal
> categorical variable* where 1 = worst group and 5 = best group, I presume
> that the c index would be calculated similarly, but this time there would
> be many ties in the predictor (as regards predicted survival
> probability). Hence, if I wanted to count all ties in such a case, I
> would keep the default argument outx=FALSE?
Both the predictor and the actual response can be either continuous or 
categorical, as long as they are ordinal (since it's a rank-based method).

I don't know about the outx part.
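On the tie question, here is a sketch comparing the two usual ways of treating pairs tied on an ordinal predictor (1 = worst group, ..., 5 = best group): counting each tied pair as 1/2 concordant, or dropping pairs tied on the predictor from the count altogether (a Goodman-Kruskal gamma-type measure). My understanding from the Hmisc help page is that rcorr.cens does the former by default and the latter with outx=TRUE, but please check ?rcorr.cens; the code below does not call Hmisc, and cindex_ties() with its drop_x_ties argument is hypothetical. The data are made up and uncensored.

```r
# Hypothetical helper; no censoring handled.
cindex_ties <- function(x, y, drop_x_ties = FALSE) {
  valid <- outer(y, y, ">")                           # usable pairs: y[i] > y[j]
  if (drop_x_ties) valid <- valid & outer(x, x, "!=") # ignore pairs tied on x
  conc <- outer(x, x, ">")
  tied <- outer(x, x, "==")
  sum((conc + 0.5 * tied)[valid]) / sum(valid)
}

grp  <- c(1, 1, 2, 3, 3, 4, 5, 5)    # ordinal group, 5 = best (made-up)
surv <- c(2, 4, 3, 6, 9, 7, 12, 10)  # made-up survival times, no censoring

c_half <- cindex_ties(grp, surv)                      # ties in grp count 1/2
c_drop <- cindex_ties(grp, surv, drop_x_ties = TRUE)  # pairs tied on grp ignored
c(ties_as_half = c_half, ties_dropped = c_drop)
```

Of the 28 pairs, 23 are concordant, 2 discordant and 3 tied on grp, so the two conventions give 24.5/28 = 0.875 and 23/25 = 0.92. With many ties, as in a 5-level ordinal predictor, the choice can matter noticeably.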
> 
> Does anyone have a clear reference which gives the formula used to
> generate the concordance index (with worked examples)? 
I think the explanation in Harrell et al. (1996), Section 5.5, is pretty
clear, but it could have used some pseudocode. Anyway, I understand it as:

1) Create all pairs of observed responses.
2) For all valid response pairs, i.e., pairs where one response y_1 is
greater than the other y_2, test whether the corresponding predictions
are concordant, i.e., yhat_1 > yhat_2. If so, add 1 to the running sum s;
if yhat_1 = yhat_2, add 0.5 to the sum. Count the number n of valid
response pairs.
3) Divide the total sum s by the number of valid response pairs n.
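To make the three steps concrete, here is a small worked example with made-up numbers (no censoring) that lists every valid pair and its contribution to s, using base R's combn() to enumerate the pairs.

```r
y    <- c(3, 1, 4, 2)            # made-up observed responses
yhat <- c(2.5, 0.5, 2.5, 0.4)    # made-up predictions (observations 1 and 3 tie)

pairs <- t(combn(length(y), 2))  # step 1: all pairs (i, j) with i < j
rows  <- list()
for (k in seq_len(nrow(pairs))) {
  i <- pairs[k, 1]
  j <- pairs[k, 2]
  if (y[i] == y[j]) next                 # step 2: only pairs with different y are valid
  hi <- if (y[i] > y[j]) i else j        # observation with the larger response
  lo <- if (y[i] > y[j]) j else i
  score <- if (yhat[hi] > yhat[lo]) 1 else if (yhat[hi] == yhat[lo]) 0.5 else 0
  rows[[length(rows) + 1]] <- data.frame(hi = hi, lo = lo, score = score)
}
tab <- do.call(rbind, rows)
print(tab)
cidx <- sum(tab$score) / nrow(tab)       # step 3: s / n
cidx
```

All 6 pairs are valid here: 4 are concordant, the pair (3, 1) is tied on yhat and scores 0.5, and the pair (4, 2) is discordant, so s = 4.5, n = 6 and C = 0.75.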

Here's my simple attempt; it is unoptimised and doesn't handle censoring:

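# yhat: predicted response
# y: observed response
concordance <- function(yhat, y)
{
    s <- 0  # running sum: 1 per concordant pair, 0.5 per tie in yhat
    n <- 0  # number of valid (usable) response pairs
    for(i in seq_along(y))
    {
        for(j in seq_along(y))
        {
            # valid pair: responses differ and i has the larger one
            if(i != j && y[i] > y[j])
            {
                # concordant if yhat is larger too; ties in yhat count 0.5
                s <- s + (yhat[i] > yhat[j]) + 0.5 * (yhat[i] == yhat[j])
                n <- n + 1
            }
        }
    }
    s / n
}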
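A quick sanity check of the loop, assuming no ties in either y or yhat: in that case Somers' D (Dxy = 2(C - 0.5), as mentioned in the original question) coincides with Kendall's tau, which base R computes via cor(..., method = "kendall"). The function is repeated so that the block runs on its own; the simulated data are made up.

```r
# The loop from above, repeated so this block is self-contained.
# (The i != j test is dropped: y[i] > y[i] is never TRUE anyway.)
concordance <- function(yhat, y) {
  s <- 0
  n <- 0
  for (i in seq_along(y)) {
    for (j in seq_along(y)) {
      if (y[i] > y[j]) {
        s <- s + (yhat[i] > yhat[j]) + 0.5 * (yhat[i] == yhat[j])
        n <- n + 1
      }
    }
  }
  s / n
}

set.seed(1)
y    <- rnorm(30)              # continuous, so (almost surely) no ties
yhat <- y + rnorm(30)          # noisy made-up predictions, also untied

cidx <- concordance(yhat, y)
dxy  <- 2 * (cidx - 0.5)       # Somers' Dxy from the c-index
tau  <- cor(yhat, y, method = "kendall")
c(c_index = cidx, Dxy = dxy, kendall_tau = tau)
```

Without ties every valid pair is either concordant or discordant, so 2C - 1 = (concordant - discordant) / pairs, which is exactly Kendall's tau. With ties in yhat the two measures diverge, because tau-b adjusts for ties differently.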

See also Harrell's 2001 book "Regression Modeling Strategies", and, for
the special case of binary outcomes (where the c-index equals the AUC),
Hanley and McNeil (1982), "The Meaning and Use of the Area under a
Receiver Operating Characteristic (ROC) Curve", Radiology 143:29-36.
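For the binary case, a sketch of the equivalence with made-up data: counting concordant (case, non-case) pairs, with ties as 1/2, gives the same number as the rank-sum (Mann-Whitney) form of the AUC discussed by Hanley and McNeil. The helper name auc_rank() is hypothetical.

```r
y    <- c(0, 0, 1, 0, 1, 1, 0, 1)                   # made-up binary outcome
yhat <- c(0.1, 0.4, 0.35, 0.2, 0.8, 0.4, 0.3, 0.9)  # made-up scores, one tie across groups

# Pairwise definition: each (case, non-case) pair scores 1 if the case has the
# higher score, 1/2 if tied, 0 otherwise.
pos <- yhat[y == 1]
neg <- yhat[y == 0]
auc_pairs <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))

# Rank-sum (Mann-Whitney) form; rank() gives tied values their average rank.
auc_rank <- function(score, label) {
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  r  <- rank(score)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

c(pairs = auc_pairs, rank_sum = auc_rank(yhat, y))
```

Both forms give 14.5/16 = 0.90625 here; the single tie (0.4 in both groups) contributes 0.5 in each.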

Cheers,
Gad


-- 
Gad Abraham
Dept. CSSE and NICTA
The University of Melbourne
Parkville 3010, Victoria, Australia
email: gabraham at csse.unimelb.edu.au
web: http://www.csse.unimelb.edu.au/~gabraham
