Tuszynski, Jaroslaw W.
2005-Sep-22 18:59 UTC
[R] Survey of ROC AUC / wilcoxon test functions
Hi,
I was lately debugging parts of my 'colAUC' function in caTools package,
and
in a process looked into other packages for calculating Areas Under ROC
Curves (AUC). To my surprise I found at least 6 other functions:
* wilcox.test
* AUC from ROC package,
* performance from ROCR package,
* auROC from limma package,
* ROC from Epi package,
* roc.area from verification package
And I belive I found problems with 3 of them: 'ROC', 'auROC'
& 'roc.area'
I wrote short code to compare them all:
# Compare calAUC with other functions designed for similar purpose
library(caTools)
library(verification)
library(ROC)
library(ROCR)
library(Epi)
library(limma)
library(MASS)
data(cats)
colAUC(cats[,2:3], cats[,1], plotROC=TRUE)
auc = matrix(NA,9,3)
pval = matrix(NA,4,3)
rownames(auc) = c("colAUC(alg='ROC')",
"colAUC(alg='Wilcox')",
"wilcox.test",
"rank", "roc.area", "AUC",
"performance", "ROC",
"auROC")
colnames(auc ) = c("AUC(x)", "AUC(-x)",
"AUC(x+noise)")
rownames(pval) =
c("wilcox.test(exact=1)","wilcox.test(exact=0)",
"roc.area()$p.adj","roc.area()$p")
colnames(pval) = c("p(x)", "p(-x)",
"p(x+noise)")
X = cbind(cats[,2], -cats[,2], cats[,2]+rnorm(nrow(cats))/10 )
y = ifelse(cats[,1]=='F',0,1)
for (i in 1:3) {
x = X[,i]
x1 = x[y==1]; n1 = length(x1); # prepare input data ...
x2 = x[y==0]; n2 = length(x2); # ... into required
format
r = rank(c(x1,x2))
auc[1,i] = colAUC(x, y, alg="ROC")
auc[2,i] = colAUC(x, y, alg="Wilcox")
auc[3,i] = wilcox.test(x1, x2, exact=0)$statistic / (n1*n2)
auc[4,i] = (sum(r[1:n1]) - n1*(n1+1)/2) / (n1*n2)
auc[5,i] = roc.area(y, x)$A.tilda
auc[6,i] = AUC(rocdemo.sca(y, x, dxrule.sca))
auc[7,i] = performance(prediction( x, y),"auc")@y.values[[1]]
auc[8,i] = ROC(x,y,grid=0)$AUC # get AUC by 'ROC'
auc[9,i] = auROC(y, x) # get AUC by 'auROC'
pval[1,i] = wilcox.test(x1, x2, exact=0)$p.value
pval[2,i] = wilcox.test(x1, x2, exact=1)$p.value
pval[3,i] = roc.area(y, x)$p.adj
pval[4,i] = roc.area(y, x)$p
}
print(auc)
print(pval)
Which gave the following results:
> print(auc)
AUC(x) AUC(-x) AUC(x+noise)
colAUC(alg='ROC') 0.8338451 0.8338451 0.8225488
colAUC(alg='Wilcox') 0.8338451 0.8338451 0.8225488
wilcox.test 0.8338451 0.1661549 0.8225488
rank 0.8338451 0.1661549 0.8225488
roc.area 0.8338451 0.1661549 0.8225488
AUC 0.8338451 0.1661549 0.8225488
performance 0.8338451 0.1661549 0.8225488
ROC 0.8338451 0.1654968 0.8225488
auROC 0.8131169 0.1454266 0.8225488
> print(pval)
p(x) p(-x) p(x+noise)
wilcox.test(exact=1) 8.200e-11 8.200e-11 3.774e-10
wilcox.test(exact=0) 8.200e-11 8.200e-11 3.817e-11
roc.area()$p.adj 4.042e-11 1.000e+00 1.861e-10
roc.area()$p 4.446e-11 1.000e+00 1.861e-10
Some thoughts about those results:
- Data used for the testing:
column 1 (x) - has ties,
column 2 (-x) - negative of the same data
column 3 (x+noise) - similar data but with no ties
- All AUC functions gave the same results for data with no ties
- 'ROC' and 'auROC' gave different results for data with ties
- 'colAUC' (my function) returns 'max(auc, 1-auc)' that's
why AUC(x) and
AUC(-x) are the same
- I assume that "roc.area()$p.adj" and "wilcox.test"
p.values suppose to
return the same results, but they are different.
- At least 'roc.area(x)$p.adj' and 'roc.area(-x)$p.adj' should
be the same.
Comments? Corrections?
Jarek
====================================================\====
Jarek Tuszynski, PhD. o / \
Science Applications International Corporation <\__,|
(703) 676-4192 "> \
Jaroslaw.W.Tuszynski at saic.com ` \
Hi Jarek, I am getting read to leave on a trip till next week, so I can't address your observation today. I also can't seem to find the ROC library on CRAN and the moment. Do you know where I would locate it? Thanks for pointing out the discrepancy. I'll talk to you later. Matt ( verification package) Matt Pocernich NCAR - Research Applications Program 303-497-8312
Tuszynski, Jaroslaw W.
2005-Sep-22 19:32 UTC
[R] Survey of ROC AUC / wilcoxon test functions
ROC package can be found in BioConductor repository. Jarek -----Original Message----- From: pocernic at rap.ucar.edu [mailto:pocernic at rap.ucar.edu] Sent: Thursday, September 22, 2005 3:28 PM To: Tuszynski, Jaroslaw W. Cc: (r-help at stat.math.ethz.ch.); bioinf at wehi.edu.au; bxc at steno.dk Subject: Re: Survey of ROC AUC / wilcoxon test functions Hi Jarek, I am getting read to leave on a trip till next week, so I can't address your observation today. I also can't seem to find the ROC library on CRAN and the moment. Do you know where I would locate it? Thanks for pointing out the discrepancy. I'll talk to you later. Matt ( verification package) Matt Pocernich NCAR - Research Applications Program 303-497-8312