On Nov 20, 2013, at 12:44 PM, Jack Luo wrote:
> Hi,
>
> I am using the AUCRF package on my data, and I was initially impressed by
> the high OOB-AUC it reported. After a while, though, I began to suspect
> that this might be due to some sort of bias, which motivated me to run a
> test on random data (generated with rnorm).
>
> The design is very simple: 100 observations, 50 in class 0 and 50 in
> class 1. I varied the number of variables (the idea being that if there is
> a bias, the apparent performance should increase as more variables are
> added).
>
> Presumably, there is no signal in the data and the true unbiased AUC should
> not be too different from 0.5.
>
> The results are worrisome to me: the OOB AUC is a lot higher than 0.5, and
> with more variables, it gets even higher.
>
> Am I misunderstanding anything here?
>
> Below is the R code I used to test:
>
> library(AUCRF)  # variable selection with random forests based on OOB-AUC
>
> Nvar  <- 200                                   # number of variables
> Label <- as.factor(c(rep(0, 50), rep(1, 50)))  # class label
> AUC_r <- NULL
>
> for (k in 1:10) {        # k controls the random data that are generated
>   set.seed(k)
>   Arandom <- matrix(rnorm(Nvar * length(Label)), ncol = Nvar)
>   DF <- data.frame(Arandom, Label = Label)
>   for (j in 1:20) {      # j controls the randomness of the OOB sampling
>     if (j %% 10 == 0) cat(k, j, "\n")
>     set.seed(j)
>     fit <- AUCRF(Label ~ ., data = DF)
>     AUC_r <- cbind(AUC_r, fit$AUCcurve$AUC)
>   }
> }
>
> plot(fit$AUCcurve$k, apply(AUC_r, 1, mean), type = "b", pch = 3,
>      xlab = "# of Vars", ylab = "OOB-AUC", lwd = 2, col = 2, ylim = c(0.4, 1))
>
Shouldn't this question go to the package maintainer before being sent to
R-help?
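
(For what it is worth, one quick sanity check of the premise that pure-noise
data should score near 0.5 would be to compute the OOB AUC of a plain random
forest on the same data. The sketch below is only illustrative and assumes the
randomForest and pROC packages, neither of which appears in the original post:)

library(randomForest)  # plain random forest, no variable elimination
library(pROC)          # roc()/auc() to score the OOB votes

set.seed(1)
Nvar  <- 200
Label <- as.factor(c(rep(0, 50), rep(1, 50)))
DF    <- data.frame(matrix(rnorm(Nvar * length(Label)), ncol = Nvar),
                    Label = Label)

rf  <- randomForest(Label ~ ., data = DF)
oob <- rf$votes[, "1"]        # OOB vote fraction for class 1
auc(roc(Label, oob))          # with no signal, this should sit near 0.5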
>
> Thanks,
>
> -Jack
And:

> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
--
David Winsemius
Alameda, CA, USA