On Nov 20, 2013, at 12:44 PM, Jack Luo wrote:
> Hi,
>
> I am using the AUCRF package on my data, and I was initially impressed by
> the high OOB-AUC it reported. After a while, though, I began to suspect
> that this might be due to some sort of bias, which motivated me to run a
> test on random data (generated with rnorm).
>
> The design is very simple: 100 observations, 50 in class 0 and 50 in
> class 1. I varied the number of variables (the idea being that if there is
> a bias, the apparent performance should increase as more variables are
> added).
>
> Presumably, there is no signal in the data and the true unbiased AUC should
> not be too different from 0.5.
>
> The results are worrisome to me: the OOB AUC is a lot higher than 0.5, and
> with more variables, it gets even higher.
>
> Am I misunderstanding anything here?
>
> Below is the R code I used to test:
>
> library(AUCRF)  # variable selection with random forests based on OOB-AUC
>
> Nvar  <- 200                                   # number of variables
> Label <- as.factor(c(rep(0, 50), rep(1, 50)))  # class label
> AUC_r <- NULL
>
> for (k in 1:10) {        # k controls the random data that are generated
>   set.seed(k)
>   Arandom <- matrix(rnorm(Nvar * length(Label)), ncol = Nvar)
>   DF <- data.frame(Arandom, Label = Label)
>   for (j in 1:20) {      # j controls the randomness of the OOB sampling
>     if (j %% 10 == 0) cat(k, j, "\n")
>     set.seed(j)
>     fit <- AUCRF(Label ~ ., data = DF)
>     AUC_r <- cbind(AUC_r, fit$AUCcurve$AUC)
>   }
> }
>
> plot(fit$AUCcurve$k, apply(AUC_r, 1, mean), type = "b", pch = 3,
>      xlab = "# of Vars", ylab = "OOB-AUC", lwd = 2, col = 2, ylim = c(0.4, 1))
>
Shouldn't this question go to the package maintainer before being sent to
R-help?
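
(For what it is worth, one quick sanity check of the premise that pure-noise
data should score near 0.5 would be to compute the OOB AUC of a plain random
forest on the same data. The sketch below is only illustrative and assumes the
randomForest and pROC packages, neither of which appears in the original post:)

library(randomForest)  # plain random forest, no variable elimination
library(pROC)          # roc()/auc() to score the OOB votes

set.seed(1)
Nvar  <- 200
Label <- as.factor(c(rep(0, 50), rep(1, 50)))
DF    <- data.frame(matrix(rnorm(Nvar * length(Label)), ncol = Nvar),
                    Label = Label)

rf  <- randomForest(Label ~ ., data = DF)
oob <- rf$votes[, "1"]        # OOB vote fraction for class 1
auc(roc(Label, oob))          # with no signal, this should sit near 0.5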
>
> Thanks,
>
> -Jack
And:

> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
--
David Winsemius
Alameda, CA, USA