thr3ads.net - R help - [R] Random Forest: OOB performance = test set performance? [Apr 2021]


thebudget72 at gmail.com

2021-Apr-11 03:48 UTC

[R] Random Forest: OOB performance = test set performance?

Hi ML,

For a random forest, I thought that the out-of-bag performance should be
the same as (or at least very similar to) the performance calculated on a
separate test set.

But this does not seem to be the case.

In the following code, the accuracy computed on the out-of-bag samples is
77.81%, while the accuracy computed on a separate test set is 81%.

Can you please check what I am doing wrong?

Thanks in advance and best regards.

library(randomForest)
library(ISLR)

Carseats$High <- ifelse(Carseats$Sales<=8,"No","Yes")
Carseats$High <- as.factor(Carseats$High)

train = sample(1:nrow(Carseats), 200)

rf = randomForest(High~.-Sales,
                  data=Carseats,
                  subset=train,
                  mtry=6,
                  importance=T)

acc <- (rf$confusion[1,1] + rf$confusion[2,2]) / sum(rf$confusion)
print(paste0("Accuracy OOB: ", round(acc*100,2), "%"))

yhat <- predict(rf, newdata=Carseats[-train,])
y <- Carseats[-train,]$High
conftest <- table(y, yhat)
acctest <- (conftest[1,1] + conftest[2,2]) / sum(conftest)
print(paste0("Accuracy test set: ", round(acctest*100,2), "%"))

Peter Langfelder

2021-Apr-11 04:34 UTC


I think the only thing you are doing wrong is not setting the random
seed (set.seed()), so your results are not reproducible. Depending on
the random sample used to select the training and test sets, you get
slightly varying accuracy for both; sometimes one is better and
sometimes the other.

HTH,

Peter
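
The advice above can be tried out with a small sketch along the following lines. It is not code from the thread: the helper name compare_accuracy, the seed values 1 to 5, and the use of data=carseats[idx, ] instead of subset= are assumptions made for illustration. It also drops the "class.error" column that randomForest appends to rf$confusion before summing, because including that column in the denominator slightly lowers the computed OOB accuracy.

```r
library(randomForest)
library(ISLR)

# Work on a copy so the packaged Carseats data stays untouched.
carseats <- Carseats
carseats$High <- factor(ifelse(carseats$Sales <= 8, "No", "Yes"))

# Hypothetical helper: one train/test split and one forest per seed.
compare_accuracy <- function(seed, n_train = 200) {
  # Fixing the seed makes both the split and the forest reproducible.
  set.seed(seed)
  idx <- sample(nrow(carseats), n_train)
  train_df <- carseats[idx, ]
  test_df <- carseats[-idx, ]

  rf <- randomForest(High ~ . - Sales, data = train_df,
                     mtry = 6, importance = TRUE)

  # rf$confusion carries a trailing "class.error" column; keep only the
  # class counts so the denominator is the number of OOB predictions.
  oob_conf <- rf$confusion[, levels(carseats$High)]

  yhat <- predict(rf, newdata = test_df)
  conf_test <- table(observed = test_df$High, predicted = yhat)

  c(oob = sum(diag(oob_conf)) / sum(oob_conf),
    test = sum(diag(conf_test)) / sum(conf_test))
}

# One row per seed, columns "oob" and "test" (accuracy as a fraction).
results <- t(sapply(1:5, compare_accuracy))
print(round(100 * results, 2))
```

Running this for several seeds shows how far the two estimates move from split to split, and which of the two comes out higher on any given split.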

