thr3ads.net - R help - [R] randomForest out of bag prediction [Jan 2019]

Witold E Wolski

2019-Jan-12 17:55 UTC

[R] randomForest out of bag prediction

Hello,

I am just not sure what the predict.randomForest function is doing...
I am confused.

I would expect these two function calls to return the same predictions:
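```{r}
library(randomForest)
library(dplyr)  # for %>% and select()

diachp.rf <- randomForest(quality ~ ., data = data, ntree = 50, importance = TRUE)

ypred_oob <- predict(diachp.rf)         # no newdata argument
dataX <- data %>% select(-quality)      # remove response
ypred <- predict(diachp.rf, dataX)      # dataX is passed as newdata

ypred_oob == ypred
```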
These are both out-of-bag predictions, but ypred and ypred_oob are
actually very different.
> table(ypred_oob , data$quality)
ypred_oob    0    1
        0 1324  346
        1  493 2837
> table(ypred , data$quality)
ypred    0    1
    0 1817    0
    1    0 3183

What I find even more disturbing is the 100% accuracy for ypred.
Would you agree that this is rather unexpected?

regards
Witek
-- 
Witold Eryk Wolski

Michael Mayer

2019-Jan-12 18:16 UTC

predict(diachp.rf, dataX) returns the in-sample predictions, not the OOB
predictions. The response variable "quality" is only used during model fit, not
during prediction.

Since in-sample predictions of random forests are typically grossly overfitted
by construction, extremely high accuracies are not unexpected.
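
The difference can be reproduced on any data set. The following is a minimal sketch, assuming the built-in iris data as a stand-in for the data in this thread (which was not posted); Species plays the role of "quality", and the names rf, pred_oob and pred_in are made up for illustration.

```r
library(randomForest)

set.seed(1)
# iris ships with R; Species stands in for the thread's `quality` response
rf <- randomForest(Species ~ ., data = iris, ntree = 50)

# No newdata: predict() returns the out-of-bag predictions stored in rf$predicted
pred_oob <- predict(rf)

# newdata supplied: every tree votes on every row, including the trees
# that had the row in their bootstrap sample (in-sample prediction)
pred_in <- predict(rf, newdata = iris[, names(iris) != "Species"])

table(pred_oob, iris$Species)   # OOB confusion table
table(pred_in, iris$Species)    # in-sample confusion table, usually close to perfect

mean(pred_oob != iris$Species)  # OOB error estimate
mean(pred_in != iris$Species)   # in-sample error, typically much lower
```

The OOB error is the one to report as an estimate of generalization error; the in-sample error mostly shows how closely the trees fit the training rows.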

Bert Gunter

2019-Jan-12 18:16 UTC

Off topic.
But see here:
https://stats.stackexchange.com/questions/61405/random-forest-and-prediction

-- Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


Peter Langfelder

2019-Jan-12 18:56 UTC

See inline.

On Sat, Jan 12, 2019 at 9:56 AM Witold E Wolski <wewolski at gmail.com>
wrote:
> ypred_oob <- predict(diachp.rf)
AFAIK these are, indeed, the out-of-bag predictions.
> dataX <- data %>% select(-quality) # remove response.
> ypred <- predict( diachp.rf, dataX )
These are not out of bag predictions. dataX is interpreted as new data
(argument newdata), and it is assumed to contain entirely new
observations. Each observation in dataX is fed through all of the
trees and the predictions are then pooled. There is no out-of-bag here
- all of the new data observations are assumed to be independent of
the training set.
>
> What I find even more disturbing is that 100% accuracy for ypred.
> Would you agree that this is rather unexpected?
It is expected (and not disturbing) if your training set had enough
variables (or signal) to create trees that fit the training data
perfectly.

HTH,

Peter
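
To make the distinction above concrete, the out-of-bag vote can be rebuilt by hand. This is only a sketch under assumptions: it uses the built-in iris data instead of the data from this thread, relies on the keep.inbag argument of randomForest() and the predict.all argument of predict(), and breaks ties by taking the first class, whereas randomForest breaks ties at random, so the agreement may fall slightly short of 100%. The names tree_votes, is_oob and oob_by_hand are made up for illustration.

```r
library(randomForest)

set.seed(1)
# keep.inbag = TRUE stores, for each row and tree, how often the row
# was drawn into that tree's bootstrap sample
rf <- randomForest(Species ~ ., data = iris, ntree = 50, keep.inbag = TRUE)

# predict.all = TRUE also returns the prediction of every single tree
all_pred <- predict(rf, newdata = iris, predict.all = TRUE)
tree_votes <- all_pred$individual   # rows = observations, columns = trees
is_oob <- rf$inbag == 0             # TRUE where a tree never saw the row

# Share of (row, tree) pairs that are out-of-bag, roughly exp(-1)
mean(is_oob)

# Majority vote per row, restricted to the trees that did not see that row
oob_by_hand <- sapply(seq_len(nrow(iris)), function(i) {
  votes <- tree_votes[i, is_oob[i, ]]
  if (length(votes) == 0) return(NA_character_)  # row was in every bag
  names(which.max(table(votes)))                 # ties go to the first class
})

# Share of rows where the hand-made vote matches predict(rf)
mean(oob_by_hand == as.character(predict(rf)), na.rm = TRUE)
```

Replacing is_oob[i, ] with all trees in the vote gives back what predict(rf, newdata) returns, which is why the second call in the original post is not an out-of-bag prediction.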
