tdbuskirk
2012-Dec-03 23:30 UTC
[R] Different results from random.Forest with test option and using predict function
Hello R Gurus,

I am perplexed by the different results I obtained when I ran code like this:

    set.seed(100)
    test1<-randomForest(BinaryY~., data=Xvars, trees=51, mtry=5, seed=200)
    predict(test1, newdata=cbind(NewBinaryY, NewXs), type="response")

and this code:

    set.seed(100)
    test2<-randomForest(BinaryY~., data=Xvars, trees=51, mtry=5, seed=200,
                        xtest=NewXs, ytest=NewBinaryY)

I thought the confusion matrices for the two forests would be the same by virtue of the same seed settings, but they differ, as do the predicted values and the votes. At first I thought it was just the way ties were broken, so I changed the number of trees to an odd number so that there are no ties anymore.

Can anyone shed light on what I am hoping is a simple oversight? I just can't figure out why the predictions from these two forests, applied to the NewBinaryYs and NewX data sets, would not be the same.

Thanks for any hints and help.

Sincerely,

Trent Buskirk

--
View this message in context: http://r.789695.n4.nabble.com/Different-results-from-random-Forest-with-test-option-and-using-predict-function-tp4651970.html
Sent from the R help mailing list archive at Nabble.com.
Peter Langfelder
2012-Dec-04 04:28 UTC
[R] Different results from random.Forest with test option and using predict function
On Mon, Dec 3, 2012 at 3:30 PM, tdbuskirk <Trent.Buskirk at nielsen.com> wrote:
> Hello R Gurus,
>
> I am perplexed by the different results I obtained when I ran code like this:
>
>     set.seed(100)
>     test1<-randomForest(BinaryY~., data=Xvars, trees=51, mtry=5, seed=200)
>     predict(test1, newdata=cbind(NewBinaryY, NewXs), type="response")

Not sure about this, since I haven't used predict.randomForest extensively, but newdata usually contains predictors only, not the response. Try using newdata = NewXs.

HTH,

Peter

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
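A minimal sketch of the suggestion above, run on simulated data because the poster's Xvars and NewXs were not posted. The names `train`, `new_x`, `fit` and `pred` are made up for illustration. The sketch also passes `ntree`, which is the documented name of the tree-count argument of randomForest(); `trees=` and `seed=` are not among its documented arguments, so it may be worth checking `fit$ntree` to see how many trees were actually grown.

```r
# Sketch only: simulated data stand in for the poster's Xvars / NewXs.
# The whole example is skipped if the randomForest package is not installed.
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1)
  n <- 200

  # Hypothetical training data: five numeric predictors (X1..X5)
  # and a binary factor response.
  train <- data.frame(matrix(rnorm(n * 5), ncol = 5))
  train$BinaryY <- factor(ifelse(train$X1 + rnorm(n) > 0, "yes", "no"))

  # Hypothetical new data: predictors only, same column names as in training.
  new_x <- data.frame(matrix(rnorm(50 * 5), ncol = 5))

  set.seed(100)
  fit <- randomForest::randomForest(BinaryY ~ ., data = train,
                                    ntree = 51, mtry = 2)

  # Check how many trees the forest really contains.
  print(fit$ntree)

  # Predict from the predictors only; the response is not part of newdata.
  pred <- predict(fit, newdata = new_x, type = "response")
  print(table(pred))
}
```

If `fit$ntree` does not match the number of trees that was intended, the argument name is the first thing to check.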
Liaw, Andy
2012-Dec-04 15:04 UTC
[R] Different results from random.Forest with test option and using predict function
Without data to reproduce what you saw, we can only guess.

One possibility is tie-breaking. There are several places where ties can occur and are broken at random, including at the prediction step. One difference between the two ways of doing prediction is that when it's all done within randomForest(), the test set prediction is performed as each tree is grown. If there is any tie that needs to be broken at any prediction step, it will affect the RNG stream used by the subsequent tree growing step.

You can also inspect/compare the "forest" components of the randomForest objects to see if they are the same. At least the first tree in both should be identical.

Andy
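The comparison Andy describes could be sketched roughly as follows, again on simulated data with made-up names (`train_x`, `train_y`, `new_x`, `new_y`, `fit_a`, `fit_b`). It uses the x/y interface rather than the formula interface from the original post, and it sets `keep.forest = TRUE` in the second call because the forest is not kept by default when `xtest` is supplied. Whether the two forests agree depends on the data and on any random tie-breaking, so no particular outcome is claimed here.

```r
# Sketch only: simulated data, not the poster's. Skipped if the
# randomForest package is not installed.
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(2)
  n <- 200
  train_x <- data.frame(matrix(rnorm(n * 5), ncol = 5))
  train_y <- factor(ifelse(train_x$X1 + rnorm(n) > 0, "yes", "no"))
  new_x <- data.frame(matrix(rnorm(50 * 5), ncol = 5))
  new_y <- factor(ifelse(new_x$X1 + rnorm(50) > 0, "yes", "no"),
                  levels = levels(train_y))

  # Forest A: grow the forest first, predict the test set afterwards.
  set.seed(100)
  fit_a <- randomForest::randomForest(x = train_x, y = train_y,
                                      ntree = 51, mtry = 2)
  pred_a <- predict(fit_a, newdata = new_x)

  # Forest B: same seed, but the test set is predicted as each tree is grown.
  set.seed(100)
  fit_b <- randomForest::randomForest(x = train_x, y = train_y,
                                      ntree = 51, mtry = 2,
                                      xtest = new_x, ytest = new_y,
                                      keep.forest = TRUE)
  pred_b <- fit_b$test$predicted

  # Compare the first tree, then the whole "forest" component.
  first_tree_same <- identical(randomForest::getTree(fit_a, k = 1),
                               randomForest::getTree(fit_b, k = 1))
  whole_forest_same <- isTRUE(all.equal(fit_a$forest, fit_b$forest))

  # Count test cases on which the two sets of predictions disagree.
  n_disagree <- sum(as.character(pred_a) != as.character(pred_b))

  print(c(first_tree = first_tree_same, whole_forest = whole_forest_same))
  print(n_disagree)
}
```

If the first trees match but the whole forests do not, that would be consistent with the RNG stream diverging somewhere after the first tree, as described above.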