Hello everybody,

I'm testing the randomForest package in order to run some simulations, and I'm having trouble predicting new values. Fitting the random forest works fine, but each time I try to predict values with the newly created object, I get an error message. I thought it was because of NA values in the data frame, but I cleaned them out and still got the same error. What am I doing wrong?

> library(mlbench)
> library(randomForest)
> data(Soybean)
> test <- sample(1:683, 150, replace=F)
> sb.rf <- randomForest(Class~., data=Soybean[-test,])
> sb.rf.pred <- predict(sb.rf, Soybean[test,])
Error in matrix(t1$countts, nr = nclass, nc = ntest) :
        No data to replace in matrix(...)

I did it the same way with rpart and everything worked fine:

> library(rpart)
> sb.rp <- rpart(Class~., data=Soybean[-test,])
> sb.rp.pred <- predict(sb.rp, Soybean[test,], type="class")

Thank you all for any advice you can give me.

--
Ir. Yves Brostaux - Statistics and Computer Science Dpt.
Gembloux Agricultural University
8, avenue de la Faculté B-5030 Gembloux (Belgium)
Tél : +32 (0)81 62 24 69
E-mail : brostaux.y at fsagx.ac.be
Web : http://www.fsagx.ac.be/si/

> > sb.rf <- randomForest(Class~., data=Soybean[-test,])
> > sb.rf.pred <- predict(sb.rf, Soybean[test,])
> Error in matrix(t1$countts, nr = nclass, nc = ntest) :
>         No data to replace in matrix(...)

try

R> test <- sample(1:683, 150, replace=FALSE)
R>
R> st <- Soybean[test,]
R>
R> sb.rf <- randomForest(Class~., data=Soybean, subset=-test)
R> sb.rf.pred <- predict(sb.rf, data=st)
R>
R> sb.rf.pred[1:10]
 [1] diaporthe-stem-canker diaporthe-stem-canker diaporthe-stem-canker
 [4] diaporthe-stem-canker diaporthe-stem-canker diaporthe-stem-canker
 [7] diaporthe-stem-canker charcoal-rot          charcoal-rot
[10] charcoal-rot
19 Levels: 2-4-d-injury alternarialeaf-spot anthracnose ... rhizoctonia-root-rot

Torsten

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Well, thank you for your answer, but this does not do the right thing, which is to predict the Class value for the test set Soybean[test,]. Instead it gives predictions for the data used to fit the forest (ignoring all rows with NAs); the 'data' argument is simply ignored, since the right name for this argument is 'newdata', and naming it correctly still gives the same error.

> length(sb.rf.pred)
[1] 445
> dim(Soybean[test,])
[1] 150  36
> dim(Soybean[-test,])
[1] 533  36
> sb.rf.pred <- predict(sb.rf, newdata=st)
Error in matrix(t1$countts, nr = nclass, nc = ntest) :
        No data to replace in matrix(...)

At 13:13 02/04/03, you wrote:
>try
>
>R> sb.rf <- randomForest(Class~., data=Soybean, subset=-test)
>R> sb.rf.pred <- predict(sb.rf, data=st)
>
>Torsten
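
The silent failure described in this message (a misnamed 'data' argument that is ignored rather than rejected) is a general property of R's predict methods: an unrecognised named argument is absorbed by `...`, and the method then falls back to the training data. The sketch below illustrates this with base R's lm() and the built-in cars data set as a stand-in; both are assumptions chosen so the example runs without extra packages, and it is not the randomForest code from the thread.

```r
# Stand-in model: lm() on the built-in 'cars' data (50 rows),
# used here only to illustrate argument matching in predict().
fit <- lm(dist ~ speed, data = cars[1:40, ])
held_out <- cars[41:50, ]

# 'data' is not an argument of predict.lm(); it is swallowed by '...'
# and the fitted values for the 40 training rows are returned.
wrong <- predict(fit, data = held_out)

# 'newdata' is the argument that actually supplies new observations.
right <- predict(fit, newdata = held_out)

length(wrong)
length(right)
```

Comparing the length of the prediction vector with the number of rows in the test set, as done in the message above, is a quick way to notice that the new data was never used.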

Yves,

Which version of the package are you using? I get:

> soy <- na.omit(Soybean)
> ts <- sample(nrow(soy), 150, replace=FALSE)
> sb.rf <- randomForest(Class ~ ., data=soy[-ts,])
> table(predict(sb.rf, soy[ts,], type="class"))

               2-4-d-injury         alternarialeaf-spot
                          0                          37
                anthracnose            bacterial-blight
                         10                           3
          bacterial-pustule                  brown-spot
                          2                          29
             brown-stem-rot                charcoal-rot
                         11                           7
              cyst-nematode diaporthe-pod-&-stem-blight
                          0                           0
      diaporthe-stem-canker                downy-mildew
                          4                           8
         frog-eye-leaf-spot            herbicide-injury
                         17                           0
     phyllosticta-leaf-spot            phytophthora-rot
                          3                           5
             powdery-mildew           purple-seed-stain
                          4                           5
       rhizoctonia-root-rot
                          5

Cheers,
Andy

> -----Original Message-----
> From: Yves Brostaux [mailto:brostaux.y at fsagx.ac.be]
> Sent: Wednesday, April 02, 2003 4:46 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] randomForests predict problem
>
> > test <- sample(1:683, 150, replace=F)
> > sb.rf <- randomForest(Class~., data=Soybean[-test,])
> > sb.rf.pred <- predict(sb.rf, Soybean[test,])
> Error in matrix(t1$countts, nr = nclass, nc = ntest) :
>         No data to replace in matrix(...)
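
The working version in this reply differs from the original post in the order of operations: rows with NAs are dropped first, and the test indices are then drawn from the rows of the cleaned data frame rather than from 1:683. A minimal sketch of that pattern follows, using base R's airquality data (which also contains missing values) as a stand-in for Soybean; the data set, the seed and the test size of 40 are assumptions made only so the example is self-contained.

```r
# Stand-in for Soybean: 'airquality' ships with base R and contains NAs.
dat <- airquality
set.seed(1)

# Splitting first leaves incomplete rows in both parts.
test_idx <- sample(nrow(dat), 40, replace = FALSE)
n_train_complete  <- sum(complete.cases(dat[-test_idx, ]))
n_test_incomplete <- sum(!complete.cases(dat[test_idx, ]))

# Cleaning first, then sampling from the cleaned rows, as in the reply above.
clean <- na.omit(dat)
ts <- sample(nrow(clean), 40, replace = FALSE)
clean_train <- clean[-ts, ]
clean_test  <- clean[ts, ]

# Neither part contains missing values any more.
anyNA(clean_train)
anyNA(clean_test)
```

Counting complete cases in the training part also accounts for a prediction vector that is shorter than the training set, such as the 445 predictions reported earlier for 533 training rows.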

Yves,

I will add checks for NAs in predict.randomForest().

In the next version of randomForest (currently called 3.9-x), there will be facilities for handling NAs in the training set. However, there is no way to handle NAs in the test set yet; I believe Leo is still working on that. In Leo's v.4 of the Fortran code, he uses the proximities from the random forest to iteratively impute NAs, starting from the column median or mode (depending on the variable type). I've implemented this scheme at the R level, so that it works for both regression and classification.

There are a couple of things in Leo's new code that I have not added to the package, which is why the version is 3.9 rather than 4.0. If you would like to test the new code, please let me know.

Cheers,
Andy

> -----Original Message-----
> From: Yves Brostaux [mailto:brostaux.y at fsagx.ac.be]
> Sent: Wednesday, April 02, 2003 8:34 AM
> To: r-help at stat.math.ethz.ch
> Cc: Liaw, Andy; Torsten Hothorn
> Subject: RE: [R] randomForests predict problem
>
> I use randomForest version 3.4-4, but yes, now that I have correctly
> omitted the NAs, it works. I must have made a mistake when removing
> them the first time.
>
> I was surprised that this method has no other way of dealing with NAs
> than omitting them. As Torsten Hothorn suggested, the associated predict
> function should then check for NAs in newdata, shouldn't it?
>
> Thank you both for your answers!
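
The two ideas discussed in this last exchange, a check for NAs before predicting and a median or mode fill as the starting point for imputation, can be sketched in base R as follows. The helper names check_newdata() and rough_fill() are hypothetical and are not part of the randomForest package; the sketch covers only the initial fill and does not attempt the proximity-based iteration described above.

```r
# Hypothetical helper: stop early if any row of 'newdata' has a missing value.
check_newdata <- function(newdata) {
  bad <- which(!complete.cases(newdata))
  if (length(bad) > 0) {
    stop("newdata has missing values in ", length(bad), " row(s)")
  }
  invisible(newdata)
}

# Hypothetical helper: fill NAs with the column median (numeric columns)
# or the most frequent level (factor columns). Other column types are left as is.
rough_fill <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    miss <- is.na(col)
    if (!any(miss)) next
    if (is.numeric(col)) {
      col[miss] <- median(col, na.rm = TRUE)
    } else if (is.factor(col)) {
      counts <- table(col)                      # NAs are excluded by default
      col[miss] <- names(counts)[which.max(counts)]
    }
    df[[nm]] <- col
  }
  df
}

# Toy data, invented for illustration only.
toy <- data.frame(
  x = c(1, 2, NA, 10),
  f = factor(c("a", "b", "b", NA))
)
filled <- rough_fill(toy)
check_newdata(filled)   # passes silently; check_newdata(toy) would stop
```

Later releases of randomForest provide na.roughfix() and rfImpute() for this purpose, so a hand-written fill like this one is mainly useful for seeing what the first step does.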