Scott Gilpin
2004-Oct-14 19:40 UTC
[R] random forest problem when calculating variable importance
Hi,

When using the randomForest function for regression, I get different values for the mean-squared error of the test-set predictions depending on whether or not I ask for variable importance to be calculated. An example is below. I looked briefly at the source code, but couldn't find anything that would indicate why calculating variable importance would (or should) change the predictions.

I'm using randomForest version 4.3-3 (the latest from CRAN), and tried R 1.9.0, 1.9.1 and 2.0.0 on Windows XP, and R 1.9.1 on Solaris 8.

Thanks,
Scott Gilpin

library(randomForest)

# Simulated training data: 100 observations, 10 predictors,
# only the first five of which affect the response.
set.seed(2863)
x <- matrix(runif(1000), ncol = 10)
colnames(x) <- 1:10
beta <- matrix(c(1, 2, 3, 4, 5, 0, 0, 0, 0, 0), ncol = 1)
y <- drop(x %*% beta + rnorm(100))

# Independent test data generated from the same model.
newx <- matrix(runif(1000), ncol = 10)
newy <- drop(newx %*% beta + rnorm(100))

# Fit without variable importance; print the test-set MSE after
# all 500 trees (the default ntree).
set.seed(2863)
rf.fit <- randomForest(x = x, y = y, xtest = newx, ytest = newy, importance = FALSE)
print(rf.fit$test$mse[500])

# Same seed and data, but with variable importance; the test-set MSE differs.
set.seed(2863)
rf.fit <- randomForest(x = x, y = y, xtest = newx, ytest = newy, importance = TRUE)
print(rf.fit$test$mse[500])