gianni lavaredo
2011-Nov-16 13:01 UTC
[R] problem tuning RandomForest, an unexpected result
Dear Researchers,

I am using random forest (in regression mode) to analyze several metrics extracted from images. I am tuning RF with a loop over different ranges of mtry, tree and nodesize, keeping the combination with the lowest OOB MSE:

  mtry     from 1 to 5
  nodesize from 1 to 10
  tree     from 1 to 500

I am using this paper as a reference:

Palmer, D. S., O'Boyle, N. M., Glen, R. C., & Mitchell, J. B. O. (2007). Random Forest Models To Predict Aqueous Solubility. Journal of Chemical Information and Modeling, 47, 150-158.

My problem is the following. Using data(airquality), the tuning parameters with the lowest value are:

> print(result.mtry.df[result.mtry.df$RMSE == min(result.mtry.df$RMSE),])
RMSE = 15.44751  MSE = 238.6257  mtry = 3  nodesize = 5  tree = 35

The number of trees is very low compared with what I read in several publications. And the second-lowest value comes from a parameter set with tree = 1:

> print(head(result.mtry.df[with(result.mtry.df, order(MSE)), ]))
          RMSE      MSE mtry nodesize tree
12035 15.44751 238.6257    3        5   35
18001 15.44861 238.6595    4        7    1
7018  16.02354 256.7539    2        5   18
20031 16.02536 256.8121    5        1   31
11037 16.02862 256.9165    3        3   37
11612 16.05162 257.6544    3        4  112

I am wondering whether my setup is wrong or whether there are aspects I have not considered.
Thanks for your attention, and thanks in advance for any suggestions and help.

Gianni

library(randomForest)
data(airquality)
set.seed(131)

# all data, with incomplete rows dropped
MyOzone <- data.frame(na.omit(airquality))
str(MyOzone)

# tuning ranges
My.mtry <- 1:5
My.nodesize <- 1:10
My.tree <- 1:500

# fit one forest per (mtry, nodesize, ntree) combination
# and record the OOB MSE after its last tree
result.mtry <- list()
for (i in seq_along(My.mtry)) {
  result.nodesize <- list()
  for (m in seq_along(My.nodesize)) {
    result.tree <- list()
    for (l in seq_along(My.tree)) {
      ozone.rf <- randomForest(MyOzone[, -1], MyOzone[, 1],
                               mtry = My.mtry[i],
                               nodesize = My.nodesize[m],
                               ntree = My.tree[l],
                               importance = TRUE)
      mse <- ozone.rf$mse[ozone.rf$ntree]
      result.tree[[l]] <- data.frame(RMSE = sqrt(mse), MSE = mse,
                                     mtry = My.mtry[i],
                                     nodesize = My.nodesize[m],
                                     tree = My.tree[l])
    }
    result.nodesize[[m]] <- do.call(rbind, result.tree)
  }
  result.mtry[[i]] <- do.call(rbind, result.nodesize)
}
result.mtry.df <- do.call(rbind, result.mtry)

> print(result.mtry.df[result.mtry.df$RMSE == min(result.mtry.df$RMSE),])
          RMSE      MSE mtry nodesize tree
12035 15.44751 238.6257    3        5   35

> head(result.mtry.df[with(result.mtry.df, order(RMSE)), ])
          RMSE      MSE mtry nodesize tree
12035 15.44751 238.6257    3        5   35
18001 15.44861 238.6595    4        7    1
7018  16.02354 256.7539    2        5   18
20031 16.02536 256.8121    5        1   31
11037 16.02862 256.9165    3        3   37
11612 16.05162 257.6544    3        4  112
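A possible variation on the loop above, offered as a sketch and not as a definitive fix: the mse component of a regression randomForest object is a vector holding the OOB MSE after each successive tree, so a single fit with ntree = 500 already gives the error for every tree count from 1 to 500 at a given mtry and nodesize. This is a different procedure from the original one, which refits an independent forest for every tree count. The names grid, max.tree, curves and oob are introduced here for illustration only, and the numbers will not match the table above because the random draws differ.

```r
library(randomForest)
data(airquality)
set.seed(131)

# same data as above: drop rows with missing values
MyOzone <- na.omit(airquality)

# one row per (mtry, nodesize) pair; tree count is read off each fit
grid <- expand.grid(mtry = 1:5, nodesize = 1:10)
max.tree <- 500

curves <- lapply(seq_len(nrow(grid)), function(k) {
  rf <- randomForest(MyOzone[, -1], MyOzone[, 1],
                     mtry = grid$mtry[k],
                     nodesize = grid$nodesize[k],
                     ntree = max.tree)
  # rf$mse[t] is the OOB MSE of the forest made of the first t trees
  data.frame(mtry = grid$mtry[k],
             nodesize = grid$nodesize[k],
             tree = seq_len(max.tree),
             MSE = rf$mse,
             RMSE = sqrt(rf$mse))
})
oob <- do.call(rbind, curves)

# lowest OOB errors over the whole grid
head(oob[order(oob$MSE), ])
```

Plotting MSE against tree for a single (mtry, nodesize) pair, for example with plot(MSE ~ tree, data = subset(oob, mtry == 3 & nodesize == 5), type = "l"), shows how the OOB estimate behaves as trees are added. With very few trees the estimate is typically noisy, which may be one reason a minimum taken over many small forests lands on an unexpectedly low tree count.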