Deschamps, Benjamin
2010-Nov-09 15:52 UTC
[R] randomForest parameters for image classification
I am implementing an image classification algorithm using the randomForest package. The training data consists of 31000+ training cases over 26 variables, plus one factor response variable (the training class). The main issue I am encountering is very low overall classification accuracy (a lot of confusion between classes). However, I know from other classifications (including a regular decision tree classifier) that the training and validation data are sound and capable of producing good accuracies.

Currently, I am using the default parameters (500 trees, mtry not set (default), nodesize = 1, replace = TRUE). Does anyone have experience using this with large datasets? Currently I need to randomly sample my training data, because giving it the full 31000+ cases returns an out-of-memory error; the same thing happens with large numbers of trees. From what I read in the documentation, perhaps I do not have enough trees to fully capture the training data?

Any suggestions or ideas will be greatly appreciated.

Benjamin
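For reference, here is a minimal sketch of the kind of call described above, with the package's classification defaults spelled out. The object names (train for the 26 predictor columns, train.class for the factor label) are hypothetical stand-ins, and the sampsize value is an arbitrary illustration; sampsize is one documented way to keep memory use down without pre-sampling the data frame by hand.

    library(randomForest)

    ## Hypothetical objects for illustration: 'train' holds the 26
    ## predictor variables and 'train.class' is the factor giving
    ## the class label for each of the 31000+ cases.

    ## The call described above, with the classification defaults
    ## spelled out (ntree = 500, mtry = floor(sqrt(26)) = 5,
    ## nodesize = 1, replace = TRUE):
    rf <- randomForest(x = train, y = train.class,
                       ntree    = 500,
                       nodesize = 1,
                       replace  = TRUE)

    ## One way to reduce memory use without subsetting the data
    ## frame by hand: draw a smaller bootstrap sample per tree.
    ## 5000 is an arbitrary illustrative value.
    rf.small <- randomForest(x = train, y = train.class,
                             ntree    = 500,
                             sampsize = 5000)

    print(rf)   # OOB error rate and confusion matrix

When even the trees themselves will not fit in memory at once, the package's combine() function can merge forests grown in smaller batches.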
Please show us the code you used to run randomForest, the output, as well as what you get with other algorithms (on the same random subset, for comparison). I have yet to see a dataset where randomForest does _far_ worse than other methods.

Andy
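In the spirit of Andy's request, a sketch of a side-by-side check on one random subset, assuming the same hypothetical train/train.class objects as above; rpart stands in for the "regular decision tree classifier" mentioned in the question.

    library(randomForest)
    library(rpart)

    ## Use one fixed random subset so both methods see identical cases.
    set.seed(42)
    idx       <- sample(nrow(train), 5000)
    sub       <- train[idx, ]
    sub.class <- train.class[idx]

    ## Random forest with default settings; printing the fit shows
    ## the out-of-bag (OOB) error rate and confusion matrix.
    rf <- randomForest(x = sub, y = sub.class)
    print(rf)

    ## Single decision tree on the same subset, for comparison.
    tree.data <- data.frame(sub, cls = sub.class)
    fit  <- rpart(cls ~ ., data = tree.data)
    pred <- predict(fit, type = "class")
    table(observed = sub.class, predicted = pred)

Note that the comparison is not quite symmetric: the forest's OOB error is an honest estimate, while the tree's resubstitution error is optimistic, so a fair test would score both on a held-out set.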