Deschamps, Benjamin
2010-Nov-09 15:52 UTC
[R] randomForest parameters for image classification
I am implementing an image classification algorithm using the randomForest package. The training data consists of 31000+ training cases over 26 variables, plus one factor response variable (the training class). The main issue I am encountering is very low overall classification accuracy (a lot of confusion between classes). However, I know from other classifications (including a regular decision tree classifier) that the training and validation data are sound and capable of producing good accuracies.

Currently, I am using the default parameters (500 trees, mtry not set (default), nodesize = 1, replace = TRUE). Does anyone have experience using this with large datasets? Currently I need to randomly sample my training data, because giving it the full 31000+ cases returns an out-of-memory error; the same thing happens with large numbers of trees. From what I read in the documentation, perhaps I do not have enough trees to fully capture the training data?

Any suggestions or ideas will be greatly appreciated.

Benjamin
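For reference, here is a minimal sketch of the kind of call described above, with the package's classification defaults spelled out. The object names (train for the 26 predictor columns, train.class for the factor label) are hypothetical stand-ins, and the sampsize value is an arbitrary illustration; sampsize is one documented way to keep memory use down without pre-sampling the data frame by hand.

    library(randomForest)

    ## Hypothetical objects for illustration: 'train' holds the 26
    ## predictor variables and 'train.class' is the factor giving
    ## the class label for each of the 31000+ cases.

    ## The call described above, with the classification defaults
    ## spelled out (ntree = 500, mtry = floor(sqrt(26)) = 5,
    ## nodesize = 1, replace = TRUE):
    rf <- randomForest(x = train, y = train.class,
                       ntree    = 500,
                       nodesize = 1,
                       replace  = TRUE)

    ## One way to reduce memory use without subsetting the data
    ## frame by hand: draw a smaller bootstrap sample per tree.
    ## 5000 is an arbitrary illustrative value.
    rf.small <- randomForest(x = train, y = train.class,
                             ntree    = 500,
                             sampsize = 5000)

    print(rf)   # OOB error rate and confusion matrix

When even the trees themselves will not fit in memory at once, the package's combine() function can merge forests grown in smaller batches.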
Please show us the code you used to run randomForest, the output, as well as what you get with other algorithms (on the same random subset, for comparison). I have yet to see a dataset where randomForest does _far_ worse than other methods.

Andy
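In the spirit of Andy's request, a sketch of a side-by-side check on one random subset, assuming the same hypothetical train/train.class objects as above; rpart stands in for the "regular decision tree classifier" mentioned in the question.

    library(randomForest)
    library(rpart)

    ## Use one fixed random subset so both methods see identical cases.
    set.seed(42)
    idx       <- sample(nrow(train), 5000)
    sub       <- train[idx, ]
    sub.class <- train.class[idx]

    ## Random forest with default settings; printing the fit shows
    ## the out-of-bag (OOB) error rate and confusion matrix.
    rf <- randomForest(x = sub, y = sub.class)
    print(rf)

    ## Single decision tree on the same subset, for comparison.
    tree.data <- data.frame(sub, cls = sub.class)
    fit  <- rpart(cls ~ ., data = tree.data)
    pred <- predict(fit, type = "class")
    table(observed = sub.class, predicted = pred)

Note that the comparison is not quite symmetric: the forest's OOB error is an honest estimate, while the tree's resubstitution error is optimistic, so a fair test would score both on a held-out set.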