thr3ads.net - R help - [R] randomForest question [Jul 2006]


Arne.Muller at sanofi-aventis.com

2006-Jul-26 11:32 UTC

[R] randomForest question

Hello,

I have a question about randomForest (from the package of the same name).
I have 16 features (nominal), and 159 positive and 318 negative cases that
I'd like to classify (binary classification).
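As a reference point, the class balance alone fixes a trivial baseline: always predicting "negative" misclassifies all 159 positives, an error of 159/477 = 1/3. The cross-validated errors reported by tune() (0.157 to 0.193) are best read against that figure rather than against 0.5. A minimal R sketch of the arithmetic (the variable names are illustrative only):

```r
# Class counts taken from the description above
n_pos <- 159
n_neg <- 318
n_all <- n_pos + n_neg                    # 477 cases in total

# Error of a classifier that always predicts the majority (negative) class
baseline_error <- min(n_pos, n_neg) / n_all
print(round(baseline_error, 4))           # prints [1] 0.3333

# Prior probability of the positive class
prior_pos <- n_pos / n_all
```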

Using the tuning functions from the e1071 package, I find that the best
performance is reached when all 16 features are candidates at each split
(mtry=16). However, the randomForest documentation gives sqrt(#features),
i.e. 4, as the default for classification. How can I explain this difference?
When all features are used, this is the same as a classical decision tree,
except that each tree is built and tested on a different data set, right?
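For reference, randomForest's classification default is mtry = floor(sqrt(p)), where p is the number of predictors and mtry is the number of predictors sampled as split candidates at each node; with 16 features that gives 4. With mtry = p every predictor is a candidate at every split, so the only randomness left is the bootstrap sample each tree is grown on, i.e. bagged trees. The sketch below uses a small simulated data set (the data frame `sim`, its columns `f1`..`f16` and the seed are hypothetical stand-ins, not the data d.all318) and only fits models if the randomForest package is installed:

```r
# Default mtry for classification in randomForest: floor(sqrt(p))
p <- 16
default_mtry <- floor(sqrt(p))
print(default_mtry)                       # prints [1] 4

# Hypothetical stand-in data: 16 nominal predictors, 159 positive / 318 negative cases
set.seed(1)
n <- 477
cols <- lapply(seq_len(p), function(j) factor(sample(c("a", "b", "c"), n, replace = TRUE)))
names(cols) <- paste0("f", seq_len(p))
sim <- as.data.frame(cols)
sim$class <- factor(rep(c("pos", "neg"), c(159, 318)))

if (requireNamespace("randomForest", quietly = TRUE)) {
  # mtry = 4: a random subset of 4 predictors is considered at each split
  rf_default <- randomForest::randomForest(class ~ ., data = sim,
                                           mtry = default_mtry, ntree = 200)
  # mtry = 16: all predictors considered at each split, i.e. bagging
  rf_bagged <- randomForest::randomForest(class ~ ., data = sim,
                                          mtry = p, ntree = 200)
  print(rf_default)
  print(rf_bagged)
}
```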

Example (I've tried different configurations, including changing ntree):

> param <- try(tune(randomForest, class ~ ., data=d.all318,
+                   ranges=list(mtry=c(4, 8, 16), ntree=c(1000))))
>
> summary(param)
Parameter tuning of `randomForest':

- sampling method: 10-fold cross validation 

- best parameters:
 mtry ntree
   16  1000

- best performance: 0.1571809 

- Detailed performance results:
  mtry ntree     error
1    4  1000 0.1928635
2    8  1000 0.1634752
3   16  1000 0.1571809
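randomForest also computes an out-of-bag (OOB) error estimate while it grows the forest, so the same mtry comparison can be made without an outer cross-validation loop, as a cross-check on the tune() results. A sketch, again on hypothetical simulated data (the `sim` data frame is a stand-in; with real data one would pass that data frame instead), reading the final OOB error from each fit's err.rate matrix:

```r
# Hypothetical stand-in data (replace with the real data frame)
set.seed(42)
p <- 16
n <- 477
cols <- lapply(seq_len(p), function(j) factor(sample(c("a", "b", "c"), n, replace = TRUE)))
names(cols) <- paste0("f", seq_len(p))
sim <- as.data.frame(cols)
sim$class <- factor(rep(c("pos", "neg"), c(159, 318)))

mtry_grid <- c(4, 8, 16)
ntree <- 1000

if (requireNamespace("randomForest", quietly = TRUE)) {
  oob <- sapply(mtry_grid, function(m) {
    fit <- randomForest::randomForest(class ~ ., data = sim, mtry = m, ntree = ntree)
    # err.rate has one row per tree; the "OOB" entry of the last row is the
    # OOB error rate of the complete forest
    fit$err.rate[ntree, "OOB"]
  })
  print(data.frame(mtry = mtry_grid, oob_error = oob))
}
```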

	Thanks a lot for your help,

	Kind regards,
