thr3ads.net - R help - [R] randomForest question--problem with ntree [Aug 2009]

Mary Putt

2009-Aug-13 21:11 UTC

[R] randomForest question--problem with ntree

Hi, 

I would like to use a random forest model to get an idea of which variables
from a dataset may have some prognostic significance in a smallish study. The
default number of trees seems to be 500. I tried changing it to
ntree=2000 or ntree=200, and the results appear identical. I have changed mtry
from mtry=5 to mtry=6 successfully. I have seen the same problem on both a
Windows machine and our Linux system, running R 2.8 and 2.9.

Sample code follows.

Thanks in advance for help. 

Mary

> m1<-as.formula(paste("as.factor(EAD)~",
+     paste(names(clin_b)[c(5,7,10:36 )], collapse="+")))
> m1
as.factor(EAD) ~ R_AGE + R_BMI + ASCITES...1L. + EOTAXIN + GM.CSF + 
    IFNa + IL.10 + IL.12.p40.p70 + IL.13 + IL.15 + IL.17 + IL.2 + 
    IL.4 + IL.5 + IL.6 + IL.7 + IL.8 + IL1.RA + IL2.R + IP.10 + 
    MCP.1 + MIG + MIP.1a + MIP.1b + RANTES + TNFa + Male + diagnosis + 
    race
> set.seed(12345)
> rF.bsl<-randomForest(m1, data=clin_b, na.action=na.omit, mtry=6,
+     n.tree=2000)
> rF.bsl$ntree
[1] 500
> rF.bsl$mtry
[1] 6
> print(rF.bsl)
Call:
 randomForest(formula = m1, data = clin_b, mtry = 6, n.tree = 2000,  na.action =
na.omit)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 6

        OOB estimate of  error rate: 39.66%
Confusion matrix:
   0 1 class.error
0 27 7   0.2058824
1 16 8   0.6666667
> set.seed(12345)
> rF.bsl<-randomForest(m1, data=clin_b, na.action=na.omit, mtry=6,
+     n.tree=100)
> rF.bsl$ntree
[1] 500
> rF.bsl$mtry
[1] 6
> print(rF.bsl)
Call:
 randomForest(formula = m1, data = clin_b, mtry = 6, n.tree = 100,     
na.action = na.omit)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 6

        OOB estimate of  error rate: 39.66%
Confusion matrix:
   0 1 class.error
0 27 7   0.2058824
1 16 8   0.6666667

Michael Knudsen

2009-Aug-14 10:03 UTC

On Thu, Aug 13, 2009 at 11:11 PM, Mary Putt<mputt at mail.med.upenn.edu>
wrote:

Hi Mary,
> I would like to use a random Forest model to get an idea about which
> variables from a dataset may have some prognostic significance in a smallish
> study. The default for the number of trees seems to be 500. I tried changing the
> default to ntree=2000 or ntree=200 and the results appear identical. Have
> changed mtry from mtry=5 to mtry=6 successfully. Have seen same problem on both
> a Windows machine and our linux system running 2.8 and 2.9.

I don't think it's correct to call it a problem; it's more likely a
feature! Take a look at Breiman's paper (in the "Machine
Learning" journal), where he introduces random forests. I read it
recently, and somewhere he explicitly mentions that ntree can often be
set very low without hurting performance.

The random forest algorithm is very robust and apparently 500 trees
are usually more than enough. Therefore you don't get better results
by using 2000 trees, and often it doesn't affect the performance if
you use fewer trees (e.g. 200).
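
One way to check this on your own data is to look at how the out-of-bag (OOB) error changes as trees are added. This is only a minimal sketch: it assumes the built-in iris data as a stand-in for clin_b (which is not available here) and uses the err.rate matrix that randomForest stores for classification forests, with one row per tree.

```r
library(randomForest)

set.seed(12345)
# iris is a stand-in for the poster's clin_b data (an assumption)
rf <- randomForest(Species ~ ., data = iris, ntree = 2000)

# Cumulative OOB error after each tree; one row per tree grown
oob <- rf$err.rate[, "OOB"]

# Compare the OOB error at a few forest sizes
oob[c(100, 200, 500, 2000)]

# Plot error against number of trees to see where the curve flattens
plot(rf)
```

If the curve is already flat well before 500 trees, growing more trees is unlikely to change the error estimate much.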

Best,
Michael

-- 
Michael Knudsen
micknudsen at gmail.com
http://lifeofknudsen.blogspot.com/

Michael Knudsen

2009-Aug-14 11:59 UTC

On Fri, Aug 14, 2009 at 1:43 PM, Mary Putt<mputt at mail.med.upenn.edu>
wrote:
> I'm not calling it a problem that the answer converges, i.e. that the
> algorithm is stable. But if you look at the example, even though I've asked
> for 2000 or 200 trees, ntree=2000 or ntree=200, it still gives me 500 trees
> according to the output and identical results when you set the seed before the
> call. While results are expected to be similar, they should not be identical if
> the number of trees was actually changed.

Oops! You have written n.tree instead of ntree.
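
The misspelling raises no error because, as far as I can tell, randomForest accepts extra arguments through `...` and quietly drops any it does not recognise, so the default of 500 trees is used. A minimal sketch of the difference, assuming the built-in iris data as a stand-in for clin_b and mtry=2 chosen only for illustration:

```r
library(randomForest)

set.seed(12345)
# Misspelled: n.tree is not a randomForest argument, so it is ignored
rf_typo <- randomForest(Species ~ ., data = iris, mtry = 2, n.tree = 2000)
rf_typo$ntree   # still the default, as in the output posted above

set.seed(12345)
# Correct spelling: ntree
rf_ok <- randomForest(Species ~ ., data = iris, mtry = 2, ntree = 2000)
rf_ok$ntree     # the requested number of trees
```

Checking the ntree component of the fitted object (or the "Number of trees" line in the printed summary) is a quick way to confirm the setting took effect.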

Best,
Michael

-- 
Michael Knudsen
micknudsen at gmail.com
http://lifeofknudsen.blogspot.com/
