thr3ads.net - R help - [R] memory problems when combining randomForests [Broadcast] [Jul 2006]


Liaw, Andy

2006-Jul-27 00:12 UTC

[R] memory problems when combining randomForests [Broadcast]

You need to give us more details, like how you call randomForest, versions
of the package and R itself, etc.  Also, see if this helps you:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/32918.html

Andy
 
From: Eleni Rapsomaniki
> Dear all,
> 
> I am trying to train a randomForest using all my control data 
> (12,000 cases, ~ 20 explanatory variables, 2 classes). 
> Because of memory constraints, I have split my data into 7 
> subsets and trained a randomForest for each, hoping that 
> using combine() afterwards would solve the memory issue. 
> Unfortunately,
> combine() still runs out of memory. Is there anything else I 
> can do? (I am not using the formula version)
> 
> Many Thanks
> Eleni Rapsomaniki
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
>

Eleni Rapsomaniki

2006-Jul-27 15:07 UTC


[R] memory problems when combining randomForests [Broadcast]

I'm using R (Windows) version 2.1.1, randomForest version 4.15.
I call randomForest like this:

my.rf <- randomForest(x = train.df[, -response_index], y = train.df[, response_index],
                      xtest = test.df[, -response_index], ytest = test.df[, response_index],
                      importance = TRUE, proximity = FALSE, keep.forest = TRUE)

(where train.df and test.df are my train and test data.frames and
response_index is the column number specifying the class)

I then save each tree to a file so I can combine them all afterwards. There are
no memory issues when keep.forest=FALSE. But I think that's the bit I need for
future predictions (right?).
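
For readers following the thread, here is a minimal sketch of the subset-then-combine approach described above. It assumes the randomForest package is installed and uses the built-in iris data as a stand-in for train.df. The helper name fit_chunk, the three equal-sized chunks, and ntree = 50 are illustrative assumptions, not the poster's actual settings.

```r
library(randomForest)

set.seed(1)
# Stand-in for train.df: shuffle the rows so every chunk sees every class.
dat <- iris[sample(nrow(iris)), ]
response_index <- 5

# Three equal-sized chunks (equal sizes are an assumption of this sketch).
chunk_id <- rep(1:3, length.out = nrow(dat))

# Hypothetical helper: fit a small forest on one chunk and keep its trees
# (keep.forest = TRUE) so the result can be combined and used by predict().
fit_chunk <- function(df) {
  randomForest(x = df[, -response_index], y = df[, response_index],
               ntree = 50, keep.forest = TRUE)
}

forests <- lapply(split(dat, chunk_id), fit_chunk)

# combine() merges the trees of several forests into a single forest.
big.rf <- do.call(randomForest::combine, unname(forests))

# The combined forest predicts on new data like any other randomForest.
pred <- predict(big.rf, iris[, -response_index])
big.rf$ntree
```

In this sketch each per-chunk forest is small, so the objects held in memory at combine time are small too. Whether that is enough on a given machine depends on ntree, tree depth, and the options kept in each fit.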

I did check previous messages on memory issues, and thought that
combining the trees afterwards would solve the problem. Since my
cross-validation subsets give me a fairly stable error-rate, I suppose I could
just use a randomForest trained on just a subset of my data. But would I not be
"wasting" data this way?

A bit off the subject, but should the order in which rows (i.e. sets of
explanatory variables) are passed to the randomForest function affect the
result? I have noticed that if I pick a random unordered sample from my control
data for training, the error rate is much lower than if I take an ordered
sample. This remains true for all my cross-validation results.
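
One possible explanation, offered as an assumption rather than a diagnosis of this data set: if the rows are sorted (by class, date, batch, and so on), an ordered sample covers only part of the data, while a random sample covers all of it. A small base-R sketch with made-up data (toy.df is hypothetical):

```r
set.seed(42)
# Hypothetical data frame sorted by class: all "a" rows first, then all "b".
toy.df <- data.frame(x = c(rnorm(100, 0), rnorm(100, 3)),
                     class = factor(rep(c("a", "b"), each = 100)))

# Ordered sample: the first 80 rows contain only class "a".
ordered.train <- toy.df[1:80, ]

# Random sample: row indices drawn without replacement from the whole table.
random.train <- toy.df[sample(nrow(toy.df), 80), ]

table(ordered.train$class)
table(random.train$class)
```

Comparing the two tables shows how much of each class ends up in the training set under each sampling scheme.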

I'm sorry for my many questions.
Many Thanks
Eleni Rapsomaniki

Liaw, Andy

2006-Jul-31 16:54 UTC


[R] memory problems when combining randomForests [Broadcast]

It's the 5th paper on his web page.
http://www-stat.stanford.edu/~jhf/ftp/isle.pdf
 
Cheers,
Andy


  _____  

From: Weiwei Shi [mailto:helprhelp@gmail.com] 
Sent: Monday, July 31, 2006 11:38 AM
To: Eleni Rapsomaniki
Cc: Liaw, Andy; r-help@stat.math.ethz.ch
Subject: Re: [R] memory problems when combining randomForests [Broadcast]


Hi, Andy:

What is Jerry Friedman's ISLE? I googled it and did not find the paper on
it. Could you give me a link, please?

Thanks,

Weiwei


On 7/31/06, Eleni Rapsomaniki <e.rapsomaniki@mail.cryst.bbk.ac.uk> wrote:

Hello

I've just realised attachments are not allowed, so the data for the example in
my previous message is:

pos.df <- read.table("http://www.savefile.com/projects3.php?fid=6240314&pid=847249&key=119090",
                     header = TRUE)

neg.df <- read.table("http://fs07.savefile.com/download.php?pid=847249&fid=9829834&key=362779",
                     header = TRUE)

And my last two questions (promise!):
The first is related to the order of columns (i.e. explanatory variables). I
get a different order of importance for my variables depending on their order
in the training data. Is there a parameter I could fiddle with (e.g. ntree) to
get a more stable importance order?
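
A common suggestion for unstable importance rankings is to grow more trees and check whether the ranking still changes between runs. The sketch below is one way to check that, assuming the randomForest package is installed. The iris data, the helper name importance_rank, and the two ntree values are illustrative assumptions, and the sketch does not claim that a larger ntree always fixes the ordering.

```r
library(randomForest)

# Hypothetical helper: fit one forest with a given seed and ntree, then
# return the predictors ranked by mean decrease in accuracy (type = 1).
importance_rank <- function(seed, ntree) {
  set.seed(seed)
  rf <- randomForest(x = iris[, -5], y = iris[, 5],
                     ntree = ntree, importance = TRUE)
  imp <- importance(rf, type = 1)
  rownames(imp)[order(imp[, 1], decreasing = TRUE)]
}

# One column per seed; compare the columns within each matrix.
small <- sapply(1:5, importance_rank, ntree = 25)
large <- sapply(1:5, importance_rank, ntree = 2000)
small
large
```

If the columns of large agree with each other more often than the columns of small, the ranking is being stabilised by the extra trees. Variables with nearly equal importance may still swap places.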

And finally, since interactions are not implemented, is there another method I
could use in R to find dependencies among categorical variables? (lm doesn't
accept categorical variables).
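
As a side note for readers, base R offers at least two standard routes here: a chi-squared test of independence on a contingency table, and model formulas, which do accept factor columns in lm() and glm(). The sketch below uses simulated data; the variables a, b, and c3 and their probabilities are made up for illustration and are unrelated to the data in this thread.

```r
set.seed(7)
# Hypothetical categorical data: b depends on a, c3 is generated independently.
n <- 500
a <- factor(sample(c("low", "high"), n, replace = TRUE))
b <- factor(ifelse(runif(n) < ifelse(a == "high", 0.8, 0.3), "yes", "no"))
c3 <- factor(sample(c("x", "y", "z"), n, replace = TRUE))

# Pearson chi-squared test of independence on each two-way table.
ab <- chisq.test(table(a, b))
ac <- chisq.test(table(a, c3))
ab$p.value
ac$p.value

# Factors can enter a model formula directly; a * c3 expands to both
# main effects plus their interaction in this logistic regression.
fit <- glm(b ~ a * c3, family = binomial)
anova(fit, test = "Chisq")
```

A small p-value from chisq.test() suggests the two variables are not independent. The anova() table shows how much each term, including the interaction, reduces the deviance.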

Many thanks
Eleni Rapsomaniki
Birkbeck College, UK






-- 
Weiwei Shi, Ph.D

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III



