Liaw, Andy
2006-Jul-27 00:12 UTC
[R] memory problems when combining randomForests [Broadcast]
You need to give us more details, like how you call randomForest, versions of the package and R itself, etc. Also, see if this helps you:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/32918.html

Andy

From: Eleni Rapsomaniki
>
> Dear all,
>
> I am trying to train a randomForest using all my control data
> (12,000 cases, ~ 20 explanatory variables, 2 classes).
> Because of memory constraints, I have split my data into 7
> subsets and trained a randomForest for each, hoping that
> using combine() afterwards would solve the memory issue.
> Unfortunately, combine() still runs out of memory. Is there
> anything else I can do? (I am not using the formula version)
>
> Many Thanks
> Eleni Rapsomaniki
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Eleni Rapsomaniki
2006-Jul-27 15:07 UTC
[R] memory problems when combining randomForests [Broadcast]
I'm using R (Windows) version 2.1.1, randomForest version 4.15. I call randomForest like this:

my.rf=randomForest(x=train.df[,-response_index], y=train.df[,response_index],
    xtest=test.df[,-response_index], ytest=test.df[,response_index],
    importance=TRUE, proximity=FALSE, keep.forest=TRUE)

(where train.df and test.df are my train and test data.frames and response_index is the column number specifying the class)

I then save each tree to a file so I can combine them all afterwards. There are no memory issues when keep.forest=FALSE. But I think that's the bit I need for future predictions (right?).

I did check previous messages on memory issues, and thought that combining the trees afterwards would solve the problem. Since my cross-validation subsets give me a fairly stable error rate, I suppose I could just use a randomForest trained on just a subset of my data. But would I not be "wasting" data this way?

A bit off the subject, but should the order in which rows (i.e. sets of explanatory variables) are passed to the randomForest function affect the result? I have noticed that if I pick a random unordered sample from my control data for training, the error rate is much lower than if I take an ordered sample. This remains true for all my cross-validation results.

I'm sorry for my many questions.

Many Thanks
Eleni Rapsomaniki
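On the ordered-versus-random question: one possible explanation (an assumption, since the thread does not say how the control data are stored) is that the rows sit in some systematic order, for example sorted by class, so that a block of consecutive rows is not representative of the whole. The base-R sketch below uses a simulated stand-in for train.df and compares class counts per subset under an ordered split and a stratified random split. The column names and the helper names n.subsets, ordered.id and strat.id are hypothetical.

```r
# Hypothetical stand-in for train.df: 2 classes, rows sorted by class,
# which is one way an "ordered" sample can become unrepresentative.
set.seed(1)
train.df <- data.frame(
  x1 = rnorm(700),
  x2 = rnorm(700),
  class = factor(rep(c("neg", "pos"), times = c(500, 200)))
)

n.subsets <- 7

# Ordered split: consecutive blocks of rows.
ordered.id <- rep(seq_len(n.subsets), each = nrow(train.df) / n.subsets)

# Stratified random split: within each class, shuffle the rows and deal
# them out to the subsets in turn.
strat.id <- integer(nrow(train.df))
for (rows in split(seq_len(nrow(train.df)), train.df$class)) {
  shuffled <- rows[sample.int(length(rows))]
  strat.id[shuffled] <- rep_len(seq_len(n.subsets), length(rows))
}

# Class counts per subset under each scheme.
ordered.counts <- table(ordered.id, train.df$class)
strat.counts <- table(strat.id, train.df$class)
print(ordered.counts)
print(strat.counts)
```

If the ordered subsets turn out to contain only one class, or very different class proportions, that alone could account for a higher error rate on ordered samples.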
Liaw, Andy
2006-Jul-31 16:54 UTC
[R] memory problems when combining randomForests [Broadcast]
It's the 5th paper on his web page.
http://www-stat.stanford.edu/~jhf/ftp/isle.pdf
Cheers,
Andy
_____
From: Weiwei Shi [mailto:helprhelp@gmail.com]
Sent: Monday, July 31, 2006 11:38 AM
To: Eleni Rapsomaniki
Cc: Liaw, Andy; r-help@stat.math.ethz.ch
Subject: Re: [R] memory problems when combining randomForests [Broadcast]
Hi, Andy:
What's Jerry Friedman's ISLE? I googled it and did not find the paper on it. Could you give me a link, please?
Thanks,
Weiwei
On 7/31/06, Eleni Rapsomaniki <e.rapsomaniki@mail.cryst.bbk.ac.uk> wrote:
Hello
I've just realised attachments are not allowed, so the data for the example in my previous message is:
pos.df=read.table("http://www.savefile.com/projects3.php?fid=6240314&pid=847249&key=119090", header=TRUE)
neg.df=read.table("http://fs07.savefile.com/download.php?pid=847249&fid=9829834&key=362779", header=TRUE)
And my last two questions (promise!):
The first is related to the order of columns (ie. explanatory variables). I get different order of importance for my variables depending on their order in the training data. Is there a parameter I could fiddle with (e.g. ntree) to get a more stable importance order?
And finally, since interactions are not implemented, is there another method I could use in R to find dependencies among categorical variables? (lm doesn't accept categorical variables).
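
For the last question, one common base-R option is a chi-squared test of independence on the cross-tabulation of two factors. The sketch below uses made-up data; the variable names a, b, tab and res are hypothetical and not from the thread. It is only a minimal illustration, not a recommendation for this particular data set. (As a side note, lm() does accept factor predictors, which it dummy-codes internally.)

```r
# Hypothetical data: two categorical variables stored as factors.
a <- factor(rep(c("x", "y"), each = 50))
b <- factor(c(rep("u", 40), rep("v", 10), rep("u", 10), rep("v", 40)))

# Cross-tabulate the two factors, then test for independence.
tab <- table(a, b)
res <- chisq.test(tab)
print(tab)
print(res$p.value)
```

A small p-value suggests the two variables are not independent. For tables with small cell counts, fisher.test() on the same table is a possible alternative.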
Many thanks
Eleni Rapsomaniki
Birkbeck College, UK
--
Weiwei Shi, Ph.D
"Did you always know?"
"No, I did not. But I believed..."
---Matrix III