Liaw, Andy
2006-Jul-27 00:12 UTC
[R] memory problems when combining randomForests [Broadcast]
You need to give us more details, like how you call randomForest, versions of the package and R itself, etc. Also, see if this helps you: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/32918.html

Andy

From: Eleni Rapsomaniki
>
> Dear all,
>
> I am trying to train a randomForest using all my control data
> (12,000 cases, ~20 explanatory variables, 2 classes).
> Because of memory constraints, I have split my data into 7
> subsets and trained a randomForest for each, hoping that
> using combine() afterwards would solve the memory issue.
> Unfortunately, combine() still runs out of memory. Is there anything else I
> can do? (I am not using the formula version.)
>
> Many Thanks
> Eleni Rapsomaniki
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
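The version details Andy asks for can be collected from within R. A minimal sketch using standard base R/utils calls; the randomForest lookup is guarded in case the package is not installed on the machine running it.

```r
# Report the R version string (e.g. "R version x.y.z (date)").
r_version <- R.version.string
cat("R:", r_version, "\n")

# Report the randomForest version only if the package is installed.
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf_version <- as.character(packageVersion("randomForest"))
  cat("randomForest:", rf_version, "\n")
}

# sessionInfo() adds the platform and the versions of all loaded packages.
print(sessionInfo())
```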
Eleni Rapsomaniki
2006-Jul-27 15:07 UTC
I'm using R (Windows) version 2.1.1, randomForest version 4.15. I call randomForest like this:

    my.rf = randomForest(x = train.df[, -response_index],
                         y = train.df[, response_index],
                         xtest = test.df[, -response_index],
                         ytest = test.df[, response_index],
                         importance = TRUE, proximity = FALSE,
                         keep.forest = TRUE)

(where train.df and test.df are my training and test data frames, and response_index is the number of the column specifying the class).

I then save each forest to a file so I can combine them all afterwards. There are no memory issues when keep.forest=FALSE, but I think that's the part I need for future predictions (right?). I did check previous messages on memory issues, and thought that combining the forests afterwards would solve the problem.

Since my cross-validation subsets give me a fairly stable error rate, I suppose I could just use a randomForest trained on a subset of my data. But wouldn't I be "wasting" data this way?

A bit off the subject, but should the order in which rows (i.e. sets of explanatory variables) are passed to the randomForest function affect the result? I have noticed that if I pick a random, unordered sample from my control data for training, the error rate is much lower than if I take an ordered sample. This holds for all my cross-validation results.

Sorry for the many questions.

Many thanks
Eleni Rapsomaniki
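A minimal sketch of the grow-in-pieces-then-combine() approach discussed above, modelled on the example on the randomForest `combine()` help page. Assumptions: the built-in `iris` data stands in for `train.df`, and each piece is grown on the same training data with a small `ntree` (as in the help-page example) rather than on disjoint subsets. `norm.votes = FALSE` keeps raw vote counts, which the package documentation describes as useful for combining runs. With `keep.forest = TRUE` the combined object still stores every tree, so the memory needed at the `combine()` step grows with the total number of trees.

```r
library(randomForest)

set.seed(1)
# Stand-in data (assumption): iris in place of the poster's train.df.
x <- iris[, -5]
y <- iris$Species

# Grow the forest in three pieces of 50 trees each. keep.forest = TRUE
# stores the trees so the combined object can be used with predict().
pieces <- lapply(1:3, function(i)
  randomForest(x = x, y = y, ntree = 50,
               keep.forest = TRUE, norm.votes = FALSE))

# Merge the pieces into a single forest of 150 trees.
rf.all <- do.call(randomForest::combine, pieces)
print(rf.all$ntree)

# predict() on new data uses the stored forest component.
pred <- predict(rf.all, newdata = x)
print(table(pred, y))
```

predict() needs that stored forest, which is why `keep.forest = FALSE` avoids the memory problem but leaves nothing to predict new cases with.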
Liaw, Andy
2006-Jul-31 16:54 UTC
It's the 5th paper on his web page: http://www-stat.stanford.edu/~jhf/ftp/isle.pdf

Cheers,
Andy

_____

From: Weiwei Shi [mailto:helprhelp@gmail.com]
Sent: Monday, July 31, 2006 11:38 AM
To: Eleni Rapsomaniki
Cc: Liaw, Andy; r-help@stat.math.ethz.ch
Subject: Re: [R] memory problems when combining randomForests [Broadcast]

Hi, Andy:

What is Jerry Friedman's ISLE? I googled it and did not find the paper on it. Could you give me a link, please?

Thanks,
Weiwei

On 7/31/06, Eleni Rapsomaniki <e.rapsomaniki@mail.cryst.bbk.ac.uk> wrote:

Hello

I've just realised attachments are not allowed, so the data for the example in my previous message is:

    pos.df = read.table("http://www.savefile.com/projects3.php?fid=6240314&pid=847249&key=119090", header = TRUE)
    neg.df = read.table("http://fs07.savefile.com/download.php?pid=847249&fid=9829834&key=362779", header = TRUE)

And my last two questions (promise!):

The first is related to the order of columns (i.e. explanatory variables). I get a different order of importance for my variables depending on their order in the training data. Is there a parameter I could fiddle with (e.g. ntree) to get a more stable importance order?

And finally, since interactions are not implemented, is there another method I could use in R to find dependencies among categorical variables? (lm doesn't accept categorical variables.)
Many thanks
Eleni Rapsomaniki
Birkbeck College, UK

--
Weiwei Shi, Ph.D
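On the last question: R model formulas do accept factors (they are expanded into dummy variables automatically), so a two-class response can be modelled with glm() and an interaction term. A minimal sketch with simulated data; the variables `a`, `b` and `y` and the simulated effect are hypothetical, and logistic regression with a likelihood-ratio test is a swapped-in technique, not one proposed in the thread.

```r
set.seed(42)
n <- 400
# Hypothetical categorical predictors with three and two levels.
a <- factor(sample(c("low", "mid", "high"), n, replace = TRUE))
b <- factor(sample(c("no", "yes"), n, replace = TRUE))

# Simulated two-class response whose log-odds depend on an a:b interaction.
eta <- -0.5 + 1.5 * (a == "high" & b == "yes")
y <- factor(rbinom(n, 1, plogis(eta)), levels = 0:1, labels = c("neg", "pos"))

# Main-effects model versus model with the a:b interaction.
fit_main <- glm(y ~ a + b, family = binomial)
fit_int  <- glm(y ~ a * b, family = binomial)

# Likelihood-ratio test: does the interaction improve the fit?
lrt <- anova(fit_main, fit_int, test = "Chisq")
print(lrt)

# For two categorical variables on their own, a chi-squared test of
# independence on their contingency table is the simplest check.
print(chisq.test(table(a, b)))
```

The interaction adds (3 - 1) x (2 - 1) = 2 parameters here, which is the Df the likelihood-ratio test reports for the second model.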