Dear All,

For a data mining project, I am relying heavily on the randomForest and party packages. Due to the large size of the data set, I often run into memory problems (in particular with the party package; randomForest seems to use less memory). I really have two questions at this point.

1) Please see how I am using the party and randomForest packages. Any comment is welcome and useful.

myparty <- cforest(SalePrice ~ ModelID + ProductGroup + ProductGroupDesc +
                       MfgYear + saledate3 + saleday + salemonth,
                   data = trainRF,
                   control = cforest_unbiased(mtry = 3, ntree = 300, trace = TRUE))

rf_model <- randomForest(SalePrice ~ ModelID + ProductGroup + ProductGroupDesc +
                             MfgYear + saledate3 + saleday + salemonth,
                         data = trainRF, na.action = na.omit,
                         importance = TRUE, do.trace = 100, mtry = 3, ntree = 300)

2) I have another question: sometimes R crashes after telling me that it is unable to allocate, e.g., an array of 1.5 GB. However, I have 4 GB of RAM on my box, so... technically the memory is there, but is there a way to enable R to use more of it?

Many thanks

Lorenzo
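A quick way to gauge the scale of the memory problem, as a sketch, assuming trainRF is the data frame used in the calls above:

    # How big is the training data, and what is R currently holding?
    print(object.size(trainRF), units = "Mb")
    gc()   # reports memory in use and triggers a garbage collection

    # Keeping only the modelling columns before fitting can shrink the
    # internal copies that model-fitting functions make:
    vars <- c("SalePrice", "ModelID", "ProductGroup", "ProductGroupDesc",
              "MfgYear", "saledate3", "saleday", "salemonth")
    trainRF <- trainRF[, vars]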
Neither of your questions meets the Posting Guidelines (see the footer of any R-help email).

1) Not reproducible. [1]

2) Very operating-system specific, and a FAQ. You have not indicated what your OS is (via sessionInfo), nor what reading you have already done to address memory problems (use a search engine, or begin with the FAQs in R help or on CRAN).

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

---------------------------------------------------------------------------
Jeff Newmiller                                <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
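For question 1, a reproducible version along the lines of [1] could be built on simulated data. In the sketch below the column names come from the original post, but the data frame and every value in it are invented purely for illustration:

    library(randomForest)

    # Simulated stand-in for trainRF (all values made up):
    set.seed(42)
    n <- 200
    trainRF <- data.frame(
      SalePrice        = rlnorm(n, meanlog = 10),
      ModelID          = factor(sample(1:20, n, replace = TRUE)),
      ProductGroup     = factor(sample(LETTERS[1:5], n, replace = TRUE)),
      ProductGroupDesc = factor(sample(letters[1:5], n, replace = TRUE)),
      MfgYear          = sample(1990:2010, n, replace = TRUE),
      saledate3        = sample(1:365, n, replace = TRUE),
      saleday          = sample(1:31, n, replace = TRUE),
      salemonth        = factor(sample(month.abb, n, replace = TRUE))
    )

    # The call from the post now runs for anyone who copies this message:
    rf_model <- randomForest(SalePrice ~ ., data = trainRF,
                             importance = TRUE, mtry = 3, ntree = 300)

The party::cforest() call from the post can be tried on the same simulated data frame.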
On Sun, 3 Feb 2013, Lorenzo Isella wrote:

> 2) I have another question: sometimes R crashes after telling me that it
> is unable to allocate, e.g., an array of 1.5 GB.

Do not use the word 'crash': see the posting guide. I suspect it gives you an error message.

> However, I have 4 GB of RAM on my box, so... technically the memory is
> there, but is there a way to enable R to use more of it?

Yes. I am surmising this is Windows, but you have not told us so. See the rw-FAQ. The real answer is to run a 64-bit OS: your computer may have 4 GB of RAM, but your OS gives each process a 2 GB address space, which could be raised to 3 GB.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel: +44 1865 272861 (self)
1 South Parks Road,                    +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax: +44 1865 272595
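The checks behind that advice can be run from the console. A short sketch: sessionInfo() works everywhere; memory.size() and memory.limit() exist only in R for Windows, and the 3000 MB figure is purely illustrative (on a 32-bit OS it is only attainable with the boot-time support described in the rw-FAQ):

    sessionInfo()              # reports the OS and whether R is a 32- or 64-bit build
    memory.size()              # MB currently allocated by this session (Windows only)
    memory.limit()             # current allocation ceiling in MB (Windows only)
    memory.limit(size = 3000)  # raise the ceiling on a 32-bit build, if the OS allows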
Dear Dennis and dear All,

It was probably not my best post. I am running R on a Debian box (amd64 architecture), and that is why I was surprised to see memory issues when dealing with a vector larger than 1 GB. The memory is there, but probably it is not contiguous. I will look into the matter and post again (generating an artificial data frame if needed).

Many thanks

Lorenzo

On 4 February 2013 00:50, Dennis Murphy <djmuser at gmail.com> wrote:

> Hi Lorenzo:
>
> On Sun, Feb 3, 2013 at 11:47 AM, Lorenzo Isella
> <lorenzo.isella at gmail.com> wrote:
>> 1) Please see how I am using the party and randomForest packages. Any
>> comment is welcome and useful.
>
> As noted elsewhere, the example is not reproducible, so I can't help you
> there.
>
>> 2) I have another question: sometimes R crashes after telling me that it
>> is unable to allocate, e.g., an array of 1.5 GB. However, I have 4 GB of
>> RAM on my box, so... technically the memory is there, but is there a way
>> to enable R to use more of it?
>
> 4 GB is not a lot of RAM for data mining projects. I have twice that and
> run into memory limits on some fairly simple tasks (e.g., 2D tables) in
> large simulations with 1M or 10M runs. Part of the problem is that data is
> often copied, sometimes more than once: if you have a 1 GB input data
> frame, three copies and you're out of space. Moreover, copied objects need
> contiguous memory, and this becomes very difficult to achieve with large
> objects and limited RAM. With 4 GB of RAM, you need to be more clever:
>
> * eliminate as many other processes that use RAM as possible (e.g., no
>   active browser);
> * think of ways to process your data in chunks (which is harder to do when
>   the objective is model fitting);
> * type ?"Memory-limits" (including the quotes) at the console for an
>   explanation of memory limits and a few places to look for potential
>   solutions;
> * look into 'big data' packages like ff or bigmemory, among others (see
>   the sketch after this message);
> * if you're at an (American?) academic institution, you can get a free
>   license for Revolution R, which is supposed to be better for big data
>   problems than vanilla R.
>
> It's hard to be specific about potential solutions, but the above should
> broaden your perspective on the big data problem and possible avenues for
> solving it.
>
> Dennis
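To make the chunking and 'big data' suggestions above concrete, here is a rough sketch of those routes. The file name "big.csv" and the chunk size are placeholders; ff and bigmemory are used through their standard import helpers:

    # (a) Process a large CSV in fixed-size chunks with base R:
    con <- file("big.csv", open = "r")
    chunk <- read.csv(con, nrows = 10000)        # first chunk; reads the header
    nms <- names(chunk)
    while (!is.null(chunk)) {
      ## ... accumulate summaries from 'chunk' here ...
      chunk <- tryCatch(
        read.csv(con, nrows = 10000, header = FALSE, col.names = nms),
        error = function(e) NULL)                # NULL once the file is exhausted
    }
    close(con)

    # (b) ff: keep the columns on disk and page them in as needed:
    library(ff)
    bigdf <- read.csv.ffdf(file = "big.csv")     # an ffdf, mostly disk-backed

    # (c) bigmemory: memory-mapped matrices, numeric data only:
    library(bigmemory)
    bigmat <- read.big.matrix("big.csv", header = TRUE, type = "double")

Note that a fitted forest still has to see all rows at once, so route (a) helps more with preprocessing and summaries than with the model fit itself, as Dennis points out.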