I hope this is merely a FAQ, and not an AFAQ (annoyingly....).

I'm a SAS programmer, with several years' experience of the system, evaluating alternatives. See the SAS for Linux website (URL in sig) for more info.

I'm exploring R's capabilities and limitations. I'd be very interested in having a deeper understanding of its capacity and performance limitations in dealing with very large datasets, which I would classify as tables with 1 million to 100s of millions of rows and two - 100+ fields (variables) generally of 8 bytes -- call it a 16 - 800 byte record length.

Can R handle such large datasets (tables)? What are the general parameters for memory requirements? How great a performance hit does running to swap (virtual memory) entail? What common procedures|functions under R use significantly more memory? Are there guidelines or documentation which point to issues and parameters of large file|dataset processing under R?

TIA.

--
Karsten M. Self (kmself at ix.netcom.com)

What part of "Gestalt" don't you understand?

SAS for Linux: http://www.netcom.com/~kmself/SAS/SAS4Linux.html
Mailing list: "subscribe sas-linux" to mailto:majordomo at cranfield.ac.uk
11:45pm up 70 days, 51 min, 1 user, load average: 0.67, 0.38, 0.21
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
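As a rough check on the sizes described in the question, the arithmetic can be sketched as follows. Python is used only for illustration, since the thread itself contains no code. The helper names `table_bytes` and `human` are hypothetical, every field is assumed to be a fixed 8 bytes as stated in the post, and 100 million rows is an illustrative stand-in for "100s of millions".

```python
# Back-of-envelope size of a table held fully in memory.
# Assumption: every field is a fixed 8-byte value, as in the post above.
BYTES_PER_FIELD = 8


def table_bytes(rows, fields, bytes_per_field=BYTES_PER_FIELD):
    """Raw data size in bytes, ignoring any per-object overhead."""
    return rows * fields * bytes_per_field


def human(n_bytes):
    """Format a byte count using decimal units (1 MB = 10**6 bytes)."""
    for unit, scale in (("GB", 10**9), ("MB", 10**6), ("kB", 10**3)):
        if n_bytes >= scale:
            return f"{n_bytes / scale:.1f} {unit}"
    return f"{n_bytes} B"


# Corner cases of the stated range. 100 million rows is an illustrative
# choice standing in for "100s of millions".
smallest = table_bytes(1_000_000, 2)       # 16-byte records
largest = table_bytes(100_000_000, 100)    # 800-byte records

print(human(smallest))
print(human(largest))
```

The figures cover raw data only; any working copies made during an analysis would come on top of them.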
On Tue, Aug 03, 1999 at 06:57:38AM +0000, Karsten M. Self wrote:
> I hope this is merely a FAQ, and not an AFAQ (annoyingly....).
>
> I'm a SAS programmer, with several years' experience of the system,
> evaluating alternatives. See the SAS for Linux website (URL in sig) for
> more info.
>
> I'm exploring R's capabilities and limitations. I'd be very interested
> in having a deeper understanding of its capacity and performance
> limitations in dealing with very large datasets, which I would classify
> as tables with 1 million to 100s of millions of rows and two - 100+
> fields (variables) generally of 8 bytes -- call it a 16 - 800 byte
> record length.
>
> Can R handle such large datasets (tables)? What are the general
> parameters for memory requirements? How great a performance hit does
> running to swap (virtual memory) entail? What common
> procedures|functions under R use significantly more memory? Are there
> guidelines or documentation which point to issues and parameters of
> large file|dataset processing under R?

R is not intended for data sets of the size you describe. It is intended to handle data sets of a few tens of megabytes at most. Unlike SAS, it holds complete data sets in memory. We are currently looking at the memory management and performance issues, but the scale of data processing you describe will need a different kind of tool.

On the rare occasions I encounter problems of the size you describe, I usually do significant preprocessing with something like Perl.

Given that the internals of R are not yet in a final state, I would hesitate to make precise performance statements or recommendations. We will be in a better position to do so when the 1.0 release comes along.

Ross
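The kind of preprocessing Ross mentions can be sketched as a single streaming pass that reduces a large delimited file to a small summary table, which then fits comfortably in memory. This is a minimal sketch in Python rather than Perl, not a description of Ross's own scripts. The function name `summarize` and the column names `region` and `amount` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def summarize(lines, key_field, value_field):
    """Stream rows and keep only a running count and sum per key.

    Memory use grows with the number of distinct keys, not with the
    number of rows, so the input never has to fit in memory.
    Returns {key: (row_count, mean_value)}.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        key = row[key_field]
        counts[key] += 1
        totals[key] += float(row[value_field])
    return {k: (counts[k], totals[k] / counts[k]) for k in counts}


# Tiny in-memory stand-in for a large file. With real data, pass an open
# file object instead. The column names are hypothetical.
sample = io.StringIO("region,amount\nnorth,10\nsouth,4\nnorth,20\n")
summary = summarize(sample, "region", "amount")
print(summary)
```

The resulting per-group table is what would be handed to the statistics package, in place of the raw rows.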
Prof Brian D Ripley
1999-Aug-03 08:14 UTC
[R] Performance & capacity characteristics of R?
On Tue, 3 Aug 1999, Karsten M. Self wrote:
> I hope this is merely a FAQ, and not an AFAQ (annoyingly....).
>
> I'm a SAS programmer, with several years' experience of the system,
> evaluating alternatives. See the SAS for Linux website (URL in sig) for
> more info.
>
> I'm exploring R's capabilities and limitations. I'd be very interested
> in having a deeper understanding of its capacity and performance
> limitations in dealing with very large datasets, which I would classify
> as tables with 1 million to 100s of millions of rows and two - 100+
> fields (variables) generally of 8 bytes -- call it a 16 - 800 byte
> record length.

Can you tell us what statistical procedures need 1 million to 100s of millions of rows (observations)? Some of us have doubted that there are even datasets of 100,000 examples that are homogeneous and for which a small subsample would not give all the statistical information. (If they are not homogeneous, one could/should analyse homogeneous subsets and do a meta-analysis.)

Your datasets appear to be (taking a mid-range value) around 1 Gbyte in size.

> Can R handle such large datasets (tables)? What are the general

R has a workspace size limit of 2048 Mb, and on 32-bit machines this cannot be raised more than a tiny amount. I have only run R on a machine with 512 Mb of RAM, and on that machine, using objects of more than 100 Mb or so slowed it down very considerably.

> parameters for memory requirements? How great a performance hit does
> running to swap (virtual memory) entail? What common

A large hit, as R's garbage collector moves objects in memory.

> procedures|functions under R use significantly more memory? Are there
> guidelines or documentation which point to issues and parameters of
> large file|dataset processing under R?

At its present stage of development, R is not tuned to work with such large datasets. There are plans to make it work better with them, but the issue remains as to whether there are many real applications that need such datasets. Hence my first question.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
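Where a small subsample is expected to carry the statistical information, as suggested above, one way to draw it without loading the full table is reservoir sampling. The following is a minimal Python sketch of the classic single-pass method (Algorithm R). The function name `reservoir_sample` is hypothetical, and `range(1_000_000)` merely stands in for rows streamed from a file.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Return k rows chosen uniformly at random from an iterable.

    Classic reservoir sampling (Algorithm R): one pass over the data and
    O(k) memory. If the input has fewer than k rows, all are returned.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Row i (0-based) replaces a random slot with probability
            # k / (i + 1); randint is inclusive at both ends.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


# Stand-in for rows streamed from a large file.
sample = reservoir_sample(range(1_000_000), k=1000, seed=1)
print(len(sample))
```

The sample has a fixed size regardless of how many rows are read, so the full table is never held in memory at once.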