Thomas Pujol
2008-Feb-05 14:17 UTC
[R] advice requested re: building "good" system (R, SQL db) for handling large datasets
R-community,

Sometime during the next 12 months, I plan on configuring a new computer system on which I will primarily run "R" and a SQL database (Microsoft SQL Server, MySQL, Oracle, etc). My primary goal is to "optimize" the system for R, and for passing data to and from R and the database.

I work with large datasets, and therefore I "think" one of my most important goals should be to maximize the amount of RAM that R can utilize effectively.

I am seeking advice concerning the version of R, OS, processor, hard-drive/storage configuration, database, etc. that I should consider. (I am guessing that I should build a system with lots of RAM, and a Linux OS, but am seeking advice from the R community.) If I choose Linux, does it matter which version I use? Any opinion regarding implementing a commercially supported version from a vendor such as Red Hat, Sun, etc? Is any database particularly better at "exchanging" data with R?

While cost is of course a consideration, it is probably a secondary consideration to overall performance, reliability, and ease of ongoing maintenance/support.

Thanks!
Richard Pearson
2008-Feb-06 12:25 UTC
[R] advice requested re: building "good" system (R, SQL db) for handling large datasets
Hi Thomas

I'm certainly no expert but thought I'd reply as I'm likely to be in a similar position soon.

With regard to versions of R, I think you should always have the latest release version. This will mean upgrading at least every 6 months, but this shouldn't be too much of a problem.

With OSs, you need to be aware that many (typically 32-bit systems) have an upper limit on the amount of RAM that can be handled (2-4 GB). I think if you plan to use more than 4 GB RAM, you should definitely consider 64-bit Linux. I have no information or opinions as to which flavour of Linux.

With databases, one issue that might be relevant is whether you want to store data in tables (e.g. one table to store one data.frame) that can subsequently be manipulated in the DB, or to store R objects as R objects (e.g. as BLOBs). My situation is likely to be the latter case, and one of my concerns is that many DBs have an upper limit of 2 GB on BLOBs, and I might potentially have objects that are larger than this.

Finally, you might get more response on database issues from R-sig-DB than R-help.

Best wishes
Richard.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
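
The table-versus-BLOB choice raised in the reply can be illustrated with a small sketch. It is written in Python rather than R so that it runs with only the standard library: sqlite3 stands in for whichever database is chosen, and pickle stands in for R's serialize(). The table names, column names, and sample values are all hypothetical, and the same pattern would need adapting to a real R/DBI setup.

```python
import pickle
import sqlite3

# Hypothetical stand-in for a data.frame: column names plus rows.
COLUMNS = ("id", "height", "weight")
ROWS = [(1, 1.62, 58.0), (2, 1.75, 72.5), (3, 1.80, 90.1)]


def store_as_table(con):
    """Option 1: one table per data set, so the database can query it."""
    con.execute(
        "CREATE TABLE measurements (id INTEGER, height REAL, weight REAL)"
    )
    con.executemany("INSERT INTO measurements VALUES (?, ?, ?)", ROWS)


def store_as_blob(con, name, obj):
    """Option 2: serialize the whole object into one opaque BLOB."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS objects (name TEXT PRIMARY KEY, payload BLOB)"
    )
    payload = pickle.dumps(obj)
    con.execute("INSERT INTO objects VALUES (?, ?)", (name, payload))
    # The payload size is what has to fit under any per-BLOB limit.
    return len(payload)


def load_blob(con, name):
    """Fetch the BLOB in full and deserialize it back into an object."""
    row = con.execute(
        "SELECT payload FROM objects WHERE name = ?", (name,)
    ).fetchone()
    return pickle.loads(row[0])


con = sqlite3.connect(":memory:")
store_as_table(con)
blob_size = store_as_blob(con, "measurements", {"columns": COLUMNS, "rows": ROWS})

# The table can be filtered inside the database...
tall = con.execute(
    "SELECT id FROM measurements WHERE height > 1.7 ORDER BY id"
).fetchall()

# ...whereas the BLOB must be fetched whole and deserialized before use.
restored = load_blob(con, "measurements")
```

The trade-off is the one described above: the table form lets the database do filtering and joins, while the BLOB form round-trips an arbitrary object unchanged but is limited by the database's maximum BLOB size (the 2 GB figure mentioned in the reply).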