Hi All,

I am trying to assemble a system that will allow me to work with large datasets (45-50 million rows, 300-400 columns), possibly amounting to 10 GB or more in size.

I am aware that 64-bit R implementations on Linux boxes are suitable for such an exercise, but I am looking for configurations that R users out there may have used in creating a high-end R system.

Because SAS users have many apprehensions about R's data limitations, I want to demonstrate R's usability even with very large datasets such as those mentioned above.

I would be glad to hear from users who have desktops/servers on which they use R to crunch massive datasets (please share configurations and system-specific information).

Any suggestions for expanding R's functionality in the face of gigabyte-class datasets would be appreciated.

Thanks
Harsh Singhal
Decision Systems,
Mu Sigma Inc.
Chicago, IL
Kingsford Jones
2009-Feb-16 20:44 UTC
[R] Ideal (possible) configuration for an exalted R system
Hi Harsh,

The useR! 2008 site has useful information, e.g. talks by

Graham Williams:
http://www.statistik.uni-dortmund.de/useR-2008/slides/Williams.pdf

Dirk Eddelbuettel:
http://www.statistik.uni-dortmund.de/useR-2008/tutorials/useR2008introhighperfR.pdf

and others:
http://www.statistik.uni-dortmund.de/useR-2008/abstracts/AbstractsByTopic.html#High%20Performance%20Computing

A few days ago I was googling to see what types of workstations are available these days. Here are some with up to 64 GB of RAM:
http://www.colfax-intl.com/jlrid/SpotLight.asp?IT=0&RID=80

Perhaps it won't be long before we see such memory in laptops:
http://www.ubergizmo.com/15/archives/2009/01/samsung_opens_door_to_32gb_ram_stick.html

Like you, I'd also be interested in hearing about configurations folks have used to work with large datasets.

hth,
Kingsford Jones

On Mon, Feb 16, 2009 at 5:10 AM, Harsh <singhalblr at gmail.com> wrote:
> Hi All,
> I am trying to assemble a system that will allow me to work with large
> datasets (45-50 million rows, 300-400 columns) possibly amounting to
> 10GB + in size.
>
> I am aware that R 64 bit implementations on Linux boxes are suitable
> for such an exercise but I am looking for configurations that R users
> out there may have used in creating a high-end R system.
> Due to a lot of apprehensions that SAS users have about R's data
> limitations, I want to demonstrate R's usability even with very large
> datasets as mentioned above.
> I would be glad to hear from users(share configurations and system
> specific information) who have desktops/servers on which they use R to
> crunch massive datasets.
>
> Any suggestions in expanding R's functionality in the face of gigabyte
> class datasets would be appreciated.
>
> Thanks
> Harsh Singhal
> Decision Systems,
> Mu Sigma Inc.
> Chicago, IL
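
For sizing RAM against the dataset dimensions given in the original post, a rough back-of-envelope estimate may help. The sketch below is in Python purely for the arithmetic. It assumes a dense table in which every cell takes a fixed number of bytes: 8 for a double (R's numeric type) or 4 for an integer. The thread does not state the column types, and the function name `table_size_gib` is made up for illustration.

```python
# Rough in-memory size of a dense table with a fixed number of bytes per cell.
# Assumption (not stated in the thread): all columns share one storage type.

def table_size_gib(rows: int, cols: int, bytes_per_cell: int = 8) -> float:
    """Return the approximate size in GiB of a rows x cols table."""
    return rows * cols * bytes_per_cell / 2**30


if __name__ == "__main__":
    # Lower and upper ends of the dimensions quoted in the original post.
    for rows, cols in [(45_000_000, 300), (50_000_000, 400)]:
        as_double = table_size_gib(rows, cols, 8)
        as_int = table_size_gib(rows, cols, 4)
        print(f"{rows:,} x {cols}: {as_double:.1f} GiB as doubles, "
              f"{as_int:.1f} GiB as integers")
```

Under these assumptions, an all-double table of this shape would need on the order of 100 GiB or more if loaded whole, so the 10 GB figure presumably describes the on-disk size or narrower column types. Either way, the estimate is only a starting point, since R often needs extra working memory for copies made during analysis.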