Dimitri Liakhovitski
2011-May-20 13:33 UTC
[R] Memory capacity question for a specific situation
Hello!

I am trying to figure out if my latest R for 64 bits on a 64-bit
Windows 7 PC, RAM = 6 GB, could read in a dataset with:

~64 million rows
~30 columns, about half of which contain integers (between 1 and 3
digits) and half - numeric data (tens to thousands).

Or is it too much data?
And even if it could read it in - will there be any memory left to
conduct, for example, cluster analysis on that data set?

Thanks a lot!

--
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com
On 20.05.2011 15:33, Dimitri Liakhovitski wrote:
> Hello!
>
> I am trying to figure out if my latest R for 64 bits on a 64-bit
> Windows 7 PC, RAM = 6 GB, could read in a dataset with:
>
> ~64 million rows
> ~30 columns, about half of which contain integers (between 1 and 3
> digits) and half - numeric data (tens to thousands).
>
> Or is it too much data?
> And even if it could read it in - will there be any memory left to
> conduct, for example, cluster analysis on that data set?

Let us ask R:

> 64e6 * (15*8 + 15*4)
[1] 1.152e+10

That means you will need roughly 12 GB to store the data in memory. To
work with the data, you should have at least 3 times that amount of
memory available. Hence a 32 GB machine is a minimal requirement if you
cannot restrict yourself to fewer observations or variables.

Uwe Ligges
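[Editor's note: as a rough cross-check of this back-of-the-envelope estimate, one can build a small data frame with the same column types, measure it with object.size(), and scale up to the full row count. This is a minimal sketch, not part of the original exchange; the row and column figures simply restate the numbers from the question.]

## Sketch: estimate the size of the full data set by measuring a small
## sample with the same structure and scaling up by the row count.
## Assumes 64e6 rows, 15 integer + 15 numeric columns, as described above.
n.sample <- 1e5
n.full   <- 64e6

sample.df <- data.frame(
  matrix(sample(1L:999L, n.sample * 15, replace = TRUE), ncol = 15),  # integer columns
  matrix(runif(n.sample * 15, 10, 10000),                ncol = 15)   # numeric columns
)

bytes.per.row <- as.numeric(object.size(sample.df)) / n.sample
cat("Estimated full size:", round(bytes.per.row * n.full / 2^30, 1), "GiB\n")

[If the estimate comes out well above the installed RAM, common suggestions for data of this size are working on a random subset (for clustering, e.g. clara() from the cluster package, which operates on sub-samples by design) or file-backed structures from packages such as ff or bigmemory.]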
Dimitri Liakhovitski
2011-May-20 14:13 UTC
[R] Memory capacity question for a specific situation
Thanks a lot, Uwe!

2011/5/20 Uwe Ligges <ligges at statistik.tu-dortmund.de>:
> That means you will need roughly 12 GB to store the data in memory. To
> work with the data, you should have at least 3 times that amount of
> memory available. Hence a 32 GB machine is a minimal requirement if you
> cannot restrict yourself to fewer observations or variables.
>
> Uwe Ligges

--
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com