ivo welch
2007-Jun-26 14:58 UTC
[R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory
dear R experts:

I am of course no R expert, but I use it regularly.  I thought I would
share some experimentation with memory use.  I run a linux machine with
about 4GB of memory, and R 2.5.0.

Upon startup, gc() reports

         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 268755 14.4     407500 21.8   350000 18.7
Vcells 139137  1.1     786432  6.0   444750  3.4

This is my baseline.  linux 'top' reports 48MB at this point.  This
includes some of my own routines that are always loaded.  Good.

Next, I created an s.csv file with 22 variables and 500,000
observations, taking up 115MB of uncompressed disk space.  The
resulting object.size() after a read.csv() is 84,002,712 bytes (80MB).

> s <- read.csv("s.csv")
> object.size(s)
[1] 84002712

Here is where things get more interesting.  After the read.csv() is
finished, gc() reports

            used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   270505 14.5    8349948 446.0 11268682 601.9
Vcells 10639515 81.2   34345544 262.1 42834692 326.9

I was a bit surprised by this: R had 928MB of memory in use at its peak
(the two "max used" columns together).  More interestingly, this is also
close to what linux 'top' reports as the memory use of the R process
(919MB; the gap is probably 1024 vs. 1000 bytes per MB), even after the
read.csv() is finished and gc() has been run.  Nothing seems to have
been released back to the OS.

Now,

> rm(s)
> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  270541 14.5    6679958 356.8 11268755 601.9
Vcells  139481  1.1   27476536 209.7 42807620 326.6

linux 'top' now reports 650MB of memory use (though R itself uses only
15.6MB).  My guess is that R keeps the trigger memory of 567MB (the two
"gc trigger" columns together) plus the base 48MB.

There are two interesting observations for me here.  First, to read a
.csv file, I need at least 10-15 times as much memory as the size of the
file that I want to read, a lot more than the factor of 3-4 that I had
expected.  The moral is that IF R can read a .csv file at all, one need
not worry too much about running into memory constraints later on.
{R Developers: reducing read.csv's memory requirement a little would be
nice.  Of course, you have more than enough on your plate already.}

Second, memory is not fully returned to the OS.  This is not necessarily
a bad thing, but it is good to know.

Hope this helps...

Sincerely,

/iaw
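A minimal sketch to repeat the experiment, assuming a synthetic file of
the same shape.  The original s.csv is not available, so the contents
here are made-up rnorm() draws and the on-disk size and exact gc()
numbers will differ; gc(reset = TRUE) resets the "max used" statistics
so the reported peak reflects only the read:

    ## Build a throw-away CSV of roughly the shape described above
    ## (22 variables, 500,000 observations).  Synthetic numbers, so the
    ## file will not be exactly 115MB.
    n <- 500000
    k <- 22
    s0 <- as.data.frame(matrix(rnorm(n * k), nrow = n))
    write.csv(s0, "s.csv", row.names = FALSE)
    rm(s0); gc()

    ## Reset the "max used" statistics, read the file back, and inspect
    ## the peak memory use caused by read.csv().
    gc(reset = TRUE)
    s <- read.csv("s.csv")
    object.size(s)   # size of the resulting data frame, in bytes
    gc()             # "max used" now shows the peak during read.csv()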
Prof Brian Ripley
2007-Jun-26 16:53 UTC
[R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory
The R Data Import/Export Manual points out several ways in which you can
use read.csv more efficiently.

On Tue, 26 Jun 2007, ivo welch wrote:
> [quoted text snipped]

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
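For readers without the manual at hand, the gist of its read.csv advice
is to tell the reader function in advance what the file contains, e.g.
via colClasses and nrows.  A hedged sketch: the column classes and row
count below are placeholders, since the actual layout of s.csv is not
given in the thread.

    ## Declaring the column classes lets read.csv skip its type-guessing
    ## pass, and giving nrows (even a mild over-estimate) helps it size
    ## its result up front; both reduce time and peak memory use.
    s <- read.csv("s.csv",
                  colClasses = rep("numeric", 22),  # one class per column (illustrative)
                  nrows = 500000)                   # known number of observations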