Janko Thyson
2011-Aug-30 10:59 UTC
[R] Why does loading saved/cached objects add significantly to RAM consumption?
Dear list,

I make extensive use of cached objects for time-consuming computations, and yesterday I noticed some very strange behavior in that respect: when I execute a computation whose result I'd like to cache (I tried both saving it as '.Rdata' and via the package 'R.cache', which uses its own file type '.Rcache'), my R session consumes about 200 MB of RAM, which is fine. When I then make use of the previously cached object (i.e. load it and assign it to a field of a Reference Class object), RAM consumption of my R process jumps to about 250 MB!

Each new load of a cached/saved object adds to that consumption (in total, I have about 5-8 objects that are processed this way), so at some point I easily reach a RAM consumption of over 2 GB, whereas I stay at about 200 MB when I compute each object directly! Object sizes (checked with 'object.size()') remain fairly constant. What's even stranger: after loading cached objects and removing them (either via 'rm()' or by assigning a 'fresh' empty object to the respective Reference Class field), RAM consumption remains at this high level and never comes down again.

I also checked the behavior in a small example, which is a simplification of my use case and which you'll find below (checked on both Win XP and Win 7 32 bit). I couldn't quite reproduce an immediate increase in RAM consumption, but what I still find really strange is
a) why do repeated 'load()' calls result in an increase in RAM consumption?
b) why does RAM consumption not go down again after the objects have been removed from '.GlobalEnv'?

Has anyone experienced similar behavior? Or even better, does anyone know why this is happening and how it might be fixed (or worked around)? ;-)

I really need your help on this one as it's crucial for my thesis. Thanks a lot to anyone replying!!
Regards,
Janko

##### EXAMPLE #####

setRefClass("A", fields=list(.PRIMARY="environment"))
setRefClass("Test", fields=list(a="A"))

obj.1 <- lapply(1:5000, function(x){
    rnorm(x)
})
names(obj.1) <- paste("sample", 1:5000, sep=".")
obj.1 <- as.environment(obj.1)

test <- new("Test", a=new("A", .PRIMARY=obj.1))
test$a$.PRIMARY$sample.10

#+++++

object.size(test)
object.size(test$a)
object.size(obj.1)
# RAM used by R session: 118 MB

save(obj.1, file="C:/obj.1.Rdata")
# Results in an object of ca. 94 MB
save(test, file="C:/test.Rdata")
# Results in an object of ca. 94 MB

##### START A NEW R SESSION #####

load("C:/test.Rdata")
# RAM consumption still fine at 115 - 118 MB

# But watch how it goes up as we repeatedly load objects
for(x in 1:5){
    load("C:/test.Rdata")
}
for(x in 1:5){
    load("C:/obj.1.Rdata")
}
# Somehow there seems to be an upper limit, though

# Removing the objects does not bring down RAM consumption
rm(obj.1)
rm(test)

##########

> Sys.info()
                     sysname                      release
                   "Windows"                         "XP"
                     version                     nodename
"build 2600, Service Pack 3"               "ASHB-109C-02"
                     machine                        login
                       "x86"                     "wwa418"
                        user
                    "wwa418"

> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] codetools_0.2-8 tools_2.13.1
Henrik Bengtsson
2011-Aug-30 18:33 UTC
[R] Why does loading saved/cached objects add significantly to RAM consumption?
Hi.

On Tue, Aug 30, 2011 at 3:59 AM, Janko Thyson <janko.thyson.rstuff at googlemail.com> wrote:
> Dear list,
>
> I make use of cached objects extensively for time consuming computations and
> yesterday I happened to notice some very strange behavior in that respect:
> When I execute a given computation whose result I'd like to cache (tried
> both saving it as '.Rdata' and via package 'R.cache' which uses a own
> filetype '.Rcache'),

Just to clarify: only the filename extension is "custom"; R.cache uses base::save() internally. It is very unlikely that R.cache has anything to do with your problem.

> my R session consumes about 200 MB of RAM, which is
> fine. Now, when I make use of the previously cached object (i.e. loading it,
> assigning it to a certain field of a Reference Class object), I noticed that
> RAM consumption of my R process jumps to about 250 MB!
>
> Each new loading of cached/saved objects adds to that consumption (in total,
> I have about 5-8 objects that are processed this way), so at some point I
> easily get a RAM consumption of over 2 GB where I'm only at about 200 MB of
> consumption when I compute each object directly! Object sizes (checked with
> 'object.size()') remain fairly constant. What's even stranger: after loading
> cached objects and removing them (either via 'rm()' or by assigning a
> 'fresh' empty object to the respective Reference Class field, RAM
> consumption remains at this high level and never comes down again.
>
> I checked the behavior also in a small example which is a simplification of
> my use case and which you'll find below (checked both on Win XP and Win 7 32
> bit).
> I couldn't quite reproduce an immediate increase in RAM consumption,

I couldn't reproduce it either, using:

sessionInfo():
R version 2.13.1 Patched (2011-08-29 r56823)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.13.1

> but what I still find really strange is
> a) why do repeated 'load()' calls result in an increase in RAM consumption?
> b) why does the latter not go down again after the objects have been removed
> from '.GlobalEnv'?

Removed objects may still sit in memory; memory usage only goes down once R's garbage collector (GC) comes around and frees them. You can force the garbage collector to run by calling gc(), but normally it is triggered automatically whenever needed. Note that the GC can reclaim the memory of a removed object only if there are no other references to that object/piece of memory. When you use Reference Classes (cf. setRefClass()) and environments, you can end up keeping references internally in objects without being aware of it. My guess is that your other code may have such issues, whereas the code below does not. There is also the concept of "promises" [see the 'R Language Definition' document], which *may* also be involved.

FYI, the Sysinternals Process Explorer [http://technet.microsoft.com/en-us/sysinternals/bb896653] is a useful tool for studying individual processes such as R.

My $.02

Henrik

> Did anyone of you experience a similar behavior? Or even better, does anyone
> know why this is happening and how it might be fixed (or be worked around)?
> ;-)
>
> I really need your help on this one as it's crucial for my thesis, thanks a
> lot for anyone replying!!
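To illustrate the point about rm(), gc() and lingering references, here is a minimal sketch. The names 'used_mb', 'payload' and 'holder' are made up for this illustration and do not come from the original code; the environment 'holder' merely stands in for something like a Reference Class field of type "environment". It measures R's own memory accounting via gc(), not the process size reported by the operating system.

```r
# Sketch only: 'used_mb', 'payload' and 'holder' are hypothetical names.

# Memory currently used by R (Ncells + Vcells), in MB, after a full GC.
# The second column of gc()'s result is the "used" amount in MB.
used_mb <- function() sum(gc()[, 2])

baseline <- used_mb()

# Allocate roughly 76 MB (1e7 doubles at 8 bytes each).
payload <- numeric(1e7)

# Keep a second reference inside an environment, much as an
# environment-typed field of a Reference Class object would.
holder <- new.env()
assign("cached", payload, envir = holder)
with_payload <- used_mb()

# Removing the global name does not free the memory, even after gc():
# 'holder' still refers to the same vector.
rm(payload)
after_rm <- used_mb()

# Once the last reference is gone, the next GC can reclaim the memory.
rm("cached", envir = holder)
after_release <- used_mb()

print(round(c(baseline = baseline, with_payload = with_payload,
              after_rm = after_rm, after_release = after_release), 1))
```

If the third number stays high and only the fourth drops back towards the baseline, the memory was being held by the remaining reference rather than by load() or rm() themselves. The process size shown by Task Manager or Process Explorer may still lag behind what gc() reports.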