Iñaki Úcar
2018-Mar-27 07:51 UTC
[Rd] Objects not gc'ed due to caching (?) in R's S3 dispatch mechanism
2018-03-27 6:02 GMT+02:00 <luke-tierney at uiowa.edu>:
> This has nothing to do with printing or dispatch per se. It is the
> result of an internal register (R_ReturnedValue) being protected. It
> gets rewritten whenever there is a jump, e.g. by an explicit return
> call. So a simplified example is
>
> new_foo <- function() {
>   e <- new.env()
>   reg.finalizer(e, function(e) message("Finalizer called"))
>   e
> }
>
> bar <- function(x) return(x)
>
> bar(new_foo())
> gc() # still in .Last.value
> gc() # nothing
>
> UseMethod essentially does a return call so you see the effect there.

Understood. Thanks for the explanation, Luke.

> The R_ReturnedValue register could probably be safely cleared in more
> places but it isn't clear exactly where. As things stand it will be
> cleared on the next use of a non-local transfer of control, and those
> happen frequently enough that I'm not convinced this is worth
> addressing, at least not at this point in the release cycle.

I barely know the R internals, and I'm sure there's a good reason
behind this change (R 3.2.3 does not show this behaviour), but IMHO
it's, at the very least, confusing. When .Last.value is cleared, that
object loses the last reference, and I'd expect it to be eligible for
gc.

In my case, I was using an object that internally generates a bunch of
data. I discovered this because I was benchmarking the execution, and
I was running out of memory because the memory wasn't being freed as it
was supposed to be. So I spent half of the day on this because I thought
I had a memory leak. :-\ (Not blaming anyone here, of course; just
making a case to show that this may be worth addressing at some
point). :-)

Regards,
Iñaki
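Luke's remark that UseMethod essentially does a return call can be tried with an S3 variant of his simplified example. This is a hypothetical sketch: the generic `describe()` and the class "foo" are made up for illustration, and whether the finalizer message appears after the second gc() may depend on the R version, so no output is claimed here.

```r
new_foo <- function() {
  e <- new.env()
  # Print a message when the environment is finalized.
  reg.finalizer(e, function(e) message("Finalizer called"))
  class(e) <- "foo"
  e
}

# Hypothetical generic: dispatch goes through UseMethod().
describe <- function(x) UseMethod("describe")

# The method simply hands its argument back.
describe.foo <- function(x) x

describe(new_foo())
gc()  # interactively, the result is still referenced by .Last.value here
gc()  # whether "Finalizer called" appears here may depend on the R version
```

Comparing this with a version where `describe()` is replaced by a plain function returning `x` mirrors Luke's `bar()` example, with dispatch in place of the explicit `return()`.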
Tomas Kalibera
2018-Mar-27 09:11 UTC
[Rd] Objects not gc'ed due to caching (?) in R's S3 dispatch mechanism
On 03/27/2018 09:51 AM, Iñaki Úcar wrote:
> In my case, I was using an object that internally generates a bunch of
> data. I discovered this because I was benchmarking the execution, and
> I was running out of memory because the memory wasn't being freed as it
> was supposed to be. So I spent half of the day on this because I thought
> I had a memory leak. :-\ (Not blaming anyone here, of course; just
> making a case to show that this may be worth addressing at some
> point). :-)

From the perspective of the R user/programmer/package developer, please
do not make any assumptions about when finalizers will be run, only that
they indeed won't be run while the object is still alive. Similarly, it
is not good to assume that "gc()" will actually run a collection (or a
particular type of collection, or that it will run immediately, etc.).
Such guarantees would restrict the design space and potential
optimizations on the R internals side too much, and for this reason they
are typically not given in other managed languages either. I've seen R
examples where most of the time had been wasted tracing live objects
because an explicit "gc()" had been run in a tight loop. Note that in
Java, for instance, an explicit call to gc() was eventually turned into
a hint only.

Once you start debugging when objects are collected, you are debugging R
internals, and surprises/changes between svn versions, etc. should be
expected, as well as changes in behavior caused very indirectly by code
changes somewhere else. I work on R internals and spend most of my time
debugging; that is unfortunately normal when you work on a language
runtime. Indeed, the runtime should try not to keep references to
objects for too long, but it remains to be seen whether, and at what
cost, this could be fixed for R_ReturnedValue.

Best
Tomas

______________________________________________
R-devel at r-project.org mailing list
stat.ethz.ch/mailman/listinfo/r-devel
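Tomas's advice suggests that code which must release large data should do so explicitly rather than by relying on finalizers or on explicit gc() calls. A minimal sketch of that pattern, assuming a resource modelled as an environment: `open_resource()`, `close_resource()` and `with_resource()` are hypothetical names, not an existing API. Cleanup is tied to `on.exit()`, so the reference to the payload is dropped when the function exits, normally or through an error; the memory itself is then reclaimed whenever the collector next runs.

```r
# Hypothetical resource: an environment holding a payload and a 'closed' flag.
open_resource <- function(n) {
  res <- new.env()
  res$data <- numeric(n)  # stand-in for "a bunch of data"
  res$closed <- FALSE
  res
}

# Remove the payload binding so it is no longer reachable through 'res'.
close_resource <- function(res) {
  if (!res$closed) {
    rm("data", envir = res)
    res$closed <- TRUE
  }
  invisible(res)
}

# Run 'code' on a fresh resource and close it on exit, even after an error.
with_resource <- function(n, code) {
  res <- open_resource(n)
  on.exit(close_resource(res), add = TRUE)
  code(res)
}

total <- with_resource(1e6, function(res) sum(res$data))
```

The point of the design is that correctness no longer depends on when, or whether, a collection happens; only the timing of the actual memory reclamation is left to the runtime.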
Iñaki Úcar
2018-Mar-27 09:53 UTC
[Rd] Objects not gc'ed due to caching (?) in R's S3 dispatch mechanism
2018-03-27 11:11 GMT+02:00 Tomas Kalibera <tomas.kalibera at gmail.com>:
> Once you start debugging when objects are collected, you are debugging R
> internals, and surprises/changes between svn versions, etc. should be
> expected, as well as changes in behavior caused very indirectly by code
> changes somewhere else.

To be precise, I was not debugging *when* objects were collected; I was
debugging *whether* objects were collected. And for that, I necessarily
need some hint about the *when*. But I think that's another discussion.
My point is that, as an R user and package developer, I expect
consistency, and currently

new_foo <- function() {
  e <- new.env()
  reg.finalizer(e, function(e) message("Finalizer called"))
  e
}

bar <- function(x) return(x)

bar(new_foo())
gc() # still in .Last.value
gc() # nothing

behaves differently from

new_foo <- function() {
  e <- new.env()
  reg.finalizer(e, function(e) message("Finalizer called"))
  e
}

bar <- function(x) x

bar(new_foo())
gc() # still in .Last.value
gc() # Finalizer called!

And such a difference is not explained (AFAIK) in the documentation. At
least, nothing in the help page for 'return' suggests that the behaviour
should differ depending on whether or not I write an explicit 'return'.

Regards,
Iñaki
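Iñaki's distinction between *when* and *whether* an object is collected suggests a check built only on the one guarantee Tomas gives: a finalizer is never run while its object is still alive. The sketch below is hypothetical (`track_collection()` and `make_finalizer()` are illustrative names); it records finalization in a separate tracker environment that can be inspected later, without assuming that any particular gc() call collects anything.

```r
# Build the finalizer in its own function; force() evaluates the argument
# promises, so the closure holds only 'tracker' and 'label', not the
# calling frame that contains the tracked object.
make_finalizer <- function(tracker, label) {
  force(tracker)
  force(label)
  function(e) {
    tracker$collected <- TRUE
    message("Finalizer called for ", label)
  }
}

# Hypothetical helper: returns an environment whose 'collected' flag is
# set to TRUE if and when 'obj' is finalized.
track_collection <- function(obj, label = "object") {
  tracker <- new.env()
  tracker$collected <- FALSE
  reg.finalizer(obj, make_finalizer(tracker, label))
  tracker
}

e <- new.env()
tr <- track_collection(e, "e")
invisible(gc())
tr$collected  # FALSE: 'e' is still bound, so its finalizer cannot have run
rm(e)
invisible(gc())
tr$collected  # may now be TRUE, but R does not promise when that happens
```

A FALSE after the object is dropped is therefore only a hint, not proof of a leak; repeating the check after further work in the session gives the runtime more chances to clear internal references such as R_ReturnedValue.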