Iñaki Úcar
2018-Mar-26 21:46 UTC
[Rd] Objects not gc'ed due to caching (?) in R's S3 dispatch mechanism
Hi,

I initially opened an issue in the R6 repo because my issue was with an R6 object. But Winston (thanks!) further simplified my example, and it turns out that the issue (whether a feature or a bug is yet to be seen) had to do with S3 dispatching.

The following example, by Winston, depicts the issue:

print.foo <- function(x, ...) {
  cat("print.foo called\n")
  invisible(x)
}

new_foo <- function() {
  e <- new.env()
  reg.finalizer(e, function(e) message("Finalizer called"))
  class(e) <- "foo"
  e
}

new_foo()
gc() # still in .Last.value
gc() # nothing

I would expect the second call to gc() to free 'e', but it doesn't. However, if we now call *any* S3 method, the object can finally be gc'ed:

print(1)
gc() # Finalizer called

So the hypothesis is that there is some kind of caching (?) mechanism going on. Intended behaviour or not, this is something that was introduced between R 3.2.3 and 3.3.2 (the first succeeds; from the second on, the example fails as described above).

Regards,
Iñaki

PS: Further discussion and examples in https://github.com/r-lib/R6/issues/140
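To check this behaviour in code rather than by watching for the message, one could record the finalizer call in a flag. This is a minimal sketch; `make_tracked()` and `tracker` are hypothetical names, not from the thread, and the object is built the same way as in `new_foo()` above.

```r
# Hypothetical helper: build a "foo" environment the way new_foo() does,
# but have its finalizer set a flag in `tracker` instead of printing.
make_tracked <- function(tracker) {
  tracker$finalized <- FALSE
  e <- new.env()
  # Environments have reference semantics, so this assignment inside the
  # finalizer updates the caller's `tracker` environment in place.
  reg.finalizer(e, function(e) tracker$finalized <- TRUE)
  class(e) <- "foo"
  e
}

tracker <- new.env()
obj <- make_tracked(tracker)

gc()                # `obj` still refers to the environment
tracker$finalized   # FALSE: the object is still reachable

rm(obj)
gc()                # the environment is now unreachable from R code
tracker$finalized   # expected TRUE, unless something internal still holds it
```

On the R versions discussed in this thread, the final check is where the S3 variant of the example would be expected to report FALSE.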
Winston Chang
2018-Mar-27 03:49 UTC
I'd like to emphasize that although Iñaki's example uses print(), this also happens with other S3 generics. Please note that each of the following examples might need to be run in a clean R session to work.

==========
Here's an example that doesn't use S3 dispatch. The finalizer runs correctly.

ident <- function(x) invisible(x)

env_with_finalizer <- function() {
  reg.finalizer(environment(), function(e) message("Finalizer called"))
  environment()
}

ident(env_with_finalizer())
gc() # Still in .Last.value
gc() # Finalizer called

==========
Here's an example that uses S3. In this case, the finalizer doesn't run.

ident <- function(x) UseMethod("ident")
ident.default <- function(x) invisible(x)

env_with_finalizer <- function() {
  reg.finalizer(environment(), function(e) message("Finalizer called"))
  environment()
}

ident(env_with_finalizer())
gc()
gc() # Nothing

However, if the S3 generic is called with another object, the finalizer will run on the next GC:

ident(1)
gc() # Finalizer called

==========
This example is the same as the previous one, except that, at the end, instead of calling the same S3 generic on a different object (that is, ident(1)), it calls a _different_ S3 generic on a different object (mean(1)).

ident <- function(x) UseMethod("ident")
ident.default <- function(x) invisible(x)

env_with_finalizer <- function() {
  reg.finalizer(environment(), function(e) message("Finalizer called"))
  environment()
}

ident(env_with_finalizer())
gc()
gc() # Nothing

# Call a different S3 generic
mean(1)
gc() # Finalizer called

-Winston
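Since each example may need a clean R session, one option is to launch a fresh `Rscript` process from R and capture what it prints. A minimal sketch under that assumption: `run_in_fresh_session()` and `winston_plain` are hypothetical names, and the snippet is Winston's first (non-S3) example with the gc() output hidden.

```r
# Hypothetical helper (not from the thread): run a snippet in a brand-new
# R process via Rscript and return everything it printed.
run_in_fresh_session <- function(code) {
  script <- tempfile(fileext = ".R")
  on.exit(unlink(script), add = TRUE)
  writeLines(code, script)
  rscript <- file.path(R.home("bin"), "Rscript")
  # message() writes to stderr, so capture both streams.
  # system2() quotes the command itself; the argument is quoted here.
  system2(rscript, shQuote(script), stdout = TRUE, stderr = TRUE)
}

# Winston's first example, which does not use S3 dispatch.
winston_plain <- c(
  'ident <- function(x) invisible(x)',
  'env_with_finalizer <- function() {',
  '  reg.finalizer(environment(), function(e) message("Finalizer called"))',
  '  environment()',
  '}',
  'ident(env_with_finalizer())',
  'invisible(gc())  # still in .Last.value',
  'invisible(gc())  # finalizer expected to run here'
)

cat(run_in_fresh_session(winston_plain), sep = "\n")
```

The S3 variants can be passed in the same way. Running each one in its own process keeps a leftover reference from one example from affecting the next.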
luke-tierney at uiowa.edu
2018-Mar-27 04:02 UTC
This has nothing to do with printing or dispatch per se. It is the result of an internal register (R_ReturnedValue) being protected. It gets rewritten whenever there is a jump, e.g. by an explicit return call. So a simplified example is

new_foo <- function() {
  e <- new.env()
  reg.finalizer(e, function(e) message("Finalizer called"))
  e
}

bar <- function(x) return(x)

bar(new_foo())
gc() # still in .Last.value
gc() # nothing

UseMethod essentially does a return call, so you see the effect there.

The R_ReturnedValue register could probably be safely cleared in more places, but it isn't clear exactly where. As things stand it will be cleared on the next use of a non-local transfer of control, and those happen frequently enough that I'm not convinced this is worth addressing, at least not at this point in the release cycle.

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
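Following Luke's explanation, a possible workaround on affected versions is to trigger a non-local transfer of control yourself before collecting. This is a sketch under assumptions: `gc_after_jump()` is a hypothetical helper, not an R API, and it relies on the observation in this thread that calling any S3 generic (here `mean()`) overwrites the register. A flag replaces the message so the outcome can be checked.

```r
# Hypothetical helper: dispatch an S3 generic before collecting. Per Luke,
# UseMethod "essentially does a return call", so this should overwrite
# R_ReturnedValue with a small value (0), releasing whatever object an
# earlier return() left there on the affected R versions.
gc_after_jump <- function(...) {
  invisible(mean(0))
  gc(...)
}

# Luke's simplified example, with a flag instead of a message.
state <- new.env()
state$finalized <- FALSE

new_foo <- function() {
  e <- new.env()
  reg.finalizer(e, function(e) state$finalized <- TRUE)
  e
}
bar <- function(x) return(x)

bar(new_foo())
gc_after_jump()   # the result of bar() may still be in .Last.value here
gc_after_jump()   # where Luke's plain gc() reported "nothing"
state$finalized   # expected TRUE
```

On R versions without the behaviour described here, the helper is simply a slightly more expensive gc().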
Iñaki Úcar
2018-Mar-27 07:51 UTC
2018-03-27 6:02 GMT+02:00 <luke-tierney at uiowa.edu>:
> This has nothing to do with printing or dispatch per se. It is the
> result of an internal register (R_ReturnedValue) being protected. It
> gets rewritten whenever there is a jump, e.g. by an explicit return
> call. So a simplified example is
>
> new_foo <- function() {
>   e <- new.env()
>   reg.finalizer(e, function(e) message("Finalizer called"))
>   e
> }
>
> bar <- function(x) return(x)
>
> bar(new_foo())
> gc() # still in .Last.value
> gc() # nothing
>
> UseMethod essentially does a return call so you see the effect there.

Understood. Thanks for the explanation, Luke.

> The R_ReturnedValue register could probably be safely cleared in more
> places but it isn't clear exactly where. As things stand it will be
> cleared on the next use of a non-local transfer of control, and those
> happen frequently enough that I'm not convinced this is worth
> addressing, at least not at this point in the release cycle.

I barely know the R internals, and I'm sure there's a good reason behind this change (R 3.2.3 does not show this behaviour), but IMHO it's, at the very least, confusing. When .Last.value is cleared, that object loses its last reference, and I'd expect it to be eligible for gc.

In my case, I was using an object that internally generates a bunch of data. I discovered this because I was benchmarking the execution, and I was running out of memory because the memory wasn't being freed as it was supposed to be. So I spent half of the day on this because I thought I had a memory leak. :-\ (Not blaming anyone here, of course; just making the case that this may be worth addressing at some point.) :-)

Regards,
Iñaki
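For benchmarking like this, a quick way to notice memory that is not being released is to compare what gc() reports before and after a call whose result is thrown away. A minimal sketch: `mem_used_mb()`, `retained_mb()` and `make_data()` are hypothetical helpers, and the reading of gc()'s second column as "Mb in use" is the only assumption about its output.

```r
# Hypothetical helper: total memory in use, in Mb. gc() returns a matrix
# with rows "Ncells" and "Vcells"; its second column, "(Mb)", is the
# memory currently used by each.
mem_used_mb <- function() {
  sum(gc()[, 2])
}

# Call f(), discard its result, collect, and report how many Mb are still
# in use compared with before the call. A value near zero means the result
# was released; a large value means something still holds it.
retained_mb <- function(f) {
  before <- mem_used_mb()
  f()
  mem_used_mb() - before
}

make_data <- function() numeric(5e6)   # roughly 38 Mb of doubles
retained_mb(make_data)
```

On the versions discussed in this thread, a result that last passed through return() or S3 dispatch could plausibly stay in the internal register and show up as retained memory until the next non-local jump.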
luke-tierney at uiowa.edu
2018-Mar-27 13:22 UTC
I have committed a change to R-devel that addresses this. To be on the safe side I need to run some more extensive tests before deciding if this can be ported to the release branch for R 3.5.0. Should know in a day or two.

Best,

luke