R developers,

I am trying to track down a memory leak in my R package.

I have a complex object O which comprises a lot of closures and such. Among other things, the object uses an environment E to perform computations and keep intermediate values in. When O closes/finishes with its task it nulls out its reference to E so that the intermediate data can be garbage collected; I've verified that it does null the reference.

However, it seems there is another reference to E floating around. I can tell because I can ask O to put a large array in E, then tell O to close, which nulls the reference to E, but then if I serialize(O, NULL, ascii=TRUE) I can still see the array in the output.

Dangling references to E could come from a closure created in E, or an unforced promise from a function call evaluated in E that created a closure I still have a reference to, or, ... my question is how do I locate the reference?

Is there a way to scan the workspace for objects that refer to a given object?

Or is there a tool that will unpack/explain serialize()'s .rds format in a more human-readable way so that I can tell where the reference to E occurs?

Peter
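For anyone who wants to reproduce the symptom described above, here is a minimal sketch. It is not the package's actual code: `make_O`, `helper` and `big` are hypothetical names, and O is assumed to be a plain list. A closure created in E is enough to keep E, and the array stored in it, reachable after the direct reference is dropped.

```r
# Hypothetical stand-in for the real object: a list holding an environment E
# and a closure that was created in E.
make_O <- function() {
  E <- new.env()
  list(
    E = E,
    helper = evalq(function() NULL, E)  # environment(helper) is E
  )
}

O <- make_O()
assign("big", numeric(1e5), envir = O$E)  # "large array" of intermediate data

O$E <- NULL  # "close": drop the direct reference to E

# E is still reachable through environment(O$helper), so serialize() writes
# the whole array (1e5 doubles at 8 bytes each) along with the closure.
size_closed <- length(serialize(O, NULL))
still_there <- exists("big", envir = environment(O$helper), inherits = FALSE)
```

Here `size_closed` stays above 800000 bytes even though `O$E` is gone, which matches the behaviour reported above.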
On 23/05/2023 2:29 p.m., Peter Meilstrup wrote:
> Is there a way to scan the workspace for objects that refer to a given object?
>
> Or is there a tool that will unpack/explain serialize()'s .rds format
> in a more human-readable way so that I can tell where the reference to
> E occurs?

I don't know of such a tool. You can generate a lot of data about the internals of an object by using

  .Internal(inspect(O))

If your O is as complex as you say, you probably won't want to read through all of that output, but you can save it to a file and search for ENVSXP to find environments. Or, if you have printed E and it shows up as something like

  <environment: 0x7fb3761d1d98>

you can search for that address in the output. For example, in a case I just ran, an environment and a closure that uses that environment were printed as

  @7fb3761d1d98 04 ENVSXP g1c0 [MARK,REF(4)] <0x7fb3761d1d98>

and

  @7fb3762befd0 03 CLOSXP g1c0 [MARK,REF(2),ATT]
  FORMALS:
    @7fb3c00f7ee0 00 NILSXP g1c0 [MARK,REF(65535)]
  BODY:
    @7fb376e5bc88 14 REALSXP g1c1 [MARK,REF(2)] (len=1, tl=0) 123
  CLOENV:
    @7fb3761d1d98 04 ENVSXP g1c0 [MARK,REF(4)] <0x7fb3761d1d98>

The hard part might be identifying what you've found once you find it, because the names of objects aren't printed with the object; they come later, when the "names" attribute gets printed.

So it might be easier to do it by trial and error: rm(O) and save the workspace. Does your test array get saved? If so, it's referenced from something outside of O. If not, remove the elements of O one by one, until saving it doesn't save the array. The last thing removed is a culprit. (There may be others...)

Duncan Murdoch
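Both suggestions above can be sketched in a few lines of R. The helper names `inspect_hits` and `component_sizes` are made up for illustration, and O is assumed to be a plain list. The address matching assumes that print() and .Internal(inspect()) write the same hex digits apart from the "0x" prefix, as in the output shown above; that may not hold on every platform. `component_sizes` is a variant of the remove-one-by-one idea: it serializes each element on its own instead of removing elements from O.

```r
# Hypothetical helper: lines of .Internal(inspect(obj)) mentioning env's address.
inspect_hits <- function(obj, env) {
  # print.default() shows e.g. <environment: 0x7fb3761d1d98>; keep the digits.
  addr <- sub("^<environment: (.*)>$", "\\1",
              capture.output(print.default(env))[1])
  addr <- tolower(sub("^0x0*", "", addr))
  out <- capture.output(invisible(.Internal(inspect(obj))))
  out[grepl(addr, tolower(out), fixed = TRUE)]
}

# Hypothetical helper: serialized size in bytes of each top-level element.
# An element that drags E along will be about as large as the test array.
component_sizes <- function(obj) {
  vapply(unclass(obj), function(el) length(serialize(el, NULL)), integer(1))
}

# Small demonstration with a deliberately leaky object.
E <- new.env()
E$big <- numeric(1e5)
f <- function() 42
environment(f) <- E
O <- list(run = f, label = "demo")

hits  <- inspect_hits(O, E)   # should include the ENVSXP line under CLOENV:
sizes <- component_sizes(O)   # "run" is large, "label" is tiny
```

The per-element sizes do name the culprit directly, which sidesteps the problem that inspect() output does not print names next to the objects they belong to.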
On 5/23/23 20:29, Peter Meilstrup wrote:
> Is there a way to scan the workspace for objects that refer to a given object?
>
> Or is there a tool that will unpack/explain serialize()'s .rds format
> in a more human-readable way so that I can tell where the reference to
> E occurs?

Maybe you could re-use the script I wrote for finding a particular string in the R heap (it looked for the temporary installation directory name captured during staged installation). It should be fairly easy to modify it to look for a particular environment object.

Best
Tomas

https://github.com/kalibera/rstagedinst/blob/master/sicheck.R
https://blog.r-project.org/2019/02/14/staged-install/index.html

> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
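As a rough illustration of scanning an object for references to a particular environment, here is a pure-R sketch. It is independent of the script linked above and far less thorough. `find_env_refs` is a hypothetical name. It walks lists, closure environments, environment contents, enclosing environments and attributes, and it does not look inside function bodies, formals or language objects. It also reads bindings with get(), which forces any unforced promises it meets, so it cannot tell you that a reference was held by a promise.

```r
# Hypothetical helper: paths from `root` at which environment `target` is reached.
find_env_refs <- function(root, target, root_name = "O") {
  visited <- list()     # environments already walked (guards against cycles)
  hits <- character()   # paths at which `target` was found

  skip_env <- function(e) {
    identical(e, globalenv()) || identical(e, baseenv()) ||
      identical(e, emptyenv()) || isNamespace(e) ||
      !is.null(attr(e, "name", exact = TRUE))  # attached package environments
  }

  walk <- function(x, path) {
    if (is.environment(x)) {
      if (identical(x, target)) {
        hits <<- c(hits, path)
        return(invisible())
      }
      if (skip_env(x)) return(invisible())
      for (v in visited) if (identical(v, x)) return(invisible())
      visited[[length(visited) + 1L]] <<- x
    }

    # Attributes can hold environments too (e.g. the srcfile of a srcref).
    a <- attributes(x)
    for (nm in names(a)) walk(a[[nm]], sprintf("attr(%s, \"%s\")", path, nm))

    if (is.environment(x)) {
      for (nm in ls(x, all.names = TRUE)) {
        if (bindingIsActive(nm, x)) next
        # NOTE: get() forces promises; missing arguments raise an error.
        val <- tryCatch(get(nm, envir = x, inherits = FALSE),
                        error = function(e) NULL)
        walk(val, sprintf("%s$%s", path, nm))
      }
      walk(parent.env(x), sprintf("parent.env(%s)", path))
    } else if (typeof(x) == "closure") {
      walk(environment(x), sprintf("environment(%s)", path))
    } else if (typeof(x) == "list") {
      x <- unclass(x)  # avoid dispatching to class-specific `[[` methods
      for (i in seq_along(x)) walk(x[[i]], sprintf("%s[[%d]]", path, i))
    }
    invisible()
  }

  walk(root, root_name)
  hits
}

# Demonstration: two closures in O share the environment E.
E <- new.env()
E$big <- numeric(10)
f <- function() 42
environment(f) <- E
O <- list(run = f, data = 1:3, nested = list(g = f))
hits <- find_env_refs(O, E)
```

Each returned path is an R-like expression such as `environment(O[[1]])`, so the culprit can be inspected directly. To scan the whole workspace rather than one object, the same walker could be started from each binding in globalenv().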