luke-tierney at uiowa.edu
2021-Apr-07 13:28 UTC
[Rd] [External] memory consumption of nested (un)serialize of sys.frames()
On Wed, 7 Apr 2021, Andreas Kersting wrote:

> Hi,
>
> please consider the following minimal reproducible example:
>
> Create a new R package which just contains the following two (exported) objects:

I would not expect this behavior and I don't see it when I make such a
package (in R 4.0.3 or R-devel on Ubuntu). You will need to provide a
more complete reproducible example if you want help with what you are
trying to do; also sessionInfo() would help.

Best,

luke

>
>
> crash_dumps <- new.env()
>
> f <- function() {
>   x <- runif(1e5)
>   dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
>   assign("last.dump", dump, crash_dumps)
> }
>
>
> WARNING: the following will probably eat all your RAM!
>
> Attach this package and run:
>
> for (i in 1:100) {
>   print(i)
>   f()
> }
>
> You will notice that with each iteration the execution of f() slows down significantly while the memory consumption of the R process (v4.0.5 on Linux) quickly explodes.
>
> I am having a hard time to understand what exactly is happening here. Something w.r.t. too deeply nested environments? Could someone please enlighten me? Thanks!
>
> Regards,
> Andreas
>
>
> Background:
> In an R package I store crash dumps on error in a parallel processes in a way similar to what I have just shown (hence the (un)serialize(), which happens as part of returning the objects to the parent process). The first 2 or 3 times I do so in a session everything is fine, but afterwards it takes very long and I soon run out of memory.
>
> Some more observations:
> - If I omit `x <- runif(1e5)`, the issues seem to be less pronounced.
> - If I assign to .GlobalEnv instead of crash_dumps, there seems to be no issue - probably because .GlobalEnv is not included in sys.frames(), while crash_dumps is indirectly via the namespace of the package being the parent.env of some of the sys.frames()!?
> - If I omit the lapply(...), i.e. use `dump <- unserialize(serialize(sys.frames(), NULL))` directly, there seems to be no issue. The immediate consequence is that there are less sys.frames and - in particular - there is no frame which has the base namespace as its parent.env.
> - If I make crash_dumps a list and use assignInMyNamespace() to store the dump in it, there also seems to be no issue. I will probably use this as a workaround:
>
> crash_dumps <- list()
>
> f <- function() {
>   x <- runif(1e5)
>   dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
>   crash_dumps[["last.dump"]] <- dump
>   assignInMyNamespace("crash_dumps", crash_dumps)
> }
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
Andreas Kersting
2021-Apr-07 14:06 UTC
[Rd] [External] memory consumption of nested (un)serialize of sys.frames()
Hi Luke,

Please see https://github.com/akersting/dumpTest for the package. Here a session showing my issue:

> library(dumpTest)
> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] dumpTest_0.1.0

loaded via a namespace (and not attached):
[1] compiler_4.0.5

> for (i in 1:100) {
+   print(i)
+   print(system.time(f()))
+ }
[1] 1
   user  system elapsed
  0.028   0.004   0.034
[1] 2
   user  system elapsed
  0.067   0.008   0.075
[1] 3
   user  system elapsed
  0.176   0.000   0.176
[1] 4
   user  system elapsed
  0.335   0.012   0.349
[1] 5
   user  system elapsed
  0.745   0.023   0.770
[1] 6
   user  system elapsed
  1.495   0.060   1.572
[1] 7
   user  system elapsed
  2.902   0.136   3.040
[1] 8
   user  system elapsed
  5.753   0.272   6.034
[1] 9
   user  system elapsed
 11.807   0.708  12.597
[1] 10
^C
Timing stopped at: 6.638 0.549 7.214

I had to interrupt in iteration 10 because I was running low on RAM.

Regards,
Andreas

2021-04-07 15:28 GMT+02:00 luke-tierney at uiowa.edu:
> On Wed, 7 Apr 2021, Andreas Kersting wrote:
>
>> Hi,
>>
>> please consider the following minimal reproducible example:
>>
>> Create a new R package which just contains the following two (exported) objects:
>
> I would not expect this behavior and I don't see it when I make such a
> package (in R 4.0.3 or R-devel on Ubuntu). You will need to provide a
> more complete reproducible example if you want help with what you are
> trying to do; also sessionInfo() would help.
>
> Best,
>
> luke
>
>>
>>
>> crash_dumps <- new.env()
>>
>> f <- function() {
>>   x <- runif(1e5)
>>   dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
>>   assign("last.dump", dump, crash_dumps)
>> }
>>
>>
>> WARNING: the following will probably eat all your RAM!
>>
>> Attach this package and run:
>>
>> for (i in 1:100) {
>>   print(i)
>>   f()
>> }
>>
>> You will notice that with each iteration the execution of f() slows down significantly while the memory consumption of the R process (v4.0.5 on Linux) quickly explodes.
>>
>> I am having a hard time to understand what exactly is happening here. Something w.r.t. too deeply nested environments? Could someone please enlighten me? Thanks!
>>
>> Regards,
>> Andreas
>>
>>
>> Background:
>> In an R package I store crash dumps on error in a parallel processes in a way similar to what I have just shown (hence the (un)serialize(), which happens as part of returning the objects to the parent process). The first 2 or 3 times I do so in a session everything is fine, but afterwards it takes very long and I soon run out of memory.
>>
>> Some more observations:
>> - If I omit `x <- runif(1e5)`, the issues seem to be less pronounced.
>> - If I assign to .GlobalEnv instead of crash_dumps, there seems to be no issue - probably because .GlobalEnv is not included in sys.frames(), while crash_dumps is indirectly via the namespace of the package being the parent.env of some of the sys.frames()!?
>> - If I omit the lapply(...), i.e. use `dump <- unserialize(serialize(sys.frames(), NULL))` directly, there seems to be no issue. The immediate consequence is that there are less sys.frames and - in particular - there is no frame which has the base namespace as its parent.env.
>> - If I make crash_dumps a list and use assignInMyNamespace() to store the dump in it, there also seems to be no issue. I will probably use this as a workaround:
>>
>> crash_dumps <- list()
>>
>> f <- function() {
>>   x <- runif(1e5)
>>   dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
>>   crash_dumps[["last.dump"]] <- dump
>>   assignInMyNamespace("crash_dumps", crash_dumps)
>> }
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>    Actuarial Science
> 241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
>
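
The observations quoted above turn on which environments serialize() writes out by value. The following is a minimal sketch of that general mechanism only, run outside any package; it is not taken from the thread and does not by itself explain the growth reported here. The names big_parent, child, global_child and copy are made up for illustration. It relies on the documented behaviour that ordinary environments are serialized together with their enclosure chain, while the global environment (like namespaces and package environments) is written as a reference.

```r
# Hypothetical names throughout; this is an illustration, not the dumpTest code.

# A parent environment holding a large vector (1e5 doubles, about 800 kB).
big_parent <- new.env(parent = emptyenv())
assign("x", runif(1e5), envir = big_parent)

# A child environment that itself holds almost nothing.
child <- new.env(parent = big_parent)
assign("i", 1L, envir = child)

# serialize() follows the enclosure chain of ordinary environments,
# so serializing the child also writes the parent's contents.
size_child <- length(serialize(child, NULL))

# The same small content directly under globalenv(): the global
# environment is written as a reference, not by value.
global_child <- new.env(parent = globalenv())
assign("i", 1L, envir = global_child)
size_global_child <- length(serialize(global_child, NULL))

# unserialize() builds fresh environments; neither the child nor its
# parent is the original object afterwards, although the data is equal.
copy <- unserialize(serialize(child, NULL))
is_same_env   <- identical(copy, child)
parent_copied <- !identical(parent.env(copy), big_parent)

print(c(child = size_child, global_child = size_global_child))
print(c(same_env = is_same_env, parent_copied = parent_copied))
```

Printing length(serialize(sys.frames(), NULL)) inside f() on each iteration would be one way to check whether the serialized size itself grows in the package setting described above.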