Andreas Kersting
2021-Apr-07 14:06 UTC
[Rd] [External] memory consumption of nested (un)serialize of sys.frames()
Hi Luke, Please see https://github.com/akersting/dumpTest for the package. Here a session showing my issue:> library(dumpTest) > sessionInfo()R version 4.0.5 (2021-03-31) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Debian GNU/Linux 10 (buster) Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] dumpTest_0.1.0 loaded via a namespace (and not attached): [1] compiler_4.0.5> for (i in 1:100) {+ print(i) + print(system.time(f())) + } [1] 1 user system elapsed 0.028 0.004 0.034 [1] 2 user system elapsed 0.067 0.008 0.075 [1] 3 user system elapsed 0.176 0.000 0.176 [1] 4 user system elapsed 0.335 0.012 0.349 [1] 5 user system elapsed 0.745 0.023 0.770 [1] 6 user system elapsed 1.495 0.060 1.572 [1] 7 user system elapsed 2.902 0.136 3.040 [1] 8 user system elapsed 5.753 0.272 6.034 [1] 9 user system elapsed 11.807 0.708 12.597 [1] 10 ^C Timing stopped at: 6.638 0.549 7.214 I had to interrupt in iteration 10 because I was running low on RAM. Regards, Andreas 2021-04-07 15:28 GMT+02:00 luke-tierney at uiowa.edu:> On Wed, 7 Apr 2021, Andreas Kersting wrote: > >> Hi, >> >> please consider the following minimal reproducible example: >> >> Create a new R package which just contains the following two (exported) objects: > > I would not expect this behavior and I don't see it when I make such a > package (in R 4.0.3 or R-devel on Ubuntu). You will need to provide a > more complete reproducible example if you want help with what you are > trying to do; also sessionInfo() would help. > > Best, > > luke > >> >> >> crash_dumps <- new.env() >> >> f <- function() { >> x <- runif(1e5) >> dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL))) >> assign("last.dump", dump, crash_dumps) >> } >> >> >> WARNING: the following will probably eat all your RAM! >> >> Attach this package and run: >> >> for (i in 1:100) { >> print(i) >> f() >> } >> >> You will notice that with each iteration the execution of f() slows down significantly while the memory consumption of the R process (v4.0.5 on Linux) quickly explodes. >> >> I am having a hard time to understand what exactly is happening here. Something w.r.t. too deeply nested environments? Could someone please enlighten me? Thanks! >> >> Regards, >> Andreas >> >> >> Background: >> In an R package I store crash dumps on error in a parallel processes in a way similar to what I have just shown (hence the (un)serialize(), which happens as part of returning the objects to the parent process). The first 2 or 3 times I do so in a session everything is fine, but afterwards it takes very long and I soon run out of memory. >> >> Some more observations: >> - If I omit `x <- runif(1e5)`, the issues seem to be less pronounced. >> - If I assign to .GlobalEnv instead of crash_dumps, there seems to be no issue - probably because .GlobalEnv is not included in sys.frames(), while crash_dumps is indirectly via the namespace of the package being the parent.env of some of the sys.frames()!? >> - If I omit the lapply(...), i.e. use `dump <- unserialize(serialize(sys.frames(), NULL))` directly, there seems to be no issue. The immediate consequence is that there are less sys.frames and - in particular - there is no frame which has the base namespace as its parent.env. >> - If I make crash_dumps a list and use assignInMyNamespace() to store the dump in it, there also seems to be no issue. I will probably use this as a workaround: >> >> crash_dumps <- list() >> >> f <- function() { >> x <- runif(1e5) >> dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL))) >> crash_dumps[["last.dump"]] <- dump >> assignInMyNamespace("crash_dumps", crash_dumps) >> } >> >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> > > -- > Luke Tierney > Ralph E. Wareham Professor of Mathematical Sciences > University of Iowa Phone: 319-335-3386 > Department of Statistics and Fax: 319-335-3017 > Actuarial Science > 241 Schaeffer Hall email: luke-tierney at uiowa.edu > Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu >
Dirk Eddelbuettel
2021-Apr-07 14:20 UTC
[Rd] [External] memory consumption of nested (un)serialize of sys.frames()
On 7 April 2021 at 16:06, Andreas Kersting wrote: | Hi Luke, | | Please see https://github.com/akersting/dumpTest for the package. | | Here a session showing my issue: | | > library(dumpTest) | > sessionInfo() | R version 4.0.5 (2021-03-31) | Platform: x86_64-pc-linux-gnu (64-bit) | Running under: Debian GNU/Linux 10 (buster) | | Matrix products: default | BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0 | LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0 | | locale: | [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C | [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 | [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 | [7] LC_PAPER=en_US.UTF-8 LC_NAME=C | [9] LC_ADDRESS=C LC_TELEPHONE=C | [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C | | attached base packages: | [1] stats graphics grDevices utils datasets methods base | | other attached packages: | [1] dumpTest_0.1.0 | | loaded via a namespace (and not attached): | [1] compiler_4.0.5 | > for (i in 1:100) { | + print(i) | + print(system.time(f())) | + } | [1] 1 | user system elapsed | 0.028 0.004 0.034 | [1] 2 | user system elapsed | 0.067 0.008 0.075 | [1] 3 | user system elapsed | 0.176 0.000 0.176 | [1] 4 | user system elapsed | 0.335 0.012 0.349 | [1] 5 | user system elapsed | 0.745 0.023 0.770 | [1] 6 | user system elapsed | 1.495 0.060 1.572 | [1] 7 | user system elapsed | 2.902 0.136 3.040 | [1] 8 | user system elapsed | 5.753 0.272 6.034 | [1] 9 | user system elapsed | 11.807 0.708 12.597 | [1] 10 | ^C | Timing stopped at: 6.638 0.549 7.214 | | I had to interrupt in iteration 10 because I was running low on RAM. No issue here. Ubuntu 20.10, R 4.0.5 'from CRAN' i.e. Michael's PPA build off my Debian package, hence instrumentation as in the Debian package. edd at rob:~$ installGithub.r akersting/dumpTest Using github PAT from envvar GITHUB_PAT Downloading GitHub repo akersting/dumpTest at HEAD ? checking for file ?/tmp/remotes3f9af733166ccd/akersting-dumpTest-3bed8e2/DESCRIPTION? ... ? preparing ?dumpTest?: ? checking DESCRIPTION meta-information ... ? checking for LF line-endings in source and make files and shell scripts ? checking for empty or unneeded directories ? building ?dumpTest_0.1.0.tar.gz? Installing package into ?/usr/local/lib/R/site-library? (as ?lib? is unspecified) * installing *source* package ?dumpTest? ... ** using staged installation ** R ** byte-compile and prepare package for lazy loading ** help No man pages found in package ?dumpTest? *** installing help indices ** building package indices ** testing if installed package can be loaded from temporary location ** testing if installed package can be loaded from final location ** testing if installed package keeps a record of temporary installation path * DONE (dumpTest) edd at rob:~$ Rscript -e 'system.time({for (i in 1:100) dumpTest::f()})' user system elapsed 0.481 0.019 0.500 edd at rob:~$ (I also ran the variant you showed with the dual print statements, it just consumes more screen real estate and ends on [...] [1] 97 user system elapsed 0.004 0.000 0.005 [1] 98 user system elapsed 0.004 0.000 0.005 [1] 99 user system elapsed 0.004 0.000 0.004 [1] 100 user system elapsed 0.005 0.000 0.005 edd at rob:~$ ) Dirk -- https://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
iuke-tier@ey m@iii@g oii uiow@@edu
2021-Apr-07 14:39 UTC
[Rd] [External] memory consumption of nested (un)serialize of sys.frames()
No issues here with that either. Looks like something is different on your end. Best, luke On Wed, 7 Apr 2021, Andreas Kersting wrote:> Hi Luke, > > Please see https://github.com/akersting/dumpTest for the package. > > Here a session showing my issue: > >> library(dumpTest) >> sessionInfo() > R version 4.0.5 (2021-03-31) > Platform: x86_64-pc-linux-gnu (64-bit) > Running under: Debian GNU/Linux 10 (buster) > > Matrix products: default > BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0 > LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0 > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] dumpTest_0.1.0 > > loaded via a namespace (and not attached): > [1] compiler_4.0.5 >> for (i in 1:100) { > + print(i) > + print(system.time(f())) > + } > [1] 1 > user system elapsed > 0.028 0.004 0.034 > [1] 2 > user system elapsed > 0.067 0.008 0.075 > [1] 3 > user system elapsed > 0.176 0.000 0.176 > [1] 4 > user system elapsed > 0.335 0.012 0.349 > [1] 5 > user system elapsed > 0.745 0.023 0.770 > [1] 6 > user system elapsed > 1.495 0.060 1.572 > [1] 7 > user system elapsed > 2.902 0.136 3.040 > [1] 8 > user system elapsed > 5.753 0.272 6.034 > [1] 9 > user system elapsed > 11.807 0.708 12.597 > [1] 10 > ^C > Timing stopped at: 6.638 0.549 7.214 > > I had to interrupt in iteration 10 because I was running low on RAM. > > Regards, > Andreas > > 2021-04-07 15:28 GMT+02:00 luke-tierney at uiowa.edu: >> On Wed, 7 Apr 2021, Andreas Kersting wrote: >> >>> Hi, >>> >>> please consider the following minimal reproducible example: >>> >>> Create a new R package which just contains the following two (exported) objects: >> >> I would not expect this behavior and I don't see it when I make such a >> package (in R 4.0.3 or R-devel on Ubuntu). You will need to provide a >> more complete reproducible example if you want help with what you are >> trying to do; also sessionInfo() would help. >> >> Best, >> >> luke >> >>> >>> >>> crash_dumps <- new.env() >>> >>> f <- function() { >>> x <- runif(1e5) >>> dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL))) >>> assign("last.dump", dump, crash_dumps) >>> } >>> >>> >>> WARNING: the following will probably eat all your RAM! >>> >>> Attach this package and run: >>> >>> for (i in 1:100) { >>> print(i) >>> f() >>> } >>> >>> You will notice that with each iteration the execution of f() slows down significantly while the memory consumption of the R process (v4.0.5 on Linux) quickly explodes. >>> >>> I am having a hard time to understand what exactly is happening here. Something w.r.t. too deeply nested environments? Could someone please enlighten me? Thanks! >>> >>> Regards, >>> Andreas >>> >>> >>> Background: >>> In an R package I store crash dumps on error in a parallel processes in a way similar to what I have just shown (hence the (un)serialize(), which happens as part of returning the objects to the parent process). The first 2 or 3 times I do so in a session everything is fine, but afterwards it takes very long and I soon run out of memory. >>> >>> Some more observations: >>> - If I omit `x <- runif(1e5)`, the issues seem to be less pronounced. >>> - If I assign to .GlobalEnv instead of crash_dumps, there seems to be no issue - probably because .GlobalEnv is not included in sys.frames(), while crash_dumps is indirectly via the namespace of the package being the parent.env of some of the sys.frames()!? >>> - If I omit the lapply(...), i.e. use `dump <- unserialize(serialize(sys.frames(), NULL))` directly, there seems to be no issue. The immediate consequence is that there are less sys.frames and - in particular - there is no frame which has the base namespace as its parent.env. >>> - If I make crash_dumps a list and use assignInMyNamespace() to store the dump in it, there also seems to be no issue. I will probably use this as a workaround: >>> >>> crash_dumps <- list() >>> >>> f <- function() { >>> x <- runif(1e5) >>> dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL))) >>> crash_dumps[["last.dump"]] <- dump >>> assignInMyNamespace("crash_dumps", crash_dumps) >>> } >>> >>> ______________________________________________ >>> R-devel at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >>> >> >> -- >> Luke Tierney >> Ralph E. Wareham Professor of Mathematical Sciences >> University of Iowa Phone: 319-335-3386 >> Department of Statistics and Fax: 319-335-3017 >> Actuarial Science >> 241 Schaeffer Hall email: luke-tierney at uiowa.edu >> Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu >>-- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Fax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tierney at uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu