Arne Henningsen
2020-Dec-12 23:19 UTC
[Rd] R crashes when using huge data sets with character string variables
When working with a huge data set with character string variables, I found that various commands make R crash. When I run R in a Linux/bash console, R terminates with the message "Killed". When I use RStudio, I get the message "R Session Aborted. R encountered a fatal error. The session was terminated. Start New Session". If an object in the R workspace needs too much memory, I would expect R not to crash but to issue an error message "Error: cannot allocate vector of size ...".

A minimal reproducible example (at least on my computer) is:

nObs <- 1e9
date <- paste( round( runif( nObs, 1981, 2015 ) ), round( runif( nObs, 1, 12 ) ),
   round( runif( nObs, 1, 31 ) ), sep = "-" )

Is this a bug or a feature of R?

Some information about my R version, OS, etc.:

R> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_DK.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_DK.UTF-8        LC_COLLATE=en_DK.UTF-8
 [5] LC_MONETARY=en_DK.UTF-8    LC_MESSAGES=en_DK.UTF-8
 [7] LC_PAPER=en_DK.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.3

/Arne

--
Arne Henningsen
http://www.arne-henningsen.name
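A back-of-the-envelope estimate helps show why the example above exhausts memory. The sketch below assumes a 64-bit build of R, where a double takes 8 bytes and each element of a character vector is an 8-byte pointer into R's global string cache; the at most 35 × 12 × 31 = 13,020 distinct date strings are negligible by comparison. The "peak" line is a hypothetical assumption that paste() holds the three rounded numeric inputs, their character coercions, and the result alive at the same time; it is not a measurement.

```r
# Rough memory arithmetic for nObs <- 1e9 on 64-bit R (assumptions noted).
nObs <- 1e9
bytes_per_double  <- 8   # one double value
bytes_per_pointer <- 8   # one element of a character vector (a pointer)

to_gib <- function(bytes) bytes / 1024^3

numeric_vector_gib   <- to_gib(nObs * bytes_per_double)
character_vector_gib <- to_gib(nObs * bytes_per_pointer)

# Hypothetical peak inside paste(): 3 numeric inputs, 3 character
# coercions of those inputs, and 1 character result, all alive at once.
peak_gib <- 3 * numeric_vector_gib + 4 * character_vector_gib

round(c(numeric   = numeric_vector_gib,
        character = character_vector_gib,
        peak      = peak_gib), 1)
# numeric 7.5, character 7.5, peak 52.2 (GiB)
```

Even if the real peak is somewhat lower, a figure of this size exceeds the physical memory of most workstations, which is consistent with the process being killed rather than R reporting an allocation error.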
Ben Bolker
2020-Dec-12 23:33 UTC
[Rd] R crashes when using huge data sets with character string variables
On Windows you can use memory.limit(). For Linux, see
https://stackoverflow.com/questions/12582793/limiting-memory-usage-in-r-under-linux
Not sure how much that helps.
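A minimal sketch of the memory.limit() suggestion, assuming that memory.limit() (package utils) reports and sets a cap in MB only on Windows builds of R older than 4.2.0; as far as I know it imposes no limit on Unix-alikes and became a stub on Windows in R 4.2.0. The helper name windows_memory_limit_mb() is hypothetical.

```r
# Hypothetical helper: report R's memory cap (in MB) where memory.limit()
# is meaningful (assumed: Windows builds of R before 4.2.0).
# Elsewhere it returns NA instead of calling memory.limit().
windows_memory_limit_mb <- function() {
  if (.Platform$OS.type == "windows" && getRversion() < "4.2.0") {
    utils::memory.limit()
  } else {
    NA_real_
  }
}

windows_memory_limit_mb()

# On such a Windows build, the cap could be changed (in MB), e.g.:
# utils::memory.limit(size = 16000)
```

On the Linux setup from the original report this helper returns NA, so the shell- and environment-based limits discussed in the Stack Overflow link are the relevant options there.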
luke-tierney at uiowa.edu
2020-Dec-13 03:26 UTC
[Rd] [External] R crashes when using huge data sets with character string variables
If R is receiving a kill signal, there is nothing it can do about it. I am guessing you are running into a memory over-commit issue in your OS.

https://en.wikipedia.org/wiki/Memory_overcommitment
https://engineering.pivotal.io/post/virtual_memory_settings_in_linux_-_the_problem_with_overcommit/

If you have to run this close to your physical memory limits, you might try using your shell's facility (ulimit for bash, limit for some others) to limit process memory/virtual memory use to your available physical memory.

You can also try setting the R_MAX_VSIZE environment variable mentioned in ?Memory; that only affects the R heap, not malloc() done elsewhere.

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
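Luke's two suggestions can be tried from within R by starting a fresh Rscript process under a cap and checking that an oversized allocation ends in an ordinary R error instead of a kill. This is a sketch under assumptions: a Unix-alike whose /bin/sh supports `ulimit -v` (value in KiB), and R_MAX_VSIZE accepting values such as "2Gb" as described in ?Memory. The helper run_capped() is hypothetical, and the exact error wording varies between R versions, so it is not reproduced here.

```r
# Hypothetical helper: run an R expression in a fresh Rscript process,
# optionally under R_MAX_VSIZE and/or a shell `ulimit -v` cap, and
# return its combined stdout/stderr lines.
run_capped <- function(expr, r_max_vsize = NULL, ulimit_kib = NULL) {
  rscript <- file.path(R.home("bin"), "Rscript")
  # 2>&1 so that R's error messages (written to stderr) are captured too
  cmd <- paste(shQuote(rscript), "-e", shQuote(expr), "2>&1")
  if (!is.null(r_max_vsize)) {
    # Caps only R's vector heap; malloc() done elsewhere is not covered
    cmd <- paste0("R_MAX_VSIZE=", r_max_vsize, " ", cmd)
  }
  if (!is.null(ulimit_kib)) {
    # Caps the virtual memory of the whole child process
    cmd <- paste0("ulimit -v ", format(ulimit_kib, scientific = FALSE),
                  "; ", cmd)
  }
  # A non-zero exit status triggers a warning from system(); silence it
  suppressWarnings(system(cmd, intern = TRUE))
}

# An 8 GB numeric vector requested under a 2 GB vector-heap cap:
out <- run_capped("x <- numeric(1e9)", r_max_vsize = "2Gb")
print(out)
```

Passing ulimit_kib (for example 4e6) instead should make the same request fail inside malloc() with R's "cannot allocate vector of size ..." error, provided the cap still leaves enough room for R itself to start.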