tplate@blackmesacapital.com
2004-Jun-07 18:59 UTC
[Rd] strange apparently data-dependent crash with large data (PR#6955)
I'm consistently seeing R crash with a particular large data set. What's strange is that although the crash seems related to running out of memory, I'm unable to construct a pseudo-random data set of the same size that also causes the crash. Further adding to the strangeness is that the crash only happens if the data set goes through a save()/load() cycle -- without that, the command in question just gives an out-of-memory error, but does not crash.

To make this clear, three different versions of the same data consistently produce very different behavior:

(1) original data read with read.table: memory error; fails to allocate 164062 Kb
(2) original data through a save()/load() cycle: memory error; fails to allocate 82031 Kb, followed by a crash
(3) pseudo-random data of the same size and similar characteristics: works without problems

This is with R-1.9.0 under Windows 2000. I'm not loading any optional packages. I get the same crash behavior with R-1.9.0 patched and R-2.0.0 alpha, but I didn't test the pseudo-random data under those versions. (In case it matters, I got R-1.9.0 patched and R-2.0.0 alpha as pre-compiled Windows binaries from http://cran.us.r-project.org/ at 9:30am MDT on Jun 7, 2004.) Unfortunately, I don't have sufficient knowledge of how to debug memory problems in R to make further progress than I've made here, but maybe the following will provide some clues for someone else.

All the following transcripts are from Rgui.exe, with a new run starting at each comment beginning with "###".

### Read in the data and get an out-of-memory error (but no crash)
> # ClassifyTrain.txt is from http://mill.ucsd.edu/data/ClassifyTrain.zip
> X <- read.table("ClassifyTrain.txt", skip=2)
> X1 <- as.matrix(X)
> hist(log(X1[,-(1:2)]+1))
Error: cannot allocate vector of size 164062 Kb
In addition: Warning message:
Reached total allocation of 1024Mb: see help(memory.size)

### Read in the data and save it as a .RData file for faster runs (I initially did this for speed,
### but this seems to be essential to causing the crash)
> # ClassifyTrain.txt is from http://mill.ucsd.edu/data/ClassifyTrain.zip
> X <- read.table("ClassifyTrain.txt", skip=2)
> X1 <- as.matrix(X)
> c(class(X1), storage.mode(X1), dim(X1))
[1] "matrix" "double" "30000"  "702"
> save(list="X1", file="X1.RData")

### Produce the crash
> version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    1
minor    9.0
year     2004
month    04
day      12
language R
>
> load("X1.RData")
> c(class(X1), storage.mode(X1), dim(X1))
[1] "matrix" "double" "30000"  "702"
> # all of the following 3 commands consistently cause a crash
> hist(log(X1[,-(1:2)]+1))
> hist(log(X1[,-(1:2)]+1), breaks=seq(0,13,0.5))
> hist(log(X1[,-(1:2)]+1), breaks=seq(0,13,0.5), plot=F)
Error: cannot allocate vector of size 82031 Kb
In addition: Warning message:
Reached total allocation of 1024Mb: see help(memory.size)

[message that comes in a Windows dialog box after a wait of many seconds:]

R Console: Rgui.exe - Application Error
The exception unknown software exception (0xc00000fd) occurred in the application at location 0x6b5b0a53

#### The following is a failed attempt to reproduce the crash with pseudo-random
#### data, i.e., R functions correctly (even when X1 is in memory)
>
> # Look at some characteristics of the original data in
> # order to produce a matrix of similar pseudo-random numbers.
> load("X1.RData") > dim(X1) [1] 30000 702 > class(X1) [1] "matrix" > storage.mode(X1) [1] "double" > table(is.na(X1)) FALSE 21060000 > table(X1==0) FALSE TRUE 2284455 18775545 > exp(diff(log(table(X1==0)))) TRUE 8.218829 > table(X1>=0) TRUE 21060000 > range(X1) [1] 0 326022 > memory.limit() [1] 1073741824 > memory.limit()/2^20 [1] 1024 > object.size(X1)/2^20 [1] 161.0267 > > set.seed(1) > X <- matrix(rexp(30000 * 702, 5e-5) * rbinom(30000 * 702, 1, 1/8), ncol=702) > range(X) [1] 3.615044e-04 3.249415e+05 > > # Both of thse commands seem to work without problems > hist(log(X[,-(1:2)]+1)) > hist(log(X[,-(1:2)]+1), breaks=seq(0,13,0.5))
Prof Brian Ripley
2004-Jun-07 19:40 UTC
(PR#6955) Re: [Rd] strange apparently data-dependent crash with large data
It is not very surprising that the R process might crash once the maximum memory limit is reached. View anything done in a session after that as suspect. (The Unix equivalent is often to crash without even telling you that you are out of memory.)

On Mon, 7 Jun 2004 tplate@blackmesacapital.com wrote:

> I'm consistently seeing R crash with a particular large data set. What's
> strange is that although the crash seems related to running out of memory,
> I'm unable to construct a pseudo-random data set of the same size that also
> causes the crash. Further adding to the strangeness is that the crash only
> happens if the dataset goes through a save()/load() cycle -- without that,
> the command in question just gives an out-of-memory error, but does not crash.
--
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
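A sketch of the kind of check this advice points at (my addition, not from the thread; the 1600 Mb figure is only an example), using the Windows-specific functions documented in help(memory.size):

## Windows-only memory helpers in R of this vintage; a sketch, not part of the thread.
memory.size()             # memory currently in use by this R process
memory.size(max = TRUE)   # maximum memory obtained from Windows so far
memory.limit()            # the current ceiling (1073741824 bytes = 1024 Mb above)
## The ceiling can be raised at startup, e.g.  Rgui.exe --max-mem-size=1600M,
## or from within the session (size is given in Mb; see help(memory.limit)):
memory.limit(size = 1600)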
Duncan Murdoch
2004-Jun-07 20:14 UTC
[Rd] strange apparently data-dependent crash with large data (PR#6955)
On Mon, 7 Jun 2004 18:59:27 +0200 (CEST), tplate@blackmesacapital.com wrote:

> I'm consistently seeing R crash with a particular large data set. What's
> strange is that although the crash seems related to running out of memory,
> I'm unable to construct a pseudo-random data set of the same size that also
> causes the crash. Further adding to the strangeness is that the crash only
> happens if the dataset goes through a save()/load() cycle -- without that,
> the command in question just gives an out-of-memory error, but does not crash.

This kind of error is very difficult to debug. What's likely happening is that in one case you run out of memory at a place with a correct check, and in the other you are hitting some flaky code that assumes every memory allocation is guaranteed to succeed.

You could install DrMinGW (which produces a stack dump when you crash), but it's not necessarily informative: often the crash occurs relatively distantly from the buggy code that caused it.

The other problem with this kind of error is that it may well disappear if you run under a debugger, since that will make you run out of memory at a different spot, and it may not appear on a different machine. For example, I ran your examples and they all failed because R ran out of memory, but none crashed.

Duncan Murdoch
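For anyone hitting the same wall, one possible workaround (my own sketch, not something suggested in the thread; the 50-column block size is arbitrary) is to accumulate the histogram counts a block of columns at a time, so that no single step needs a full copy of the 30000 x 700 submatrix:

## Build the histogram counts block by block instead of in one huge expression.
breaks <- seq(0, 13, 0.5)
counts <- numeric(length(breaks) - 1)
cols   <- (1:ncol(X1))[-(1:2)]                         # columns 3..702
blocks <- split(cols, ceiling(seq(along = cols) / 50))  # 50 columns per block
for (b in blocks) {
  h <- hist(log(X1[, b] + 1), breaks = breaks, plot = FALSE)
  counts <- counts + h$counts
}
counts   # same counts as hist(log(X1[,-(1:2)]+1), breaks=breaks, plot=FALSE)$counts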