Why am I getting the error "Error: cannot allocate vector of size
512000 Kb" on a machine with 6 GB of RAM?

I'm playing with some large data sets within R and doing some simple
statistics. The data sets have 10^6 and 10^7 rows of numbers. R reads
in and performs summary() on the 10^6 set just fine. However, on the
10^7 set, R halts with the error. My hunch is that somewhere there's a
setting that limits some memory size to 500 MB. What setting is that,
can it be increased, and if so how? Googling for the error has produced
lots of hits, but none with answers yet. Still browsing.

Below is a transcript of the session.

Thanks in advance for any pointers in the right direction.

Regards,
- Robert
http://www.cwelug.org/downloads
Help others get OpenSource software.  Distribute FLOSS
for Windows, Linux, *BSD, and MacOS X with BitTorrent

-------

$ uname -sorv ; rpm -q R ; R --version
Linux 2.6.11-1.1369_FC4smp #1 SMP Thu Jun 2 23:08:39 EDT 2005 GNU/Linux
R-2.3.0-2.fc4
R version 2.3.0 (2006-04-24)
Copyright (C) 2006 R Development Core Team

$ wc -l dataset.010MM.txt
10000000 dataset.010MM.txt

$ head -3 dataset.010MM.txt
15623
3845
22309

$ wc -l dataset.100MM.txt
100000000 dataset.100MM.txt

$ head -3 dataset.100MM.txt
15623
3845
22309

$ cat ex3.r
options(width=1000)
foo <- read.delim("dataset.010MM.txt")
summary(foo)
foo <- read.delim("dataset.100MM.txt")
summary(foo)

$ R < ex3.r

R > foo <- read.delim("dataset.010MM.txt")

R > summary(foo)
     X15623
 Min.   :    1
 1st Qu.: 8152
 Median :16459
 Mean   :16408
 3rd Qu.:24618
 Max.   :32766

R > foo <- read.delim("dataset.100MM.txt")
Error: cannot allocate vector of size 512000 Kb
Execution halted

$ free -m
             total       used       free     shared    buffers     cached
Mem:          6084       3233       2850          0         20         20
-/+ buffers/cache:       3193       2891
Swap:         2000       2000          0
Robert Citek wrote:

> Why am I getting the error "Error: cannot allocate vector of size
> 512000 Kb" on a machine with 6 GB of RAM?

1. The message means that R cannot allocate a *further* 512 Mb of RAM
   right now, for the next step; it does not tell you how much is
   required in total, nor how much R is currently consuming.

2. This appears to be a 32-bit OS. That limits the maximal allocation
   for a *single* R process to < 4 Gb (if all goes very well).

> I'm playing with some large data sets within R and doing some simple
> statistics. The data sets have 10^6 and 10^7 rows of numbers. R
> reads in and performs summary() on the 10^6 set just fine.

3. 10^7 rows is not large, if you have only one column...

4. 10^7 rows needs 10 times what is needed for 10^6 rows, so comparing
   10^6 and 10^7 is quite a difference.

Uwe Ligges
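To put point 4 in numbers, here is a minimal back-of-the-envelope sketch
(it assumes R's usual 4-byte integer and 8-byte double storage, and
ignores any data.frame overhead):

  rows <- c(1e6, 1e7, 1e8)
  data.frame(rows      = rows,
             int_MB    = rows * 4 / 1024^2,   # one integer column
             double_MB = rows * 8 / 1024^2)   # one double column

A 10^8-row integer column alone is roughly 381 MB before any copies are
made, which is why each factor-of-10 jump matters on a 32-bit system.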
Hello Robert,

?Memory and ?memory.size will be very useful to you in resolving this.
Please also note that the R for Windows FAQ addresses these issues for
a Windows installation:

http://www.stats.ox.ac.uk/pub/R/rw-FAQ.html

Thanks to this list and the link above, I've had success using
--max-mem-size when invoking R, so that is where I'd start. I'm not
sure what OS you are using, but Windows will be more restrictive on
memory (depending on whether you're using a Server edition, etc.).

HTH,
-jason

----- Original Message -----
From: "Robert Citek" <rwcitek at alum.calberkeley.org>
To: <r-help at stat.math.ethz.ch>
Sent: Friday, May 05, 2006 8:24 AM
Subject: [R] large data set, error: cannot allocate vector

> Why am I getting the error "Error: cannot allocate vector of size
> 512000 Kb" on a machine with 6 GB of RAM?
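For completeness, a sketch of what those options look like in practice.
Note that --max-mem-size, memory.limit() and memory.size() apply only to
R on Windows; on Linux (the original poster's platform) the ceiling comes
from the 32-bit address space and any external shell limits, not from an
R option. The size value below is purely illustrative:

  ## Windows builds of R only:
  # Rgui.exe --max-mem-size=2047M   # raise the memory limit at startup

  ## then, inside R on Windows:
  memory.limit()                    # current limit, in Mb
  memory.size(max = TRUE)           # most memory obtained from the OS so far, in Mb

  ## on Linux, check for an external per-process cap in the shell instead:
  # ulimit -v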
Oops. I was off by an order of magnitude. I meant 10^7 and 10^8 rows
of data for the first and second data sets, respectively.

On May 5, 2006, at 10:24 AM, Robert Citek wrote:

> R > foo <- read.delim("dataset.010MM.txt")
>
> R > summary(foo)
>      X15623
>  Min.   :    1
>  1st Qu.: 8152
>  Median :16459
>  Mean   :16408
>  3rd Qu.:24618
>  Max.   :32766

Reloaded the 10 MM set and ran object.size():

R > object.size(foo)
[1] 440000376

So, 10 MM numbers take about 440 MB. (Are my units correct?) That would
explain why 10 MM numbers work while 100 MM numbers won't, given the
roughly 4 GB per-process limit on a 32-bit machine. If my units are
correct, then each value is taking up about 44 bytes, which is far more
than the 4 bytes (8 bits/byte * 4 bytes = 32 bits) I would expect for a
plain integer.

From Googling the archives, the solution I've seen for working with
large data sets seems to be moving to a 64-bit architecture. Short of
that, are there any other generic workarounds, perhaps using an RDBMS
or a CRAN package that enables working with arbitrarily large data sets?

Regards,
- Robert
http://www.cwelug.org/downloads
Help others get OpenSource software.  Distribute FLOSS
for Windows, Linux, *BSD, and MacOS X with BitTorrent
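One generic workaround along those lines, short of 64-bit hardware or an
RDBMS, is to stream the file through a connection and keep running
statistics, so the full vector never has to sit in RAM. A sketch (the
file name and chunk size are just placeholders, and it only produces
n/mean/min/max rather than the full summary()):

  con <- file("dataset.100MM.txt", open = "r")
  n <- 0; s <- 0; lo <- Inf; hi <- -Inf
  repeat {
    chunk <- scan(con, what = integer(0), n = 1e6, quiet = TRUE)
    if (length(chunk) == 0) break            # end of file
    n  <- n + length(chunk)
    s  <- s + sum(as.numeric(chunk))         # accumulate in double to avoid overflow
    lo <- min(lo, chunk)
    hi <- max(hi, chunk)
  }
  close(con)
  c(n = n, mean = s / n, min = lo, max = hi)

Reading with scan(..., what = integer(0)) also avoids the data.frame
overhead that read.delim() adds on top of the raw numbers.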
On Fri, 5 May 2006, Robert Citek wrote:

> Why am I getting the error "Error: cannot allocate vector of size
> 512000 Kb" on a machine with 6 GB of RAM?

In addition to Uwe's message, it is worth pointing out that gc()
reports the maximum memory that your program has used so far (the
rightmost two columns of its output). You will probably see that this
is large.

        -thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
tlumley at u.washington.edu     University of Washington, Seattle
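For example (the output below is schematic; the actual numbers depend on
the session):

  gc()
  ##          used (Mb) gc trigger (Mb) max used (Mb)
  ## Ncells    ...  ...        ...  ...      ...  ...   <- cons cells
  ## Vcells    ...  ...        ...  ...      ...  ...   <- vector heap (your data)

  # If your version of R supports the reset argument, this zeroes the
  # "max used" figures before the next experiment:
  gc(reset = TRUE)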
Robert,

You are hitting a known problem with data.frame row names. The row
names alone are the reason the data.frame takes 10 times more memory
than the underlying vector.

Your 100 MM integer vector takes 381 MB when you scan() it, right?
[4 bytes * 10^8 / 1024^2]  But when you try to create a data.frame
instead, something like 3.8 GB is required. That is beyond the
practical 32-bit limit, and nothing you do with memory options will
solve it.

The good news is that Prof Ripley has fixed the problem with data.frame
row names in the latest development version of R. You could try that;
it should be much more efficient, i.e. a data.frame with a single
integer column of length 100 MM should have an object.size of 381 MB,
just like the vector.

However, how many columns do you have to deal with? 3 GB allows
100,000,000 rows x 7 columns of integers in memory. That doesn't leave
any room for copies, or for types larger than integer, so you are still
limited. Beyond that, as others have suggested, you need to connect to
an RDBMS, or to go 64-bit, which is much easier if that is possible for
you.

I'd be interested to hear how you get on.

Regards,
Mark

> Date: Tue, 9 May 2006 15:27:58 -0500
> From: Robert Citek <rwcitek@alum.calberkeley.org>
> To: r-help@stat.math.ethz.ch
> Subject: Re: [R] large data set, error: cannot allocate vector
>
> On May 9, 2006, at 1:32 PM, Jason Barnhart wrote:
>
> > 1) So the original problem remains unsolved?
>
> The question was answered, but the problem remains unsolved. The
> question was: why am I getting the error "cannot allocate vector"
> when reading in a 100 MM integer list? The answer appears to be:
>
> 1) R loads the entire data set into RAM
> 2) on a 32-bit system, R maxes out at about 3 GB
> 3) loading 100 MM integer entries into a data.frame requires more
>    than 3 GB of RAM (5-10 GB, based on projections from 10 MM entries)
>
> So, the new question is: how does one work around such limits?
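A small-scale way to see the row-name effect for yourself (a sketch; the
exact sizes depend on the R version, and in releases that include the fix
mentioned above the two objects come out nearly the same size):

  x  <- sample(32766L, 1e6, replace = TRUE)   # 10^6 integers, like the data
  df <- data.frame(x = x)
  object.size(x)     # about 4 MB: 4 bytes per integer plus a small header
  object.size(df)    # much larger in R 2.3.0, because of character row names
  4 * 1e8 / 1024^2   # Mark's ~381 MB figure for a 10^8-element integer vector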