I'm trying to import a table into R; the file is about 700MB. Here's my first try:

> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
Error: cannot allocate vector of size 15.6 Mb
In addition: Warning messages:
1: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  Reached total allocation of 1535Mb: see help(memory.size)
3: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  Reached total allocation of 1535Mb: see help(memory.size)
4: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  Reached total allocation of 1535Mb: see help(memory.size)

Then I tried

> memory.limit(size=4095)

and got

> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
Error: cannot allocate vector of size 11.3 Mb

but no additional warnings. Then, optimistically, to clear up the workspace:

> rm()
> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
Error: cannot allocate vector of size 15.6 Mb

Can anyone help? I'm also confused by the different values: 15.6Mb, 1535Mb, 11.3Mb. I'm working on WinXP with 2 GB of RAM. Help says the maximum obtainable memory is usually 2Gb. Surely they mean GB?

The file I'm importing has about 3 million cases with 100 variables that I want to crosstabulate each with each. Is this completely unrealistic?

Thanks!

Maja
--
View this message in context: http://old.nabble.com/Error%3A-cannot-allocate-vector-of-size...-tp26282348p26282348.html
Sent from the R help mailing list archive at Nabble.com.
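Before fighting the allocator, it can help to estimate how much RAM the full table would actually need by reading only a small sample and scaling its measured size up. A minimal sketch, assuming the file name from the question and roughly 3 million rows (everything else here is an assumption, not part of the original post):

# Read only the first 10,000 rows and scale the object size up to the full table
sample_rows <- 10000
DD_sample <- read.table("01uklicsam-20070301.dat", header = TRUE,
                        nrows = sample_rows)
bytes_per_row <- as.numeric(object.size(DD_sample)) / sample_rows
total_rows <- 3e6                              # roughly 3 million cases
estimated_gb <- bytes_per_row * total_rows / 1024^3
print(estimated_gb)                            # compare against memory.limit()

If the estimate comes out well above the ~1.5GB a 32-bit Windows session can actually use, no amount of workspace clearing will make read.table succeed on the whole file at once.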
A little simple math. You have 3M rows with 100 items on each row. If read in, this would be 300M items. If numeric, at 8 bytes/item, this is 2.4GB. Given that you are probably using a 32-bit version of R, you are probably out of luck.

A rule of thumb is that your largest object should consume at most 25% of your memory, since you will probably be making copies as part of your processing. Given that, if you want to read in 100 variables at a time, I would say your limit would be about 500K rows to be reasonable.

So you have a choice: read in fewer rows; read in all 3M rows, but 20 columns per read; or put the data in a database and extract what you need. Unless you go to a 64-bit version of R you will probably not be able to have the whole file in memory at one time.

On Tue, Nov 10, 2009 at 7:10 AM, maiya <maja.zaloznik at gmail.com> wrote:
> The file I'm importing has about 3 million cases with 100 variables that I
> want to crosstabulate each with each. Is this completely unrealistic?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
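A minimal sketch of the "20 columns per read" option, using read.table's colClasses argument (a class of "NULL" makes read.table skip that column entirely); the file name and the 100-column count are taken from the thread, and reading each block as factors keeps the per-cell cost at a 4-byte integer code:

# Read the 100-column file in blocks of 20 columns by marking the
# other 80 columns as "NULL" so read.table never allocates them.
file  <- "01uklicsam-20070301.dat"
ncols <- 100                          # assumed from the thread
block <- 20

for (start in seq(1, ncols, by = block)) {
  cols <- start:(start + block - 1)
  cc <- rep("NULL", ncols)            # skip everything ...
  cc[cols] <- "factor"                # ... except the current block
  dd <- read.table(file, header = TRUE, colClasses = cc)
  # crosstabulate columns within this block, e.g. table(dd[[1]], dd[[2]])
  rm(dd); gc()                        # free memory before the next block
}

This reads the file once per block, so it is slow, and crosstabulating every pair of variables would still require pairing columns across blocks (or falling back to the database option), but each chunk stays comfortably inside a 32-bit session.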
For me with ff - on a 3 GB notebook - 3e6x100 works out of the box even without compression: the doubles consume 2.2 GB on disk, but the R process remains under 100MB; the rest of RAM is used by the file-system cache. If you are under Windows, you can create the ffdf files in a compressed folder. For the random doubles this reduces the size on disk to 230MB, which should even work on a 1GB notebook.

BTW: the most compressed datatype (vmode) that can handle NAs is "logical": it consumes 2 bits per tri-bool (TRUE/FALSE/NA). The next most compressed is "byte", covering c(NA, -127:127) and consuming one byte per cell (hence the name) on disk and in the fs-cache.

The code below should give an idea of how to do pairwise stats on columns where each pair fits easily into RAM. In the real world, you would not create the data but import it using read.csv.ffdf (expect that reading your file takes longer than reading/writing the ffdf).

Regards
Jens Oehlschlägel

library(ff)
k <- 100
n <- 3e6

# creating an ffdf dataframe of the required size
l <- vector("list", k)
for (i in 1:k)
  l[[i]] <- ff(vmode="double", length=n, update=FALSE)
names(l) <- paste("c", 1:k, sep="")
d <- do.call("ffdf", l)

# writing 100 columns of 3e6 random doubles takes ~90 sec
system.time(
  for (i in 1:k){
    cat(i, " ")
    print(system.time(d[,i] <- rnorm(n))["elapsed"])
  }
)["elapsed"]

m <- matrix(as.double(NA), k, k)

# pairwise correlating one column against all others takes ~17.5 sec
# pairwise correlating all combinations takes ~15 min
system.time(
  for (i in 2:k){
    cat(i, " ")
    print(system.time({
      x <- d[[i]][]
      for (j in 1:(i-1)){
        m[i,j] <- cor(x, d[[j]][])
      }
    })["elapsed"])
  }
)["elapsed"]
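For the file from the original question, the import step mentioned above might look roughly like the following sketch; read.table.ffdf and its next.rows chunk size come from the ff package, while the column names used in the crosstab are placeholders (check names(DD) for the real ones):

library(ff)

# Import the whole file into an on-disk ffdf instead of an in-RAM data.frame;
# next.rows controls the chunk size used while reading.
DD <- read.table.ffdf(file = "01uklicsam-20070301.dat", header = TRUE,
                      next.rows = 100000)
dim(DD)

# Crosstab one pair of variables by pulling just those two columns into RAM.
# "V1" and "V2" are placeholder names.
tab <- table(DD[["V1"]][], DD[["V2"]][])
tab

Only two columns are ever materialised in RAM at a time, which is the same pattern as the pairwise correlation loop above.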
Hi,

I'm responding to the question about the storage error when trying to read a 3000000 x 100 dataset into a data.frame. I wonder whether you can read the data as strings. If the numbers are all one digit, R caches identical strings, so each cell costs roughly one 4-byte pointer on a 32-bit build instead of an 8-byte double - about 1.2GB instead of 2.4GB. You can run crosstabs on the character values just as easily as if they were numeric. If you need numeric values, convert them a few at a time using as.numeric(). Here's an example --

library(MASS)   # for mvrnorm

# Generate some data and write it to a text file
v <- rnorm(5, 0, 0.7)
C_xx <- diag(v^2) + v %o% v
C_xx
mu <- rep(5, 5)
X.dat <- data.frame(round(mvrnorm(250, mu, C_xx)))
head(X.dat)
write.table(X.dat, "X.dat")

# Read the data using scan as character, convert it to a data.frame
# (6 columns come back because write.table also wrote the row names)
Xstr.dat <- matrix(scan("X.dat", what="character", skip=1), 250, byrow=TRUE)
Xstr.dat <- as.data.frame(Xstr.dat[, 2:6], stringsAsFactors=FALSE)
head(Xstr.dat)

# Run a crosstab
attach(Xstr.dat)
table(V1, V2)

Probably you do not need the option "stringsAsFactors=FALSE". Without it, the strings are converted to factors, and that probably does not change the amount of storage required much, since factor codes are 4-byte integers as well.

Larry Hotchkiss
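Applied to the file from the original question, the same idea might look like the sketch below; the file name comes from the thread, the column names are placeholders, and whether 3 million rows of strings fit in a 32-bit session is of course not guaranteed:

# Read every column as character so read.table never builds 8-byte doubles
DD <- read.table("01uklicsam-20070301.dat", header = TRUE,
                 colClasses = "character")

# Crosstab any pair of variables directly on the strings
# ("var1"/"var2" are placeholder column names)
tab <- table(DD[["var1"]], DD[["var2"]])
tab

# Convert a handful of columns to numeric only when actually needed
x <- as.numeric(DD[["var1"]])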