I'm trying to import a table into R; the file is about 700MB. Here's my first try:

> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
Error: cannot allocate vector of size 15.6 Mb
In addition: Warning messages:
1: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  Reached total allocation of 1535Mb: see help(memory.size)
3: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  Reached total allocation of 1535Mb: see help(memory.size)
4: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  Reached total allocation of 1535Mb: see help(memory.size)

Then I tried

> memory.limit(size=4095)

and got

> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
Error: cannot allocate vector of size 11.3 Mb

but no additional errors. Then, optimistically, to clear up the workspace:

> rm()
> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
Error: cannot allocate vector of size 15.6 Mb

Can anyone help? I'm even confused by the values: 15.6Mb, 1535Mb, 11.3Mb? I'm working on WinXP with 2 GB of RAM. Help says the maximum obtainable memory is usually 2Gb. Surely they mean GB?

The file I'm importing has about 3 million cases with 100 variables that I want to crosstabulate each with each. Is this completely unrealistic?

Thanks!

Maja
A little simple math: you have 3M rows with 100 items on each row. If read in, this would be 300M items; if numeric, at 8 bytes/item, that is 2.4GB. Given that you are probably using a 32-bit version of R, you are probably out of luck.

A rule of thumb is that your largest object should consume at most 25% of your memory, since you will probably be making copies as part of your processing. Given that, if you want to read in 100 variables at a time, I would say your limit would be about 500K rows to be reasonable.

So you have a choice: read in fewer rows; read in all 3M rows, but at 20 columns per read; or put the data in a database and extract what you need. Unless you go to a 64-bit version of R you will probably not be able to have the whole file in memory at one time.

On Tue, Nov 10, 2009 at 7:10 AM, maiya <maja.zaloznik at gmail.com> wrote:
> I'm trying to import a table into R; the file is about 700MB. [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
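To make the 20-columns-per-read option concrete, here is a minimal sketch (not tested against the actual file; it assumes the 100 columns are whitespace-delimited with a header row, and reuses the file name from the original post):

# Sketch: read the file 20 columns at a time; colClasses="NULL" makes
# read.table skip a column, NA lets it guess the type of a kept column.
file  <- "01uklicsam-20070301.dat"
ncols <- 100
for (start in seq(1, ncols, by = 20)) {
  keep <- start:min(start + 19, ncols)
  classes <- rep("NULL", ncols)
  classes[keep] <- NA
  chunk <- read.table(file, header = TRUE, colClasses = classes)
  # ... crosstabulate the 20 columns in 'chunk' here; pairs of variables
  # spanning two chunks would need a second pass with both sets kept ...
  rm(chunk); gc()
}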
For me with ff - on a 3 GB notebook - 3e6 x 100 works out of the box, even without
compression: the doubles consume 2.2 GB on disk, but the R process remains under
100MB; the rest of the RAM is used by the file-system cache.

If you are under Windows, you can create the ffdf files in a compressed folder.
For the random doubles this reduces the size on disk to 230MB, which should work
even on a 1GB notebook.

BTW: the most compressed datatype (vmode) that can handle NAs is "logical", which
consumes 2 bits per tri-state value. The next most compressed is "byte", covering
c(NA, -127:127) and consuming, as its name says, one byte per element on disk and
in the file-system cache.
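A tiny sketch illustrating those two vmodes (the values are made up, just to show the NA handling):

library(ff)
b <- ff(vmode="byte", length=5)       # 1 byte per element, NA allowed
b[1:5] <- c(NA, -127, 0, 1, 127)
b[]
tb <- ff(vmode="logical", length=3)   # 2 bits per element: TRUE/FALSE/NA
tb[1:3] <- c(TRUE, NA, FALSE)
tb[]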
The code below should give an idea of how to do pairwise statistics on columns,
where each pair easily fits into RAM. In the real world you would not create the
data but import it with read.csv.ffdf (expect reading your file to take longer
than reading/writing the ffdf).
Regards
Jens Oehlschlägel
library(ff)

k <- 100
n <- 3e6

# create an ffdf data.frame of the required size (k columns, n rows)
l <- vector("list", k)
for (i in 1:k)
  l[[i]] <- ff(vmode="double", length=n, update=FALSE)
names(l) <- paste("c", 1:k, sep="")
d <- do.call("ffdf", l)

# writing the 100 columns of random doubles takes ~90 sec
system.time(
for (i in 1:k){
  cat(i, " ")
  print(system.time(d[,i] <- rnorm(n))["elapsed"])
}
)["elapsed"]

m <- matrix(as.double(NA), k, k)

# pairwise correlating one column against all others takes ~17.5 sec
# pairwise correlating all combinations takes ~15 min
system.time(
for (i in 2:k){
  cat(i, " ")
  print(system.time({
    x <- d[[i]][]
    for (j in 1:(i-1)){
      m[i,j] <- cor(x, d[[j]][])
    }
  })["elapsed"])
}
)["elapsed"]
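For the real-world case mentioned above, a hedged sketch of the import step (the file name is taken from the original post; since the .dat file is presumably whitespace-delimited, read.table.ffdf is used here instead of read.csv.ffdf, and the chunk sizes are arbitrary assumptions):

library(ff)
# chunked import into an ffdf; adjust sep= and colClasses= to the real file
DD <- read.table.ffdf(file="01uklicsam-20070301.dat", header=TRUE,
                      first.rows=10000, next.rows=100000)
dim(DD)
# individual columns come back into RAM on demand, e.g. for one crosstab:
table(DD[[1]][], DD[[2]][])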
Hi,
I'm responding to the question about the storage error when trying to read a
3000000 x 100 dataset into a data.frame.

I wonder whether you can read the data as strings. If the numbers are all one
digit, each cell would require roughly 1 byte instead of 8, which makes about
300MB instead of 2.4GB. You can run crosstabs on the character values just as
easily as if they were numeric. If you need numeric values, convert them a few
at a time using as.numeric(). Here's an example --
# Generate some data and write it to a text file
library(MASS)   # for mvrnorm()
v <- rnorm(5, 0, 0.7)
C_xx <- diag(v^2) + v %o% v   # covariance matrix for the simulated data
C_xx
mu <- rep(5, 5)
X.dat <- data.frame(round(mvrnorm(250, mu, C_xx)))
head(X.dat)
write.table(X.dat, "X.dat")

# Read the data back as character using scan, convert it to a data.frame
# (write.table added a row-name column, so drop column 1 of the 6)
Xstr.dat <- matrix(scan("X.dat", what="character", skip=1), 250, byrow=TRUE)
Xstr.dat <- as.data.frame(Xstr.dat[, 2:6], stringsAsFactors=FALSE)
head(Xstr.dat)

# Run a crosstab
attach(Xstr.dat)
table(V1, V2)
You probably do not need the option "stringsAsFactors=FALSE"; without it, the
strings are converted to factors, which probably does not change the amount of
storage required.
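If numeric values are needed after all, a small sketch of converting a few columns at a time (column names as in the example above):

# convert only the columns needed for a numeric summary, a few at a time
Xnum <- lapply(Xstr.dat[, c("V1", "V2")], as.numeric)
sapply(Xnum, mean)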
Larry Hotchkiss