I recently learned about the bigmemory and foreach packages and am trying
to use them to help me create a very large matrix. Without those
packages, I can create the type of matrix that I want with 10 columns and
5e6 rows. I would like to be able to scale up to 5e9 rows, or more, if
possible.

I have created a simplified example of what I'm trying to do, below. The
first part of the code shows what I'm trying to do without using the
bigmemory or foreach packages. I take information from a data frame and
use that information to fill a matrix with simulated data.

The last part of the code is my ugly attempt to use the bigmemory and
foreach packages in preparation for scaling up to a very large matrix. It
seems to be working ... at this small scale, anyway. But surely there is
a better way to do it than what I present here. I am particularly
concerned about efficiency because when I did a little experimenting with
foreach and rnorm using 5e4 records, things got slow in a hurry (ha!):
over 90 seconds elapsed with foreach, versus 0.01 seconds for a single
rnorm call (timings below). I would appreciate any suggestions you could
offer.

I'm using R for Windows 2.13.0, and my memory.limit() in R is 2GB
(32-bit).

Thanks!
Jean
====
> system.time(look <- rnorm(5e4))
   user  system elapsed
   0.02    0.00    0.01
> system.time(look <- foreach(i=1:5e4, .combine=c) %do% rnorm(1))
   user  system elapsed
  91.29    0.05   92.40
> system.time(look <- foreach(i=1:5e4, .combine=c) %dopar% rnorm(1))
   user  system elapsed
  90.06    0.03   91.20
====
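
The timings above suggest that the cost lies in the 5e4 separate foreach
iterations (and the repeated combining of results), not in rnorm itself.
A minimal sketch, assuming that is the cause: iterate over chunks and let
rnorm draw a whole chunk per iteration. The names n.total, chunk.size, and
n.chunks, and the chunk size of 1000, are illustrative assumptions, not
tuned values.

```r
library(foreach)

n.total    <- 5e4    # same number of draws as in the timings above
chunk.size <- 1000   # hypothetical chunk size, chosen only for illustration
n.chunks   <- n.total / chunk.size

# 50 iterations of 1000 draws each, instead of 5e4 iterations of 1 draw
look <- foreach(k = seq_len(n.chunks), .combine = c) %do% rnorm(chunk.size)

length(look)
```

This still returns a vector of 5e4 values, but foreach is entered only 50
times.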
library(bigmemory)
library(foreach)
# small data frame that instructs how to fill matrix
info <- data.frame(p=c(0.3, 0.5, 0.2), a1=c(100, 200, 80),
  a2=c(120, 300, 150))
nrowz <- dim(info)[1]
# example with small matrix
n <- 50
end.i <- cumsum(n*info$p)
start.i <- c(0, end.i[-nrowz]) + 1
m <- matrix(NA, nrow=n, ncol=2)
for(i in 1:nrowz) {
  m[start.i[i]:end.i[i], 1] <- runif(n*info$p[i], info$a1[i], info$a2[i])
  m[start.i[i]:end.i[i], 2] <- rnorm(n*info$p[i], info$a1[i], info$a2[i])
}
# example getting ready to scale up to large matrix
n <- 50
end.i <- cumsum(n*info$p)
start.i <- c(0, end.i[-nrowz]) + 1
m <- filebacked.big.matrix(nrow=n, ncol=2, backingfile="test3.bin",
  descriptorfile="test3.desc")
m[start.i[1]:end.i[1], 1] <- foreach(i=start.i[1]:end.i[1], .combine=c) %do%
  runif(1, info$a1[1], info$a2[1])
m[start.i[2]:end.i[2], 1] <- foreach(i=start.i[2]:end.i[2], .combine=c) %do%
  runif(1, info$a1[2], info$a2[2])
m[start.i[3]:end.i[3], 1] <- foreach(i=start.i[3]:end.i[3], .combine=c) %do%
  runif(1, info$a1[3], info$a2[3])
m[start.i[1]:end.i[1], 2] <- foreach(i=start.i[1]:end.i[1], .combine=c) %do%
  rnorm(1, info$a1[1], info$a2[1])
m[start.i[2]:end.i[2], 2] <- foreach(i=start.i[2]:end.i[2], .combine=c) %do%
  rnorm(1, info$a1[2], info$a2[2])
m[start.i[3]:end.i[3], 2] <- foreach(i=start.i[3]:end.i[3], .combine=c) %do%
  rnorm(1, info$a1[3], info$a2[3])
head(m)
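
A possible alternative to the element-by-element foreach calls above,
offered only as a sketch: loop over the rows of info, and within each
group fill the matrix one block of rows at a time with vectorized runif
and rnorm calls, so that only one block of simulated values is in memory
at once. Here an ordinary matrix stands in for the filebacked.big.matrix
so that the example runs without bigmemory; the same row-range assignments
are the ones already used with the big.matrix above. The name chunk.size
and its value, and the round() around n*info$p, are assumptions added for
illustration.

```r
info <- data.frame(p = c(0.3, 0.5, 0.2), a1 = c(100, 200, 80),
                   a2 = c(120, 300, 150))
nrowz <- nrow(info)

n <- 50
chunk.size <- 7   # hypothetical; deliberately small so chunking is exercised

# round() is an added guard against non-integer products such as n * 0.3
end.i   <- cumsum(round(n * info$p))
start.i <- c(0, end.i[-nrowz]) + 1

# stand-in for the filebacked.big.matrix created in the code above
m <- matrix(NA_real_, nrow = n, ncol = 2)

for (i in seq_len(nrowz)) {
  # first row of each chunk within group i
  chunk.starts <- seq(start.i[i], end.i[i], by = chunk.size)
  for (s in chunk.starts) {
    e <- min(s + chunk.size - 1, end.i[i])  # last row of this chunk
    k <- e - s + 1                          # number of rows in this chunk
    m[s:e, 1] <- runif(k, info$a1[i], info$a2[i])
    m[s:e, 2] <- rnorm(k, info$a1[i], info$a2[i])
  }
}

head(m)
```

With a larger n, chunk.size would be set to however many rows fit
comfortably in memory; the value 7 is used here only to show that partial
final chunks are handled.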
`·.,, ><(((º> `·.,, ><(((º> `·.,, ><(((º>
Jean V. Adams
Statistician
U.S. Geological Survey
Great Lakes Science Center
223 East Steinfest Road
Antigo, WI 54409 USA