Hi,

I'm sorry for any cross-posting. I've reviewed the archives and could not find an exact answer to my question below.

I'm trying to generate very large sparse matrices (< 1% non-zero entries per row). I have a sparse matrix function below which works well until the row/col count exceeds 10,000. This is being run on a machine with 32G memory:

    sparse_matrix <- function(dims,rnd,p) {
      ptm <- proc.time()
      x <- round(rnorm(dims*dims),rnd)
      x[((abs(x) - p) < 0)] <- 0
      y <- matrix(x,nrow=dims,ncol=dims)
      proc.time() - ptm
    }

When trying to generate the matrix around 20,000 rows/cols on a machine with 32G of memory, the error message I receive is:

    R(335) malloc: *** vm_allocate(size=3200004096) failed (error code=3)
    R(335) malloc: *** error: can't allocate region
    R(335) malloc: *** set a breakpoint in szone_error to debug
    R(335) malloc: *** vm_allocate(size=3200004096) failed (error code=3)
    R(335) malloc: *** error: can't allocate region
    R(335) malloc: *** set a breakpoint in szone_error to debug
    Error: cannot allocate vector of size 3125000 Kb
    Error in round(rnorm(dims * dims), rnd) : unable to find the argument
    'x' in selecting a method for function 'round'

* The last error line is obvious. Question: on a machine w/32G memory, why can't it allocate a vector of size 3125000 Kb?

When trying to generate the matrix around 30,000 rows/cols, the error message I receive is:

    Error in rnorm(dims * dims) : cannot allocate vector of length 900000000
    Error in round(rnorm(dims * dims), rnd) : unable to find the argument
    'x' in selecting a method for function 'round'

* The last error line is obvious. Question: is this 900000000 bytes? kilobytes? This error seems to be specific now to rnorm, but it doesn't indicate the length metric (b/Kb/Mb) as it did for 20,000 rows/cols. Even if this is Mb, why can't this be allocated on a machine with 32G free memory?
When trying to generate the matrix with over 50,000 rows/cols, the error message I receive is:

    Error in rnorm(n, mean, sd) : invalid arguments
    In addition: Warning message:
    NAs introduced by coercion
    Error in round(rnorm(dims * dims), rnd) : unable to find the argument
    'x' in selecting a method for function 'round'

* Same.

Why would it generate different errors in each case? Code fixes? Any simple ways to generate sparse matrices which would avoid the above problems?

Thanks in advance,

Gavin
You need to look at the packages specifically designed for sparse matrices: SparseM and Matrix.

url:    www.econ.uiuc.edu/~roger        Roger Koenker
email   rkoenker at uiuc.edu            Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820

On Jun 10, 2006, at 12:53 PM, g l wrote:
> Any simple ways to generate sparse matrices which would avoid above
> problems?
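A minimal sketch of the Matrix route, assuming a reasonably recent version of the Matrix package, whose sparseMatrix() constructor builds a matrix from (i, j, x) triplets. The helper name sparse_matrix_coo and its density argument are hypothetical, not from this thread. The point is to draw only the non-zero entries, so memory grows with the number of non-zeros instead of with dims^2. Positions are sampled as distinct column-major linear indices, which avoids duplicate (i, j) pairs; for dims around 50,000 this relies on sample.int() accepting a total above .Machine$integer.max, which should be treated as an assumption about the R version in use.

```r
library(Matrix)

# Hypothetical helper (not from the thread): a dims x dims sparse matrix with
# roughly `density` non-zero entries, never allocating a dense dims^2 vector.
sparse_matrix_coo <- function(dims, density = 0.01, rnd = 2) {
  total <- as.numeric(dims)^2          # double arithmetic, no integer overflow
  nnz   <- round(density * total)      # number of non-zero entries to draw
  k <- sample.int(total, nnz) - 1      # distinct 0-based column-major positions
  i <- as.integer(k %% dims + 1)       # row index (1-based)
  j <- as.integer(k %/% dims + 1)      # column index (1-based)
  x <- round(rnorm(nnz), rnd)          # values, rounded as in the original post
  sparseMatrix(i = i, j = j, x = x, dims = c(dims, dims))
}

set.seed(1)
M <- sparse_matrix_coo(1000, density = 0.01)
print(object.size(M), units = "Kb")    # compare with 8 bytes * 1000^2 for dense
```

Storing only the triplets means a 20,000 x 20,000 matrix at 1% density needs about 4 million values rather than 400 million.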
On Sat, 10 Jun 2006, g l wrote:
> When trying to generate the matrix around 30,000 rows/cols, the error
> message I receive is:
>
> Error in rnorm(dims * dims) : cannot allocate vector of length 900000000
> Error in round(rnorm(dims * dims), rnd) : unable to find the argument
> 'x' in selecting a method for function 'round'
>
> * Last error line is obvious. Question: is this 900000000 bytes?
> kilobytes? This error seems to be specific now to rnorm, but it
> doesn't indicate the length metric (b/Kb/Mb) as it did for 20,000
> rows/cols.
> Even if this is Mb, why can't this be allocated on a machine
> with 32G free memory?

This is a length of 900000000, as it says. Please read ?"Memory-limits" for the limits in force. (A numeric vector of that length would be over 2^32 bytes and so exceed the address space of a 32-bit executable.)

You have not told us your platform or other basic facts:

> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

and had you heeded that request we would have had a lot more to go on.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
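The arithmetic behind the error messages can be checked directly in R. A minimal sketch; dense_bytes is a hypothetical helper that counts 8 bytes per double and ignores R's small per-vector header, which is why it comes out slightly below the 3200004096 bytes requested from vm_allocate. The last comparison shows that 50000^2 exceeds the largest R integer; in R versions of that era vector lengths had to fit in an integer, which appears consistent with the "NAs introduced by coercion" warning from rnorm.

```r
# Bytes for a dense dims x dims matrix of doubles (8 bytes each),
# ignoring R's per-object header.
dense_bytes <- function(dims) as.numeric(dims)^2 * 8

dense_bytes(20000) / 1024          # 3125000, the "3125000 Kb" in the first error
dense_bytes(30000)                 # 7.2e9 bytes for a length of 900000000
dense_bytes(30000) > 2^32          # TRUE: larger than a 32-bit address space
50000^2 > .Machine$integer.max     # TRUE: length not representable as an R integer
.Machine$sizeof.pointer            # 4 on a 32-bit build of R, 8 on a 64-bit build
```

Even the 20,000 case, at about 3 GB, leaves a 32-bit process very little room for the contiguous block it needs, regardless of how much physical memory the machine has.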
As an example of how one might do this sort of thing in SparseM, ignoring the rounding aspect:

    require(SparseM)
    require(msm)  # for rtnorm

    sm <- function(dim, rnd, q) {  # rnd is unused: rounding is ignored here
      # number of non-zeros: each cell is non-zero with P(|Z| < q) = 2 * pnorm(q) - 1
      n  <- rbinom(1, dim * dim, 2 * pnorm(q) - 1)
      # random (row, column) positions in coordinate form (positions may repeat)
      ia <- sample(dim, n, replace = TRUE)
      ja <- sample(dim, n, replace = TRUE)
      # values: standard normals truncated to (-q, q)
      ra <- rtnorm(n, lower = -q, upper = q)
      A  <- new("matrix.coo", ia = as.integer(ia), ja = as.integer(ja),
                ra = ra, dimension = as.integer(c(dim, dim)))
      as.matrix.csr(A)  # convert to compressed sparse row storage
    }

For dim = 5000 and q = .03, which exceeds Gavin's suggested 1 percent density, this takes about 30 seconds on my iMac, and according to Rprof about 95 percent of that (total) time is spent generating the truncated normals.

Word of warning: pushing this much further gets tedious, since the number of random numbers grows like dim^2. For example, dim = 20,000 and q = .02 takes 432 seconds, with again 93% of the total time spent in rnorm and rtnorm...

Roger Koenker
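Since Rprof attributes most of the run time in the SparseM example to generating the truncated normals, one base-R alternative to msm's rtnorm is inverse-CDF sampling: draw uniforms between pnorm(lower) and pnorm(upper) and map them back through qnorm. This is a swapped-in technique, not what was used in the thread, and the helper name rtnorm_icdf is hypothetical; it is fully vectorised, but it has not been timed against msm here.

```r
# Truncated normal on (lower, upper) by inverse-CDF sampling (hypothetical helper).
rtnorm_icdf <- function(n, lower, upper, mean = 0, sd = 1) {
  # uniforms on the CDF interval that corresponds to (lower, upper)
  u <- runif(n, pnorm(lower, mean, sd), pnorm(upper, mean, sd))
  # the normal quantile function maps them back into (lower, upper)
  qnorm(u, mean, sd)
}

set.seed(42)
q <- 0.03
z <- rtnorm_icdf(1e5, lower = -q, upper = q)
range(z)   # lies inside (-0.03, 0.03)
```

In sm() this would replace the rtnorm call with ra <- rtnorm_icdf(n, -q, q). For very narrow or far-tail intervals the CDF differences lose precision, so this is only a sketch for moderate truncation points like the q values above.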