Hi,
I'm sorry for any cross-posting. I've reviewed the archives and could
not find an exact answer to my question below.
I'm trying to generate very large sparse matrices (< 1% non-zero
entries per row). I have a sparse matrix function below which works
well until the row/col count exceeds 10,000. This is being run on a
machine with 32G memory:
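sparse_matrix <- function(dims, rnd, p) {
  ptm <- proc.time()
  # dense draw: dims^2 doubles (8 bytes each) allocated at once
  x <- round(rnorm(dims * dims), rnd)
  # zero out entries with |x| < p (same test as (abs(x) - p) < 0)
  x[abs(x) < p] <- 0
  y <- matrix(x, nrow = dims, ncol = dims)
  print(proc.time() - ptm)  # report elapsed time
  y                         # return the matrix, not just the timing
}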
When trying to generate the matrix around 20,000 rows/cols on a
machine with 32G of memory, the error message I receive is:
R(335) malloc: *** vm_allocate(size=3200004096) failed (error code=3)
R(335) malloc: *** error: can't allocate region
R(335) malloc: *** set a breakpoint in szone_error to debug
R(335) malloc: *** vm_allocate(size=3200004096) failed (error code=3)
R(335) malloc: *** error: can't allocate region
R(335) malloc: *** set a breakpoint in szone_error to debug
Error: cannot allocate vector of size 3125000 Kb
Error in round(rnorm(dims * dims), rnd) : unable to find the argument
'x' in selecting a method for function 'round'
* Last error line is obvious. Question: on a machine with 32G of memory, why
can't it allocate a vector of size 3125000 Kb?
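The figure in the 20,000 case is simply the size of one dense vector of doubles: at 8 bytes per double, dims^2 entries need 8 * dims^2 bytes, and dividing by 1024 gives R's "Kb" value. A small sketch (dense_bytes is a hypothetical helper, not part of the original code):

```r
# Hypothetical helper: bytes for dims^2 doubles at 8 bytes each,
# ignoring R's small per-vector header.
dense_bytes <- function(dims) 8 * dims^2

b <- dense_bytes(20000)
b            # 3.2e+09 bytes; the failed vm_allocate asked for 3200004096
b / 1024     # 3125000, the "Kb" figure in the error message
b / 2^30     # about 2.98 GiB for a single copy
```

So the message reports one allocation of roughly 3 GiB; whether that can succeed depends on the limits described in ?"Memory-limits", not only on installed RAM.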
When trying to generate the matrix around 30,000 rows/cols, the error
message I receive is:
Error in rnorm(dims * dims) : cannot allocate vector of length 900000000
Error in round(rnorm(dims * dims), rnd) : unable to find the argument
'x' in selecting a method for function 'round'
* Last error line is obvious. Question: is this 900000000 bytes?
Kilobytes? This error now seems specific to rnorm, but it
doesn't give the unit (b/Kb/Mb) as it did for 20,000
rows/cols. Even if this is Mb, why can't it be allocated on a machine
with 32G of free memory?
When trying to generate the matrix with over 50,000 rows/cols, the
error message I receive is:
Error in rnorm(n, mean, sd) : invalid arguments
In addition: Warning message:
NAs introduced by coercion
Error in round(rnorm(dims * dims), rnd) : unable to find the argument
'x' in selecting a method for function 'round'
* Same.
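The 50,000 case very likely fails for a different reason: the requested length itself is too large. 50000^2 = 2.5e9 exceeds the largest R integer, .Machine$integer.max (2^31 - 1), so coercing the length to integer yields NA, which would explain "NAs introduced by coercion" followed by "invalid arguments". A sketch of the check (assumption: this reflects R of this era; R 3.0.0 and later added long vectors, so messages may differ there):

```r
dims <- 50000
n <- dims * dims                 # 2.5e9, computed as a double
n > .Machine$integer.max         # TRUE: 2147483647 is the largest R integer
30000^2 > .Machine$integer.max   # FALSE: the 30,000 case still fits
# Coercing the length to integer overflows to NA (with a warning),
# the likely source of "NAs introduced by coercion".
suppressWarnings(as.integer(n))  # NA
```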
Why would it generate different errors in each case? Code fixes? Any
simple ways to generate sparse matrices which would avoid above
problems?
Thanks in advance,
Gavin
You need to look at the packages specifically designed for sparse
matrices: SparseM and Matrix.
url: www.econ.uiuc.edu/~roger Roger Koenker
email rkoenker at uiuc.edu Department of Economics
vox: 217-333-4558 University of Illinois
fax: 217-244-6678 Champaign, IL 61820
On Jun 10, 2006, at 12:53 PM, g l wrote:
> [...]
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
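Following the pointer to Matrix, here is a minimal sketch, assuming the Matrix package's sparseMatrix() constructor, that keeps the semantics of Gavin's function (an entry is nonzero when |x| >= p) but never allocates the dims^2 dense vector. Each row's nonzero count is drawn as Binomial(dims, P(|Z| >= p)), columns are sampled without replacement within the row, and values come from the normal tail by inverting pnorm(). The name sparse_matrix2 is hypothetical, and unlike the original it thresholds before rounding.

```r
library(Matrix)

# Hypothetical replacement for sparse_matrix(): builds the sparse matrix
# directly, so memory scales with the number of nonzeros, not dims^2.
sparse_matrix2 <- function(dims, rnd, p) {
  prob <- 2 * pnorm(-p)                  # P(|Z| >= p) for Z ~ N(0, 1)
  counts <- rbinom(dims, dims, prob)     # number of nonzeros in each row
  i <- rep(seq_len(dims), counts)        # row index of every nonzero
  # distinct columns within each row, so no (i, j) pair repeats
  j <- unlist(lapply(counts, function(k) sample.int(dims, k)))
  n <- length(i)
  # |Z| conditional on |Z| >= p via the inverse CDF of the upper tail,
  # then a random sign
  u <- runif(n, min = pnorm(p), max = 1)
  x <- qnorm(u) * sample(c(-1, 1), n, replace = TRUE)
  sparseMatrix(i = i, j = j, x = round(x, rnd), dims = c(dims, dims))
}
```

For example, sparse_matrix2(50000, 2, 2.58) should give roughly 1% nonzeros per row, since 2 * pnorm(-2.58) is about 0.0099.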
On Sat, 10 Jun 2006, g l wrote:
> Question: is this 900000000 bytes?
> kilobytes? This error seems to be specific now to rnorm, but it
> doesn't indicate the length metric (b/Kb/Mb) as it did for 20,000
> rows/cols. Even if this Mb, why can't this be allocated on a machine
> with 32G free memory?
This is a length of 900000000, as it says. Please read ?"Memory-limits"
for the limits in force. (A numeric vector of that length would be over
2^32 bytes and so exceed the address space of a 32-bit executable.)
You have not told us your platform or other basic facts:
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
and had you heeded that request we would have had a lot more to go on.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
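Ripley's parenthetical is easy to verify: 900000000 is a count of elements, and at 8 bytes per double that is 7.2e9 bytes, well beyond the 2^32 bytes (4 GiB) a 32-bit process can address, regardless of installed RAM.

```r
len <- 30000^2     # 9e+08 elements: the "length" in the error message
bytes <- 8 * len   # 7.2e+09 bytes of doubles
bytes > 2^32       # TRUE: exceeds a 32-bit address space (4 GiB)
```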
As an example of how one might do this sort of thing in SparseM,
ignoring the rounding aspect:
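library(SparseM)  # sparse matrix classes matrix.coo and matrix.csr
library(msm)      # for rtnorm(), the truncated normal generator
sm <- function(dim, rnd, q) {
  # rnd is unused: rounding is ignored in this example
  # number of nonzeros: each entry lands in (-q, q) w.p. 2*pnorm(q) - 1
  n <- rbinom(1, dim * dim, 2 * pnorm(q) - 1)
  # random (row, column) positions; positions can repeat (with replacement)
  ia <- sample(dim, n, replace = TRUE)
  ja <- sample(dim, n, replace = TRUE)
  # values: normals truncated to (-q, q)
  ra <- rtnorm(n, lower = -q, upper = q)
  # build in coordinate (COO) form, then convert to compressed sparse row
  A <- new("matrix.coo", ia = as.integer(ia), ja = as.integer(ja),
           ra = ra, dimension = as.integer(c(dim, dim)))
  as.matrix.csr(A)
}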
For dim = 5000 and q = .03, which exceeds Gavin's suggested 1 percent
density, this takes about 30 seconds on my iMac, and according to Rprof
about 95 percent of that total time is spent generating the
truncated normals.
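In sm() the expected density is 2 * pnorm(q) - 1, the probability that a standard normal falls in (-q, q); note that this keeps the small values, the reverse of Gavin's function, which zeroes entries with |x| < p. A quick check of the q = .03 remark and of the q that would give exactly 1 percent (sm_density is a hypothetical helper):

```r
# Hypothetical helper: P(-q < Z < q) for Z ~ N(0, 1), the density used by sm()
sm_density <- function(q) 2 * pnorm(q) - 1

sm_density(0.03)   # about 0.024, i.e. above 1 percent
q1 <- qnorm(0.505) # solves 2 * pnorm(q) - 1 = 0.01
q1                 # about 0.0125
```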
A word of warning: pushing this much further gets tedious, since the
number of random numbers grows like dim^2. For example, dim = 20,000
and q = .02 takes 432 seconds, with again 93% of the total time spent in
rnorm and rtnorm...
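The dim^2 growth can be read off the expected number of nonzeros in sm(), dim^2 * (2 * pnorm(q) - 1); each nonzero costs one rtnorm() draw plus two sample() draws. For the two runs quoted above (expected_nnz is a hypothetical helper):

```r
# Hypothetical helper: expected nonzeros (and rtnorm() draws) in sm()
expected_nnz <- function(dim, q) dim^2 * (2 * pnorm(q) - 1)

expected_nnz(5000, 0.03)    # about 6.0e5
expected_nnz(20000, 0.02)   # about 6.4e6
# at fixed q, 4x the dimension means 16x the draws
expected_nnz(20000, 0.03) / expected_nnz(5000, 0.03)
```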
url: www.econ.uiuc.edu/~roger Roger Koenker
email rkoenker at uiuc.edu Department of Economics
vox: 217-333-4558 University of Illinois
fax: 217-244-6678 Champaign, IL 61820