Dear all, I need to generate numbers from multivariate normal with large dimensions (5,000,000). Below is my code and the error I got from R. Sigma in the code is the covariance matrix. Can anyone give some idea on how to take care of this error. Thank you. Hannah> m <- 5000000 > m1 <- 0.5*m > rho <- 0.5 > Sigma <- rho* matrix(1, m, m)+diag(1-rho, m)Error in matrix(1, m, m) : too many elements specified> mu <- c(rep(2, m1), rep(0, m-m1)) > x.mod1 <- mvrnorm(n = 1, mu, Sigma, tol = 1e-6, empirical = FALSE)Error in mvrnorm(n = 1, mu, Sigma, tol = 1e-06, empirical = FALSE) : incompatible arguments [[alternative HTML version deleted]]
On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote:> Dear all, > I need to generate numbers from multivariate normal with large dimensions > (5,000,000). > Below is my code and the error I got from R. Sigma in the code is the > covariance > matrix. Can anyone give some idea on how to take care of this error. Thank > you. > Hannah > > > m <- 5000000 > > m1 <- 0.5*m > > rho <- 0.5 > > Sigma <- rho* matrix(1, m, m)+diag(1-rho, m) > Error in matrix(1, m, m) : too many elements specifiedHi. The matrix of dimension m times m does not fit into memory, since it requires 8*m^2 = 2e+14 bytes = 2e+05 GB. Generating a multivariate normal with a covariance matrix with 1 on the diagonal and rho outside of the diagonal may be done also as follows. m <- 10 # can be 5000000 rho <- 0.5 # single vector x <- rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho)) # several vectors a <- t(replicate(10000, rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho)))) # check the sample covariance matrix if m is not too large sigma <- cov(a) range(diag(sigma)) # elements on the diagonal [1] 0.9963445 1.0196015 diag(sigma) <- NA range(sigma, na.rm=TRUE) # elements outside of the diagonal [1] 0.4935129 0.5162836 Generating several vectors using replicate() may not be efficient. The following can be used instead. n <- 10000 a <- matrix(rnorm(n*m, sd=sqrt(1 - rho)), nrow=n, ncol=m) + rnorm(n, sd=sqrt(rho)) Note that the size of "a" is n times m and it should fit into the memory. Hope this helps. Petr Savicky.
On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote:> Dear all, > I need to generate numbers from multivariate normal with large dimensions > (5,000,000).Hi. I am replying to your first email, since the other did not arrive to my folder, possibly filtered out by a spam filter. I see them at the web interface. 1. Error: cannot allocate vector of size 381.5 Mb The error message makes sense. The matrix requires m*n*8/2^20 MB, which is in your case m <- 100000 n <- 500 m*n*8/2^20 [1] 381.4697 May be, you already have other large objects in the memory. Try to minimize the number and size of objects, which you need simultaneously in an R session. 2. Generating a multivariate normal distribution. As Peter Dalgaard pointed out, a speed up is possible only for special types of the covariance matrix Sigma. A general way is to find a matrix A such that A A^t = Sigma. Then, the vector A X, where X is from N(0,I) and I is an identity matrix of an appropriate dimension, has covariance Sigma. This is also the way, how mvtnorm package works. A speed up is possible, if computing the product A X does not require to have matrix A explicitly represented in memory. The matrix A need not be a square matrix. In particular, the previous case may be understood as using the matrix A, which for a small m is as follows. m <- 5 rho <- 0.5 A <- cbind(sqrt(rho), sqrt(1 - rho)*diag(m)) A [,1] [,2] [,3] [,4] [,5] [,6] [1,] 0.7071068 0.7071068 0.0000000 0.0000000 0.0000000 0.0000000 [2,] 0.7071068 0.0000000 0.7071068 0.0000000 0.0000000 0.0000000 [3,] 0.7071068 0.0000000 0.0000000 0.7071068 0.0000000 0.0000000 [4,] 0.7071068 0.0000000 0.0000000 0.0000000 0.7071068 0.0000000 [5,] 0.7071068 0.0000000 0.0000000 0.0000000 0.0000000 0.7071068 A %*% t(A) [,1] [,2] [,3] [,4] [,5] [1,] 1.0 0.5 0.5 0.5 0.5 [2,] 0.5 1.0 0.5 0.5 0.5 [3,] 0.5 0.5 1.0 0.5 0.5 [4,] 0.5 0.5 0.5 1.0 0.5 [5,] 0.5 0.5 0.5 0.5 1.0 This construction is conceptually possible also for a large m because the structure of A allows to compute A X by simpler operations with the vector X than an explicit matrix product. Namely, the expression rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho)) or, more clearly, sqrt(rho) * rnorm(1) + sqrt(1 - rho) * rnorm(m) is equivalent to the required A X, where X consists of rnorm(1) and rnorm(m) together. If you have a specific Sigma, describe it and we can discuss, whether an appropriate A can be found. Hope this helps. Petr Savicky.
Folks: Perhaps I am naive, ignorant, or foolish, but this whole discussion seems rather ridiculous. What possible relation to reality could a multivariate normal of the size requested have? Even for simulation, it seems like nonsense. Cheers, Bert On Sun, Feb 19, 2012 at 11:35 AM, Petr Savicky <savicky at cs.cas.cz> wrote:> On Sat, Feb 18, 2012 at 06:00:53PM -0500, li li wrote: >> Dear all, >> ? I need to generate numbers from multivariate normal with large dimensions >> (5,000,000). > > Hi. > > I am replying to your first email, since the other did not arrive > to my folder, possibly filtered out by a spam filter. I see them > at the web interface. > > 1. Error: cannot allocate vector of size 381.5 Mb > > The error message makes sense. The matrix requires m*n*8/2^20 MB, > which is in your case > > ?m <- 100000 > ?n <- 500 > ?m*n*8/2^20 > > ?[1] 381.4697 > > May be, you already have other large objects in the memory. Try to > minimize the number and size of objects, which you need simultaneously > in an R session. > > 2. Generating a multivariate normal distribution. > > As Peter Dalgaard pointed out, a speed up is possible only > for special types of the covariance matrix Sigma. A general > way is to find a matrix A such that A A^t = Sigma. Then, > the vector A X, where X is from N(0,I) and I is an identity > matrix of an appropriate dimension, has covariance Sigma. > This is also the way, how mvtnorm package works. > > A speed up is possible, if computing the product A X does not > require to have matrix A explicitly represented in memory. > > The matrix A need not be a square matrix. In particular, the > previous case may be understood as using the matrix A, which > for a small m is as follows. > > ?m <- 5 > ?rho <- 0.5 > ?A <- cbind(sqrt(rho), sqrt(1 - rho)*diag(m)) > ?A > > > ? ? ? ? ? ?[,1] ? ? ?[,2] ? ? ?[,3] ? ? ?[,4] ? ? ?[,5] ? ? ?[,6] > ?[1,] 0.7071068 0.7071068 0.0000000 0.0000000 0.0000000 0.0000000 > ?[2,] 0.7071068 0.0000000 0.7071068 0.0000000 0.0000000 0.0000000 > ?[3,] 0.7071068 0.0000000 0.0000000 0.7071068 0.0000000 0.0000000 > ?[4,] 0.7071068 0.0000000 0.0000000 0.0000000 0.7071068 0.0000000 > ?[5,] 0.7071068 0.0000000 0.0000000 0.0000000 0.0000000 0.7071068 > > ?A %*% t(A) > > ? ? ? [,1] [,2] [,3] [,4] [,5] > ?[1,] ?1.0 ?0.5 ?0.5 ?0.5 ?0.5 > ?[2,] ?0.5 ?1.0 ?0.5 ?0.5 ?0.5 > ?[3,] ?0.5 ?0.5 ?1.0 ?0.5 ?0.5 > ?[4,] ?0.5 ?0.5 ?0.5 ?1.0 ?0.5 > ?[5,] ?0.5 ?0.5 ?0.5 ?0.5 ?1.0 > > This construction is conceptually possible also for a large m because > the structure of A allows to compute A X by simpler operations with > the vector X than an explicit matrix product. Namely, the expression > > ?rnorm(1, sd=sqrt(rho)) + rnorm(m, sd=sqrt(1 - rho)) > > or, more clearly, > > ?sqrt(rho) * rnorm(1) + sqrt(1 - rho) * rnorm(m) > > is equivalent to the required A X, where X consists of rnorm(1) > and rnorm(m) together. > > If you have a specific Sigma, describe it and we can discuss, > whether an appropriate A can be found. > > Hope this helps. > > Petr Savicky. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm