Marion Dumas
2011-Apr-15 01:41 UTC
[R] simulations with very large number of iterations (1 billion)
Hello R-help list,

I'm trying to run 1 billion iterations of code with calls to random distributions, to implement a data-generating process and the subsequent computation of various estimators, which are recorded for later comparison of performance. I have two questions about how to achieve this:

1. The most important: on my laptop, R gives me an error message saying that it cannot allocate sufficient space for the matrix that is meant to record the results (a 1-billion-by-4 matrix). Is this computer-specific? Are there ways to circumvent this limit, or is it hopeless to run 1 billion iterations in one batch? (The alternative would be to run, for example, 1000 repetitions of a 1-million-iteration process that writes output files which can then be combined afterwards.)

2. When I profile the code on a smaller number of iterations, colSums is the function with the longest self time. I am using it to compute stratum-specific treatment effects. My thinking was that the fastest way to compute the mean outcome conditional on treatment for each stratum would be to combine all strata in one matrix and apply colSums-type functions to it. Maybe I am wrong and there are better ways?

Thank you in advance for any help you may provide.
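A minimal sketch of the batch alternative raised in question 1, assuming a hypothetical simulate_batch() as a stand-in for the actual data-generating process; each chunk is appended to a file on disk, so the full 1-billion-by-4 matrix never has to sit in memory:

## Hypothetical stand-in for the real data-generating process and
## estimators: returns four estimates per iteration.
simulate_batch <- function(n) {
  x <- rnorm(n)  # placeholder random draws
  cbind(est1 = x, est2 = x^2, est3 = abs(x), est4 = pmax(x, 0))
}

n_batches  <- 1000L   # 1000 batches x 1e6 iterations = 1e9 rows total
batch_size <- 1e6L
out_file   <- "results.csv"

for (b in seq_len(n_batches)) {
  res <- simulate_batch(batch_size)
  ## Append each batch to disk; write the header only on the first one.
  write.table(res, out_file, sep = ",", row.names = FALSE,
              col.names = (b == 1), append = (b > 1))
}

On question 2, one commonly used alternative to the one-big-matrix/colSums layout is grouped aggregation with rowsum(), which sums within groups in compiled code; the variable names here (y, stratum, treated) are illustrative only:

## Mean outcome by stratum x treatment cell, without assembling
## one wide matrix first.
y       <- rnorm(1e5)                          # outcomes
stratum <- sample(1:10, 1e5, replace = TRUE)   # stratum labels
treated <- rbinom(1e5, 1, 0.5)                 # treatment indicator

g      <- interaction(stratum, treated)        # one group per cell
sums   <- rowsum(y, g)                         # per-group sums
counts <- rowsum(rep(1, length(y)), g)         # per-group counts
means  <- sums / counts                        # conditional means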
Brian J Mingus
2011-Apr-15 06:28 UTC
[R] simulations with very large number of iterations (1 billion)
On Thu, Apr 14, 2011 at 7:41 PM, Marion Dumas <mariouka@gmail.com> wrote:
> I'm trying to run 1 billion iterations of code with calls to random
> distributions [...]

The first thing you need to do is estimate the amount of memory that is going to be needed. Then, estimate the amount of time it's going to take. You probably need a 64-bit computer and 4-8 GB of memory at least.

You may not want to use R, instead opting for C code and the GNU Scientific Library. If you can't write C code, Lua is pretty easy to learn, and GSL has been exposed through it in GSL Shell: http://www.nongnu.org/gsl-shell/

--
Brian Mingus
Graduate student
Computational Cognitive Neuroscience Lab
University of Colorado at Boulder
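A back-of-the-envelope version of both estimates (a numeric matrix in R stores 8 bytes per cell; the timed workload here is a placeholder, not the real simulation, so the extrapolation only illustrates the method):

## The result matrix alone: 1e9 rows x 4 numeric columns x 8 bytes/cell.
1e9 * 4 * 8 / 2^30   # ~29.8 GiB, before any copies R makes

## Rough running-time extrapolation from a small timed run.
t_small <- system.time(replicate(1e4, mean(rnorm(10))))["elapsed"]
t_small * (1e9 / 1e4) / 3600   # approximate hours for 1e9 iterations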