Dear R helpers,

Reproducible example:

# warning - this causes a hard freeze on the machines I've tried it on
matrix.holder <- matrix(rnorm(150), nrow = 30, ncol = 5)
Out <- expand.grid(matrix.holder[, 1], matrix.holder[, 2], matrix.holder[, 3],
                   matrix.holder[, 4], matrix.holder[, 5])

Problem:

I'm running an analysis that I would like to do using a matrix containing all the possible combinations of the elements in a [30,5] matrix. Briefly, each possible combination is used to index and subset another matrix. I then run some models on the data in the subsetted matrix and sometimes export the model results based on a couple of criteria. 24,300,000 combinations seems to be too many for R on my computer (Intel i5, about 2.5 GB RAM free, 4 GB total, R x64 2.15) to handle.

Requests:

1. Can you tell me how I can estimate the amount of memory a matrix will require before I create it?

2. Do you have recommendations for packages that allow the user to send an object directly to the hard drive? I guess it would have to be partially created in RAM and then dumped to the HD; the point is that there isn't room for the whole thing to be created in RAM and then written in pieces to the HD (which even I think I could do). And of course, if it were written as one big piece to the HD, I would need to be able to read it back in piece by piece.

3. I also see packages out there to connect R to C. Does anyone have ideas for one designed for, or containing functions designed for, this type of problem?

Background:

When I tried to throw expand.grid() at a matrix of size [30,5] (24,300,000 combinations), my computer choked (I assume due to RAM limits, but it might be that it just takes a long time and I wasn't ready to stare at a frozen computer for very long). I'm currently working around the problem with five nested loops, with all the drawbacks of and limits imposed by that approach (the biggest for me being that I'd like to attempt to multithread with some of the packages that exist for that).

I don't have any formal training in computer science, and the only programming language I use enough to do something of this complexity is R, so programming the whole thing in C (which all the remote sensing folks across the hall said would make creation of this matrix trivial) isn't an easy alternative for me.

Thanks!

Ben Caldwell
Graduate Fellow
University of California, Berkeley
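For request 1, a back-of-the-envelope estimate is possible before creating anything, assuming a numeric matrix of doubles (8 bytes per element). This is a sketch; `est_matrix_bytes` is a helper name made up for illustration:

```r
# Rough memory estimate for a numeric (double) matrix: 8 bytes per
# element; object.size() adds only a small fixed header on top of this.
est_matrix_bytes <- function(nrow, ncol) nrow * ncol * 8

# The full expand.grid() result here has 30^5 rows and 5 columns:
est_matrix_bytes(30^5, 5)          # 972,000,000 bytes
est_matrix_bytes(30^5, 5) / 2^30   # about 0.91 GiB, before any copies are made
```

Note that R often makes temporary copies during construction, so peak usage can be a small multiple of this figure.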
On Apr 20, 2013, at 2:19 PM, Benjamin Caldwell wrote:

> Dear R helpers
>
> Reproducible example:
>
> # warning - this causes a hard freeze on the machines I've tried it on
> matrix.holder <- matrix(rnorm(150), nrow=30, ncol=5)
>
> Out <- expand.grid(matrix.holder[,1], matrix.holder[,2], matrix.holder[,3],
>                    matrix.holder[,4], matrix.holder[,5])

On my machine:

object.size(Out)
# 972014344 bytes

So with proper setup you might be able to work with this on a 4GB machine, but you are not likely to be able to do so on the machine you describe below.

> Problem:
> [snip]
> 24,300,000 combinations seems to be too big for R on my computer (Intel i5,
> about 2.5 GB RAM free, 4 GB total, Rx64 2.15) to handle.
>
> 1. Can you tell me how I can estimate the amount of memory a matrix will
> require before I create it?

Roughly: 8 bytes per double times the number of elements; here, 5 columns by 30^5 rows:

5 * 8 * (30^5) / 972014344
# [1] 0.9999852

So the estimate was accurate on a ratio basis to 5 decimal places.

> 2. Do you have recommendations for packages that allow the user to send an
> object directly to the hard drive?
> [snip]
>
> 3. I also see packages out there to connect R to C. Anyone have ideas for
> one designed or containing functions designed for this type of problem?

Are you saying you have facility with C programming? (And you really have not described the problem. Perhaps a redesign of the solution could accommodate your limited computing resources.)

> Background:
>
> When I tried to throw expand.grid() at a matrix of size [30,5] (24,300,000
> combinations), my computer choked
> [snip]

It took 7 seconds on my 6-year-old MacPro.

> I'm currently working around the problem with five nested loops
> [snip]

There are many threads on R-help and advice in various manuals about how to avoid memory limitations.

--
David Winsemius
Alameda, CA, USA
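One redesign along the lines suggested in the reply above is to never materialize the full grid at all, and instead compute the i-th combination on demand with index arithmetic. A sketch under that assumption (`combo` is a hypothetical helper, not part of any package):

```r
# Visit any of the 30^5 combinations without materializing the full
# expand.grid() result. arrayInd() converts a linear index into one
# row index per column, and its column-major ordering matches
# expand.grid()'s (the first column varies fastest).
matrix.holder <- matrix(rnorm(150), nrow = 30, ncol = 5)

combo <- function(i, m = matrix.holder) {
  idx <- arrayInd(i, .dim = rep(nrow(m), ncol(m)))
  m[cbind(as.vector(idx), seq_len(ncol(m)))]
}

combo(1)   # same values as matrix.holder[1, ]
combo(31)  # row 2 of column 2, row 1 of every other column
```

A plain for-loop over `seq_len(30^5)` calling `combo(i)` then needs only a few dozen bytes per iteration instead of ~1 GB up front.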
Benjamin Caldwell <btcaldwell <at> berkeley.edu> writes:

> Dear R helpers
> [snip]
> 24,300,000 combinations seems to be too big for R on my computer (Intel i5,
> about 2.5 GB RAM free, 4 GB total, Rx64 2.15) to handle.
> [snip]
> I'd like to attempt to multithread
> [snip]

Ben,

The problem you have is "embarrassingly parallel", as they say. You can effectively use brute-force solutions to parallelize the job and do it with subjobs that have smaller memory requirements.

One way to parallelize the problem is to create the object 'matrix.holder', then loop through the values of matrix.holder[,1] and create a subjob that runs all the computations for matrix.holder[i,1] combined with all the combinations of matrix.holder[,-1]. Run each subjob in a new process and save the results; later on, combine the saved results. You could also run each subjob using parallel::mclapply() or some other parallelizing package.

Or you could loop over each of the first two columns of matrix.holder, creating 900 subjobs; that gives you still smaller memory requirements for the individual jobs.

HTH,
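The per-column-1 decomposition suggested above can be sketched as follows. This is a minimal outline, not the poster's actual analysis: `run_subjob` is a hypothetical placeholder that just counts combinations where the real model fitting would go.

```r
library(parallel)  # mclapply() forks on Unix; on Windows use parLapply()

matrix.holder <- matrix(rnorm(150), nrow = 30, ncol = 5)

# One subjob per value of column 1: each subjob expands only the
# remaining four columns (30^4 = 810,000 rows, roughly 26 MB as
# doubles), which fits comfortably in memory.
run_subjob <- function(i) {
  grid <- expand.grid(matrix.holder[, 2], matrix.holder[, 3],
                      matrix.holder[, 4], matrix.holder[, 5])
  grid <- cbind(Var0 = matrix.holder[i, 1], grid)
  # ... fit models on subsets indexed by 'grid' and save whatever
  # meets your criteria; here we just report the combinations handled
  nrow(grid)
}

results <- mclapply(seq_len(nrow(matrix.holder)), run_subjob, mc.cores = 2)
sum(unlist(results))  # 30 * 30^4 = 24,300,000 combinations covered in total
```

Each subjob is independent, so the same `run_subjob` could instead be dispatched as separate batch R processes whose saved outputs are combined afterwards, as the reply suggests.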