David Romano
2013-May-03 23:56 UTC
[R] how to parallelize 'apply' across multiple cores on a Mac
Hi everyone,

I'm trying to use apply (with a call to zoo's rollapply within) on the columns of a 1.5Kx165K matrix, and I'd like to make use of the other cores on my machine to speed it up. (And hopefully also leave more memory free: I find that after I create a big object like this, I have to save my workspace and then close and reopen R to be able to recover the memory tied up by R, but maybe that's a separate issue -- if so, please let me know!)

It seems the package 'multicore' has a parallel version of 'lapply', which I suppose I could combine with a 'do.call' (I think) to gather the elements of the output list into a matrix, but I was wondering whether there might be another route.

In case the particular way I constructed the call to 'apply' might be the source of the problem, here is a deconstructed version of what I did to each column, for easier parsing:

----------------------------- begin call to 'apply' ------------------------

Step 1: Identify several disjoint subsequences of fixed length, say length three, of a column.

column.values <- 1:16
desired.subseqs <- c( NA, NA, NA, 1, 1, 1, NA, 1, 1, 1, NA, NA, 1, 1, 1, NA )  # this vector is used for every column
desired.values <- desired.subseqs * column.values

Step 2: Find the average value of each subsequence.

desired.means <- rollapply( desired.values, 3, mean, fill = NA, align = "right", na.rm = FALSE )
# put mean in highest index of subsequence and retain original vector length
desired.means
 [1] NA NA NA NA NA  5 NA NA NA  9 NA NA NA NA 14 NA

Step 3: Shift values forward by one index value, retaining original vector length.

desired.means <- zoo( desired.means )  # in order to be able to use lag.zoo
desired.means <- lag( desired.means, k = -1, na.pad = TRUE )
desired.means
 [1] NA NA NA NA NA NA  5 NA NA NA  9 NA NA NA NA 14

Step 4: Use last-observation-carried-forward, retaining original vector length.

desired.means <- na.locf( desired.means, na.rm = FALSE )
desired.means
 [1] NA NA NA NA NA NA  5  5  5  5  9  9  9  9  9 14

Step 5: Use next-observation-carried-backward to assign values to the initial sequence of NAs.

desired.means <- na.locf( desired.means, fromLast = TRUE )
desired.means
 [1]  5  5  5  5  5  5  5  5  5  5  9  9  9  9  9 14

Step 6: Convert back to a vector (from a zoo object), and subtract from the column.

desired.column <- column.values - coredata( desired.means )
desired.column
 [1] -4 -3 -2 -1  0  1  2  3  4  5  2  3  4  5  6  2

----------------------------- end call to 'apply' ------------------------

Thanks,
David
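For concreteness, here is a minimal sketch of the mclapply/do.call route mentioned above; it is not part of the original post. The per-column work of steps 1-6 is wrapped in a hypothetical helper process.column(), and the columns are handed to the workers in chunks. The matrix size, chunk scheme, and core count are illustrative stand-ins only.

library(zoo)
library(parallel)   # 'multicore' functionality now lives in the base package 'parallel'

desired.subseqs <- c(NA, NA, NA, 1, 1, 1, NA, 1, 1, 1, NA, NA, 1, 1, 1, NA)

process.column <- function(column.values) {
  desired.values <- desired.subseqs * column.values                 # step 1
  desired.means  <- rollapply(desired.values, 3, mean,
                              fill = NA, align = "right")           # step 2
  desired.means  <- lag(zoo(desired.means), k = -1, na.pad = TRUE)  # step 3
  desired.means  <- na.locf(desired.means, na.rm = FALSE)           # step 4
  desired.means  <- na.locf(desired.means, fromLast = TRUE)         # step 5
  column.values - coredata(desired.means)                           # step 6
}

# Small stand-in for the 1.5K x 165K matrix (dimensions shrunk for illustration).
big.matrix <- matrix(1:16, nrow = 16, ncol = 8)

# Fork one job per chunk of columns, then reassemble the pieces with cbind.
ncores <- 4
chunks <- split(seq_len(ncol(big.matrix)),
                cut(seq_len(ncol(big.matrix)), ncores, labels = FALSE))
pieces <- mclapply(chunks,
                   function(cols) apply(big.matrix[, cols, drop = FALSE],
                                        2, process.column),
                   mc.cores = ncores)
result <- do.call(cbind, pieces)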
Charles Berry
2013-May-04 16:32 UTC
[R] how to parallelize 'apply' across multiple cores on a Mac
David Romano <dromano <at> stanford.edu> writes:

> Hi everyone,
>
> I'm trying to use apply (with a call to zoo's rollapply within) on the
> columns of a 1.5Kx165K matrix, and I'd like to make use of the other cores
> on my machine to speed it up. (And hopefully also leave more memory free: I
> find that after I create a big object like this, I have to save my
> workspace and then close and reopen R to be able to recover memory tied up
> by R, but maybe that's a separate issue -- if so, please let me know!)
>
> It seems the package 'multicore' has a parallel version of 'lapply', which
> I suppose I could combine with a 'do.call' (I think) to gather the elements
> of the output list into a matrix, but I was wondering whether there might
> be another route.

[description of simple calc's deleted]

David,

If you insist on explicitly parallelizing this:

The functions in the recommended package 'parallel' work on a Mac.

I would not try to work on each tiny column as a separate function call - too much overhead if you parallelize - instead, bundle up 100-1000 columns to operate on.

The calculations you describe sound simple enough that I would just write them in C and use the .Call interface to invoke them. You only need enough working memory in C to operate on one column and space to save the result. So a MacBook with 8GB of memory will handle it with room to breathe.

This is a good use case for the 'inline' package, especially if you are unfamiliar with the use of .Call.

==

But it might be as fast to forget about parallelizing this (explicitly).

If !any(is.na(column.values)), then what you are doing can be achieved by

    desired.means[ , column.subset] <- crossprod( suitable.matrix, matrix.values )

or better still

    desired.means[ , column.subset] <- crossprod( minimal.matrix, matrix.values )[ fill.rows, ]

where

    suitable.matrix implements your steps 2-6,
    minimal.matrix is unique(suitable.matrix, MARGIN = 2),
    fill.rows is such that minimal.matrix[fill.rows, ] == suitable.matrix,
    matrix.values is a subset of columns from your original matrix, and
    column.subset is where the result should be placed in desired.means.

On a Mac, the vecLib BLAS will do crossprod using the multiple cores without your needing to do anything special. So you can forget about 'parallel', 'multicore', etc.

So your remaining problem is to reread steps 2-6 and figure out what 'minimal.matrix' and 'fill.rows' have to be.

==

You can also approach this problem using 'filter', but that can get 'convoluted' (pun intended - see ?filter).

HTH,
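To make the crossprod suggestion concrete, here is a rough sketch of one reading of it for the toy 16-row example, assuming the columns contain no NAs. The names W, group.of.row, and X are illustrative: W plays the role of minimal.matrix, group.of.row the role of fill.rows, and the subtraction of step 6 is done separately at the end.

# Which rows feed which subsequence mean, and which mean each row ends up
# with after the lag/locf steps, read off from the worked example above:
subseq.rows  <- list(4:6, 8:10, 13:15)    # rows averaged in step 2
group.of.row <- rep(1:3, c(10, 5, 1))     # mean assigned to each row by steps 3-5

# W has one column per subsequence; column g puts weight 1/3 on the rows of
# subsequence g, so crossprod(W, X) gives the three means for every column of X.
n <- 16
W <- matrix(0, nrow = n, ncol = length(subseq.rows))
for (g in seq_along(subseq.rows)) W[subseq.rows[[g]], g] <- 1/3

X <- matrix(1:16, nrow = n, ncol = 5)     # stand-in for a block of columns

means.per.group <- crossprod(W, X)                   # 3 x ncol(X), one BLAS call
assigned.means  <- means.per.group[group.of.row, ]   # expand rows, like 'fill.rows'
desired.columns <- X - assigned.means                # step 6: subtract from each column

desired.columns[, 1]
#  [1] -4 -3 -2 -1  0  1  2  3  4  5  2  3  4  5  6  2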
David Romano
2013-May-04 18:27 UTC
[R] how to parallelize 'apply' across multiple cores on a Mac
(I neglected to use reply-all.)

---------- Forwarded message ----------
From: David Romano <dromano at stanford.edu>
Date: Sat, May 4, 2013 at 11:25 AM
Subject: Re: [R] how to parallelize 'apply' across multiple cores on a Mac
To: Charles Berry <ccberry at ucsd.edu>

On Sat, May 4, 2013 at 9:32 AM, Charles Berry <ccberry at ucsd.edu> wrote:

> David,
>
> If you insist on explicitly parallelizing this:
>
> The functions in the recommended package 'parallel' work on a Mac.
>
> I would not try to work on each tiny column as a separate function call -
> too much overhead if you parallelize - instead, bundle up 100-1000 columns
> to operate on.
>
> The calculations you describe sound simple enough that I would just write
> them in C and use the .Call interface to invoke them. You only need enough
> working memory in C to operate on one column and space to save the result.
>
> So a MacBook with 8GB of memory will handle it with room to breathe.
>
> This is a good use case for the 'inline' package, especially if you are
> unfamiliar with the use of .Call.
>
> ==
>
> But it might be as fast to forget about parallelizing this (explicitly).

[detailed recommendations deleted]

> On a Mac, the vecLib BLAS will do crossprod using the multiple cores
> without your needing to do anything special. So you can forget about
> 'parallel', 'multicore', etc.
>
> So your remaining problem is to reread steps 2-6 and figure out what
> 'minimal.matrix' and 'fill.rows' have to be.
>
> ==
>
> You can also approach this problem using 'filter', but that can get
> 'convoluted' (pun intended - see ?filter).
>
> HTH,

Thanks, Charles, for all the helpful pointers! For the moment, I'll leave parallelization aside and explore using 'crossprod' and 'filter'.

That said, given your suggestion that 8 GB of memory should be sufficient if I went the parallel route, I also wonder whether I'm suffering not just from inefficient use of computing resources but from a memory leak as well: the original 'apply' code would, in much less than a minute, take over the full 18 GB of memory available on my workstation, and then leave it functioning at a crawl for at least half an hour or so. I'll ask about this by reposting this message with a different subject, so there's no need to address it in this thread.

Thanks again,
David
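As a pointer for the 'filter' route mentioned above, a small sketch (not from the thread) showing that stats::filter with a trailing window reproduces the rolling mean of step 2, up to floating-point rounding, without zoo:

column.values   <- 1:16
desired.subseqs <- c(NA, NA, NA, 1, 1, 1, NA, 1, 1, 1, NA, NA, 1, 1, 1, NA)
desired.values  <- desired.subseqs * column.values

# sides = 1 averages positions i-2, i-1, i, i.e. the align = "right" window;
# any window touching an NA comes out NA, as with na.rm = FALSE.
rolling.means <- stats::filter(desired.values, rep(1/3, 3), sides = 1)
as.numeric(rolling.means)
#  [1] NA NA NA NA NA  5 NA NA NA  9 NA NA NA NA 14 NA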