David Romano
2013-May-04 19:55 UTC
[R] memory leak using 'apply'? [was: how to parallelize 'apply' across multiple cores on a Mac]
Hi everyone,

From the answers I've received to the question below, it occurs to me there may be more involved than inefficient programming on my part. The 'apply' code described below quickly takes up the 18 GB of memory I have available, which leaves my machine crawling for at least 30 minutes (likely more) while R completes its computations. Similar behavior arises when I try to add even a handful of columns to the matrix (really a data frame) that I obtain from that 'apply' call. The only difference is how long the task takes, which is more on the order of five minutes for adding four columns.

I'd be grateful for any suggestions on how to troubleshoot what's happening, or how to prevent R from taking up so much of the available memory (which is then not released until I restart R)!

Thanks in advance for your help,
David

On Fri, May 3, 2013 at 4:56 PM, David Romano <dromano@stanford.edu> wrote:
> Hi everyone,
>
> I'm trying to use apply (with a call to zoo's rollapply within) on the
> columns of a 1.5K x 165K matrix, and I'd like to make use of the other cores
> on my machine to speed it up. (And hopefully also leave more memory free: I
> find that after I create a big object like this, I have to save my
> workspace and then close and reopen R to recover the memory tied up
> by R, but maybe that's a separate issue; if so, please let me know!)
>
> It seems the package 'multicore' has a parallel version of 'lapply', which
> I suppose I could combine with a 'do.call' (I think) to gather the elements
> of the output list into a matrix, but I was wondering whether there might
> be another route.
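A minimal sketch of the parallel-lapply-plus-do.call route, assuming the `parallel` package that ships with R 2.14.0 and later (its `mclapply()` took over from 'multicore'), a fork-capable OS such as Mac OS X, and a hypothetical `per.column()` standing in for the real per-column work. For scale: 1.5K x 165K is about 2.5 x 10^8 doubles, roughly 2 GB per full copy at 8 bytes each. `apply()` also converts a data frame with `as.matrix()` first, so several copies of that size can coexist. Forked workers send their results back to the parent process, so this may shorten run time without lowering peak memory.

```r
library(parallel)  # ships with R >= 2.14.0; mclapply() supersedes multicore's

# Hypothetical per-column function standing in for the real transformation.
per.column <- function(x) x - mean(x)

# Toy data; the matrix in the post is roughly 1.5K x 165K.
set.seed(1)
m <- matrix(rnorm(20 * 6), nrow = 20, ncol = 6)

# Forked workers are unavailable on Windows, where mc.cores must be 1.
# detectCores() can return NA, hence na.rm = TRUE.
n.cores <- if (.Platform$OS.type == "windows") 1L else
  max(1L, detectCores() - 1L, na.rm = TRUE)

# One list element per column, computed in parallel ...
cols <- mclapply(seq_len(ncol(m)), function(j) per.column(m[, j]),
                 mc.cores = n.cores)

# ... then gathered back into a matrix with the same shape as m.
result <- do.call(cbind, cols)
```

Here `do.call(cbind, cols)` is the "gather into a matrix" step. With the real data, `per.column` would be whatever function is currently passed to `apply()`.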
>
> And, in case the particular way I constructed the call to 'apply' might be
> the source of the problem, here is a deconstructed version of what I did to
> each column, for easier parsing:
>
> ----------------------------- begin call to 'apply' ------------------------
>
> Step 1: Identify several disjoint subsequences of fixed length, say
> length three, of a column.
>
> library(zoo)  # for rollapply(), zoo(), lag(), na.locf() and coredata()
> column.values <- 1:16
> # this vector is used for every column:
> desired.subseqs <- c(NA, NA, NA, 1, 1, 1, NA, 1, 1, 1, NA, NA, 1, 1, 1, NA)
> desired.values <- desired.subseqs * column.values
>
> Step 2: Find the average value of each subsequence.
>
> # put the mean at the highest index of each subsequence and retain the
> # original vector length
> desired.means <- rollapply(desired.values, 3, mean, fill = NA,
>                            align = "right", na.rm = FALSE)
> desired.means
> [1] NA NA NA NA NA 5 NA NA NA 9 NA NA NA NA 14 NA
>
> Step 3: Shift values forward by one index value, retaining the original
> vector length.
>
> desired.means <- zoo(desired.means)  # in order to be able to use lag.zoo
> desired.means <- lag(desired.means, k = -1, na.pad = TRUE)
> desired.means
> [1] NA NA NA NA NA NA 5 NA NA NA 9 NA NA NA NA 14
>
> Step 4: Use last-observation-carried-forward, retaining the original
> vector length.
>
> desired.means <- na.locf(desired.means, na.rm = FALSE)
> desired.means
> [1] NA NA NA NA NA NA 5 5 5 5 9 9 9 9 9 14
>
> Step 5: Use next-observation-carried-backward to assign values to the
> initial run of NAs.
>
> desired.means <- na.locf(desired.means, fromLast = TRUE)
> desired.means
> [1] 5 5 5 5 5 5 5 5 5 5 9 9 9 9 9 14
>
> Step 6: Convert back to a vector (from the zoo object), and subtract it
> from the column.
>
> desired.column <- column.values - coredata(desired.means)
> desired.column
> [1] -4 -3 -2 -1 0 1 2 3 4 5 2 3 4 5 6 2
>
> ----------------------------- end call to 'apply' --------------------------
>
> Thanks,
> David
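Since the same mask is used for every column, the window positions only need to be worked out once. Steps 1-6 can then be applied to the whole matrix with a few matrix operations instead of building zoo objects column by column inside `apply()`. The following is a minimal base-R sketch, not a drop-in replacement. `window.ends()` and `demean.by.windows()` are hypothetical helpers, and the sketch assumes a mask of only NA and 1, a window width of 3, and data without NAs. Where a column does contain an NA inside a window, the zoo version would instead carry the earlier mean past that window.

```r
# Step 1 mask from the post: 1 marks positions inside a wanted window.
desired.subseqs <- c(NA, NA, NA, 1, 1, 1, NA, 1, 1, 1, NA, NA, 1, 1, 1, NA)

# Hypothetical helper: rows at which a run of `width` consecutive non-NA
# mask positions ends (where rollapply() in Step 2 yields a mean).
window.ends <- function(mask, width = 3) {
  n <- length(mask)
  if (n < width) return(integer(0))
  ok <- !is.na(mask)
  cand <- seq.int(width, n)
  full <- vapply(cand, function(e) all(ok[(e - width + 1):e]), logical(1))
  ends <- cand[full]
  ends[ends < n]  # Step 3's one-step lag pushes a mean at row n off the end
}

# Hypothetical helper: Steps 1-6 for every column of m at once.
# Assumes the mask holds only NA and 1 and that m itself has no NAs.
demean.by.windows <- function(m, mask, width = 3) {
  m <- as.matrix(m)
  stopifnot(length(mask) == nrow(m))
  ends <- window.ends(mask, width)
  if (length(ends) == 0) stop("no complete window before the last row")
  # Step 2: one row per window, one column per data column.
  win.means <- do.call(rbind, lapply(ends, function(e)
    colMeans(m[(e - width + 1):e, , drop = FALSE])))
  # Steps 3-4: row i uses the last window ending before row i.
  # Step 5: earlier rows fall back to the first window's mean.
  k <- pmax(findInterval(seq_len(nrow(m)) - 1, ends), 1L)
  m - win.means[k, , drop = FALSE]  # Step 6
}

out <- demean.by.windows(matrix(1:16), desired.subseqs)
out[, 1]  # -4 -3 -2 -1 0 1 2 3 4 5 2 3 4 5 6 2, matching Step 6
```

Peak memory with this approach is roughly three full-size matrices (the input, the indexed window means and the result). For a 1.5K x 165K matrix that is an estimate of about 6 GB, not a measurement.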