David Romano
2013-May-03 23:56 UTC
[R] how to parallelize 'apply' across multiple cores on a Mac
Hi everyone,

I'm trying to use apply (with a call to zoo's rollapply within) on the columns of a 1.5Kx165K matrix, and I'd like to make use of the other cores on my machine to speed it up. (And hopefully also leave more memory free: I find that after I create a big object like this, I have to save my workspace and then close and reopen R to be able to recover the memory tied up by R, but maybe that's a separate issue -- if so, please let me know!)

It seems the package 'multicore' has a parallel version of 'lapply', which I suppose I could combine with a 'do.call' (I think) to gather the elements of the output list into a matrix, but I was wondering whether there might be another route.

In case the particular way I constructed the call to 'apply' might be the source of the problem, here is a deconstructed version of what I did to each column, for easier parsing:

----------------------------- begin call to 'apply' ------------------------

Step 1: Identify several disjoint subsequences of fixed length, say length three, of a column.

column.values <- 1:16
desired.subseqs <- c( NA, NA, NA, 1, 1, 1, NA, 1, 1, 1, NA, NA, 1, 1, 1, NA )  # this vector is used for every column
desired.values <- desired.subseqs * column.values

Step 2: Find the average value of each subsequence.

desired.means <- rollapply( desired.values, 3, mean, fill = NA, align = "right", na.rm = FALSE )
# put mean in highest index of subsequence and retain original vector length
desired.means
 [1] NA NA NA NA NA  5 NA NA NA  9 NA NA NA NA 14 NA

Step 3: Shift values forward by one index value, retaining original vector length.

desired.means <- zoo( desired.means )  # in order to be able to use lag.zoo
desired.means <- lag( desired.means, k = -1, na.pad = TRUE )
desired.means
 [1] NA NA NA NA NA NA  5 NA NA NA  9 NA NA NA NA 14

Step 4: Use last-observation-carried-forward, retaining original vector length.

desired.means <- na.locf( desired.means, na.rm = FALSE )
desired.means
 [1] NA NA NA NA NA NA  5  5  5  5  9  9  9  9  9 14

Step 5: Use next-observation-carried-backward to assign values to the initial sequence of NAs.

desired.means <- na.locf( desired.means, fromLast = TRUE )
desired.means
 [1]  5  5  5  5  5  5  5  5  5  5  9  9  9  9  9 14

Step 6: Convert back to a vector (from a zoo object), and subtract from the column.

desired.column <- column.values - coredata( desired.means )
desired.column
 [1] -4 -3 -2 -1  0  1  2  3  4  5  2  3  4  5  6  2

----------------------------- end call to 'apply' ------------------------

Thanks,
David
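For concreteness, here is a minimal sketch of the mclapply/do.call route mentioned above; it is not part of the original post. The per-column work of steps 1-6 is wrapped in a hypothetical helper process.column(), and the columns are handed to the workers in chunks. The matrix size, chunk scheme, and core count are illustrative stand-ins only.

library(zoo)
library(parallel)   # 'multicore' functionality now lives in the base package 'parallel'

desired.subseqs <- c(NA, NA, NA, 1, 1, 1, NA, 1, 1, 1, NA, NA, 1, 1, 1, NA)

process.column <- function(column.values) {
  desired.values <- desired.subseqs * column.values                 # step 1
  desired.means  <- rollapply(desired.values, 3, mean,
                              fill = NA, align = "right")           # step 2
  desired.means  <- lag(zoo(desired.means), k = -1, na.pad = TRUE)  # step 3
  desired.means  <- na.locf(desired.means, na.rm = FALSE)           # step 4
  desired.means  <- na.locf(desired.means, fromLast = TRUE)         # step 5
  column.values - coredata(desired.means)                           # step 6
}

# Small stand-in for the 1.5K x 165K matrix (dimensions shrunk for illustration).
big.matrix <- matrix(1:16, nrow = 16, ncol = 8)

# Fork one job per chunk of columns, then reassemble the pieces with cbind.
ncores <- 4
chunks <- split(seq_len(ncol(big.matrix)),
                cut(seq_len(ncol(big.matrix)), ncores, labels = FALSE))
pieces <- mclapply(chunks,
                   function(cols) apply(big.matrix[, cols, drop = FALSE],
                                        2, process.column),
                   mc.cores = ncores)
result <- do.call(cbind, pieces)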
Charles Berry
2013-May-04 16:32 UTC
[R] how to parallelize 'apply' across multiple cores on a Mac
David Romano <dromano <at> stanford.edu> writes:

> Hi everyone,
>
> I'm trying to use apply (with a call to zoo's rollapply within) on the
> columns of a 1.5Kx165K matrix, and I'd like to make use of the other cores
> on my machine to speed it up. (And hopefully also leave more memory free: I
> find that after I create a big object like this, I have to save my
> workspace and then close and reopen R to be able to recover memory tied up
> by R, but maybe that's a separate issue -- if so, please let me know!)
>
> It seems the package 'multicore' has a parallel version of 'lapply', which
> I suppose I could combine with a 'do.call' (I think) to gather the elements
> of the output list into a matrix, but I was wondering whether there might
> be another route.

[description of simple calc's deleted]

David,

If you insist on explicitly parallelizing this:

The functions in the recommended package 'parallel' work on a Mac.

I would not try to work on each tiny column as a separate function call - too much overhead if you parallelize - instead, bundle up 100-1000 columns to operate on.

The calculations you describe sound simple enough that I would just write them in C and use the .Call interface to invoke them. You only need enough working memory in C to operate on one column and space to save the result. So a MacBook with 8GB of memory will handle it with room to breathe.

This is a good use case for the 'inline' package, especially if you are unfamiliar with the use of .Call.

==

But it might be as fast to forget about parallelizing this (explicitly).

If !any(is.na(column.values)), then what you are doing can be achieved by

    desired.means[ , column.subset] <- crossprod( suitable.matrix, matrix.values )

or better still

    desired.means[ , column.subset] <- crossprod( minimal.matrix, matrix.values )[ fill.rows, ]

where

    suitable.matrix implements your steps 2-6,
    minimal.matrix is unique(suitable.matrix, MARGIN = 2),
    fill.rows is such that minimal.matrix[fill.rows, ] == suitable.matrix,
    matrix.values is a subset of columns from your original matrix, and
    column.subset is where the result should be placed in desired.means.

On a Mac, the vecLib BLAS will do crossprod using the multiple cores without your needing to do anything special. So you can forget about 'parallel', 'multicore', etc.

So your remaining problem is to reread steps 2-6 and figure out what 'minimal.matrix' and 'fill.rows' have to be.

==

You can also approach this problem using 'filter', but that can get 'convoluted' (pun intended - see ?filter).

HTH,
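To make the crossprod suggestion concrete, here is a rough sketch of one reading of it for the toy 16-row example, assuming the columns contain no NAs. The names W, group.of.row, and X are illustrative: W plays the role of minimal.matrix, group.of.row the role of fill.rows, and the subtraction of step 6 is done separately at the end.

# Which rows feed which subsequence mean, and which mean each row ends up
# with after the lag/locf steps, read off from the worked example above:
subseq.rows  <- list(4:6, 8:10, 13:15)    # rows averaged in step 2
group.of.row <- rep(1:3, c(10, 5, 1))     # mean assigned to each row by steps 3-5

# W has one column per subsequence; column g puts weight 1/3 on the rows of
# subsequence g, so crossprod(W, X) gives the three means for every column of X.
n <- 16
W <- matrix(0, nrow = n, ncol = length(subseq.rows))
for (g in seq_along(subseq.rows)) W[subseq.rows[[g]], g] <- 1/3

X <- matrix(1:16, nrow = n, ncol = 5)     # stand-in for a block of columns

means.per.group <- crossprod(W, X)                   # 3 x ncol(X), one BLAS call
assigned.means  <- means.per.group[group.of.row, ]   # expand rows, like 'fill.rows'
desired.columns <- X - assigned.means                # step 6: subtract from each column

desired.columns[, 1]
#  [1] -4 -3 -2 -1  0  1  2  3  4  5  2  3  4  5  6  2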
David Romano
2013-May-04 18:27 UTC
[R] how to parallelize 'apply' across multiple cores on a Mac
(I neglected to use reply-all.)

---------- Forwarded message ----------
From: David Romano <dromano at stanford.edu>
Date: Sat, May 4, 2013 at 11:25 AM
Subject: Re: [R] how to parallelize 'apply' across multiple cores on a Mac
To: Charles Berry <ccberry at ucsd.edu>

On Sat, May 4, 2013 at 9:32 AM, Charles Berry <ccberry at ucsd.edu> wrote:

> David,
>
> If you insist on explicitly parallelizing this:
>
> The functions in the recommended package 'parallel' work on a Mac.
>
> I would not try to work on each tiny column as a separate function call -
> too much overhead if you parallelize - instead, bundle up 100-1000 columns
> to operate on.
>
> The calculations you describe sound simple enough that I would just write
> them in C and use the .Call interface to invoke them. You only need enough
> working memory in C to operate on one column and space to save the result.
>
> So a MacBook with 8GB of memory will handle it with room to breathe.
>
> This is a good use case for the 'inline' package, especially if you are
> unfamiliar with the use of .Call.
>
> ==
>
> But it might be as fast to forget about parallelizing this (explicitly).

[detailed recommendations deleted]

> On a Mac, the vecLib BLAS will do crossprod using the multiple cores
> without your needing to do anything special. So you can forget about
> 'parallel', 'multicore', etc.
>
> So your remaining problem is to reread steps 2-6 and figure out what
> 'minimal.matrix' and 'fill.rows' have to be.
>
> ==
>
> You can also approach this problem using 'filter', but that can get
> 'convoluted' (pun intended - see ?filter).
>
> HTH,

Thanks, Charles, for all the helpful pointers! For the moment, I'll leave parallelization aside and explore using 'crossprod' and 'filter'.

That said, given your suggestion that 8 GB of memory should be sufficient if I went the parallel route, I also wonder whether I'm suffering not just from inefficient use of computing resources but from a memory leak as well: the original 'apply' code would, in much less than a minute, take over the full 18 GB of memory available on my workstation, and then leave it functioning at a crawl for at least half an hour or so. I'll ask about this by reposting this message with a different subject, so there's no need to address it in this thread.

Thanks again,
David
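As a pointer for the 'filter' route mentioned above, a small sketch (not from the thread) showing that stats::filter with a trailing window reproduces the rolling mean of step 2, up to floating-point rounding, without zoo:

column.values   <- 1:16
desired.subseqs <- c(NA, NA, NA, 1, 1, 1, NA, 1, 1, 1, NA, NA, 1, 1, 1, NA)
desired.values  <- desired.subseqs * column.values

# sides = 1 averages positions i-2, i-1, i, i.e. the align = "right" window;
# any window touching an NA comes out NA, as with na.rm = FALSE.
rolling.means <- stats::filter(desired.values, rep(1/3, 3), sides = 1)
as.numeric(rolling.means)
#  [1] NA NA NA NA NA  5 NA NA NA  9 NA NA NA NA 14 NA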