thr3ads.net - R devel - [Rd] wish list: generalized apply [Dec 2016]

John P. Nolan

2016-Dec-08 20:09 UTC

[Rd] wish list: generalized apply

Dear All,

I regularly want to "apply" some function to an array in a way that
the arguments passed to the user function depend on the index over which apply is
iterating.  A simple example is:

A <- array( runif(160), dim=c(5,4,8) )    # eight 5 x 4 slices A[,,i]
x <- matrix( runif(32), nrow=4, ncol=8 )  # column x[,i] is paired with slice i
b <- runif(8)                             # scalar b[i] is paired with slice i
f1 <- function( A, x, b ) { sum( A %*% x ) + b }
result <- rep(0.0,8)
for (i in 1:8) {
  result[i] <- f1( A[,,i], x[,i] , b[i] )
}

This works, but is slow.  I'd like to be able to do something like:

    generalized.apply( A, MARGIN=3, FUN=f1, list(x=x,MARGIN=2), list(b=b,MARGIN=1) )

where the lists tell generalized.apply to pass x[,i] and b[i] to FUN in
addition to A[,,i].
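For illustration only, here is a plain-R sketch of the interface described above. The name generalized_apply, its argument layout (each extra argument is a list whose first element is the object to slice and whose MARGIN entry says which dimension to index) and the FUN.VALUE default are assumptions modeled on the request, not an existing function. It reproduces the for loop's result but still calls FUN once per slice, so it should not be expected to be any faster.

```r
# Hypothetical helper modeled on the request above; not part of base R or of
# any package. It organizes the loop but still calls FUN once per slice.
generalized_apply <- function(X, MARGIN, FUN, ..., FUN.VALUE = numeric(1)) {
  # Each extra argument is a list such as list(x = x, MARGIN = 2): its first
  # element is sliced, and that element's name becomes the argument name.
  extras <- list(...)
  arg_names <- vapply(extras, function(e) names(e)[1], character(1))

  # i-th slice of 'arr' along dimension 'margin'; plain vectors (no dim
  # attribute) are indexed directly as arr[i].
  take_slice <- function(arr, margin, i) {
    d <- dim(arr)
    if (is.null(d)) return(arr[i])
    idx <- rep(list(TRUE), length(d))  # TRUE keeps every index of a dimension
    idx[[margin]] <- i
    do.call(`[`, c(list(arr), idx, list(drop = TRUE)))
  }

  vapply(seq_len(dim(X)[MARGIN]), function(i) {
    extra_args <- lapply(extras, function(e) take_slice(e[[1]], e$MARGIN, i))
    names(extra_args) <- arg_names
    # The slice of X is passed positionally, i.e. as FUN's first argument.
    do.call(FUN, c(list(take_slice(X, MARGIN, i)), extra_args))
  }, FUN.VALUE)
}

# Usage on the example above.
set.seed(1)
A <- array(runif(160), dim = c(5, 4, 8))
x <- matrix(runif(32), nrow = 4, ncol = 8)
b <- runif(8)
f1 <- function(A, x, b) { sum(A %*% x) + b }

res <- generalized_apply(A, MARGIN = 3, FUN = f1,
                         list(x = x, MARGIN = 2), list(b = b, MARGIN = 1))

loop <- numeric(8)
for (i in 1:8) loop[i] <- f1(A[,,i], x[,i], b[i])
all.equal(res, loop)
```

The design keeps the "slice of X goes first" convention of apply(), so an existing FUN such as f1 can be reused unchanged.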

Does such a generalized.apply already exist somewhere?  While I could write a C
function for a particular case, it would be nice if there were a fast, general
way to do this.
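A note for readers on later versions of R: base R's mapply() already steps through several arguments in parallel, and asplit() (added in R 3.6.0, well after this thread) splits an array into a list of slices along a given margin. Combined, they come close to the requested generalized.apply(). This is a sketch of the interface, not a fast path: f1() is still called once per slice.

```r
set.seed(1)
A <- array(runif(160), dim = c(5, 4, 8))
x <- matrix(runif(32), nrow = 4, ncol = 8)
b <- runif(8)
f1 <- function(A, x, b) { sum(A %*% x) + b }

# asplit(A, 3) gives the 8 slices A[,,i] (each 5 x 4);
# asplit(x, 2) gives the 8 columns x[,i]; b is already a length-8 vector.
# mapply() then calls f1(A[,,i], x[,i], b[i]) for i = 1, ..., 8.
result_mapply <- mapply(f1, asplit(A, 3), asplit(x, 2), b)

# Reference: the for loop from the original post.
result_loop <- numeric(8)
for (i in 1:8) result_loop[i] <- f1(A[,,i], x[,i], b[i])
all.equal(result_mapply, result_loop)
```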

John

............................................................................................

John P. Nolan
Math/Stat Dept., American University
Gray Hall, 4400 Massachusetts Ave, NW
Washington, DC 20016-8050
Phone: 202-885-3140
E-mail:  jpnolan at american.edu
Web:   http://fs2.american.edu/jpnolan/www/

David Winsemius

2016-Dec-08 21:59 UTC

I would have thought that this would achieve the same result:

result <- sapply( seq_along(b) , function(i) { f1( A[,,i], x[,i] , b[i] )} )

Or: 

result <- sapply( seq.int( dim(A)[3] ) , function(i) { f1( A[,,i], x[,i] , b[i] )} )

(I doubt it will be any faster, but if 'i' is large, parallelism might
help. The inner function appears to be fairly efficient.)
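To make the parallelism remark concrete, a minimal sketch using mclapply() from the parallel package that ships with R. It assumes a Unix-alike system, where mclapply() forks worker processes; on Windows forking is not available and mc.cores must be 1. The core count of 2 and the problem size are arbitrary choices, and whether this pays off depends on how expensive each f1() call is relative to the overhead of distributing the work.

```r
library(parallel)

set.seed(1)
n <- 1000
A <- array(runif(5 * 4 * n), dim = c(5, 4, n))
x <- matrix(runif(4 * n), nrow = 4, ncol = n)
b <- runif(n)
f1 <- function(A, x, b) { sum(A %*% x) + b }

# Assumption: a Unix-alike, where mclapply() forks worker processes.
# On Windows forking is unavailable, so mc.cores must be 1 (no parallelism).
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each worker evaluates f1() on its share of the indices; unlist() turns the
# list of scalars back into a numeric vector in index order.
result_par <- unlist(mclapply(seq_len(n),
                              function(i) f1(A[,,i], x[,i], b[i]),
                              mc.cores = cores))

# Sequential reference computed the same way.
result_seq <- vapply(seq_len(n), function(i) f1(A[,,i], x[,i], b[i]), numeric(1))
all.equal(result_par, result_seq)
```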
-- 

David Winsemius
Alameda, CA, USA

John P. Nolan

2016-Dec-09 01:00 UTC

-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net] 
Sent: Thursday, December 8, 2016 4:59 PM
To: John P. Nolan <jpnolan at american.edu>
Cc: Charles C. Berry <R-devel at r-project.org>
Subject: Re: [Rd] wish list: generalized apply

===================================================================================
Thanks for the response.  I gave a toy example with 8 iterations to illustrate
the point, so I thought I would scale it up to make my point about speed.  But
to my surprise, using a 'for' loop is FASTER than using 'sapply' as David
suggested, or even 'apply' on a slightly simpler problem.  Here is the example:

n <- 800000; m <- 10; k <- 10
A <- array( 1:(m*n*k), dim=c(m,k,n) )
y <- matrix( 1:(k*n), nrow=k, ncol=n )
b <- 1:n
f1 <- function( A, y, b ) { sum( A %*% y ) + b }

# use a for loop
time1 <- system.time( {
  result <- rep(0.0,n)
  for (i in 1:n) {
    result[i] <- f1( A[,,i], y[,i] , b[i] )
  }
  result } )

#  use sapply
time2 <- system.time( result2 <- sapply( seq.int( dim(A)[3] ) ,
                        function(i) { f1( A[,,i], y[,i] , b[i] )} ))

# fix y and b, and use standard apply
time3 <- system.time( result3 <- apply( A, MARGIN=3, FUN=f1, y=y[,1],
                                        b=b[1] ) )

# user times, then ratios of user times
c( time1[1], time2[1],time3[1]); c( time2[1]/time1[1], time3[1]/time1[1] )  
#   4.84      5.22      5.32 
#   1.078512  1.099174

So sapply and apply take about 8-10% longer than the for loop!?  Years ago I
experimented and found that I could speed things up noticeably by replacing
loops with apply.  This is no longer the case, at least in this simple
experiment.  Is this a result of byte code?  Can someone tell us when a for loop
is going to be slower than using apply?  A more complicated loop that computes
multiple quantities?
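One well-known case where an R loop really is slow is when the result is grown one element at a time, e.g. result <- c(result, value): every iteration allocates and copies the whole vector, so the total work grows quadratically with the number of iterations. The sketch below compares that pattern with the preallocated loop used above and with vapply(). The problem size is an arbitrary assumption and timings vary by machine, so only the equality of the results should be taken as given.

```r
set.seed(1)
n <- 20000
A <- array(runif(10 * 10 * n), dim = c(10, 10, n))
y <- matrix(runif(10 * n), nrow = 10, ncol = n)
b <- runif(n)
f1 <- function(A, y, b) { sum(A %*% y) + b }

# 1. Growing the result: c() allocates and copies a new vector every iteration.
time_grow <- system.time({
  r_grow <- c()
  for (i in seq_len(n)) r_grow <- c(r_grow, f1(A[,,i], y[,i], b[i]))
})

# 2. Preallocating, as in the loop above: each iteration writes one element.
time_prealloc <- system.time({
  r_pre <- numeric(n)
  for (i in seq_len(n)) r_pre[i] <- f1(A[,,i], y[,i], b[i])
})

# 3. vapply(): like sapply(), but the result type and length are declared.
time_vapply <- system.time(
  r_vap <- vapply(seq_len(n), function(i) f1(A[,,i], y[,i], b[i]), numeric(1))
)

# Elapsed seconds for each approach (machine-dependent).
c(grow = time_grow[["elapsed"]],
  prealloc = time_prealloc[["elapsed"]],
  vapply = time_vapply[["elapsed"]])
```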

John

Joshua Ulrich

2016-Dec-09 15:31 UTC

On Thu, Dec 8, 2016 at 3:59 PM, David Winsemius <dwinsemius at comcast.net> wrote:
> I would have thought that this would achieve the same result:
>
> result <- sapply( seq_along(b) , function(i) { f1( A[,,i], x[,i] , b[i] )} )
>
> Or:
>
> result <- sapply( seq.int( dim(A)[3] ) , function(i) { f1( A[,,i], x[,i] , b[i] )} )
>
> (I doubt it will be any faster, but if 'i' is large, parallelism
> might help. The inner function appears to be fairly efficient.)

You're right, it's slower.  Despite how often it's repeated that
"loops in R are slow", they're not *that* slow.  They're often faster
than the *apply functions, especially if they have been "compiled" by
compiler::cmpfun().
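A sketch of how one might check the effect of byte compilation on the loop itself. Since R 3.4.0 the JIT compiler is enabled by default, so the sketch first turns it off with compiler::enableJIT(0) to get an uncompiled baseline, then compiles the same function with compiler::cmpfun() and finally restores the previous JIT level. The helper name loop_f1 and the problem size are illustrative assumptions; the timings are machine-dependent, and f1() itself is left uncompiled in both runs so that only the loop differs.

```r
set.seed(1)
n <- 50000
A <- array(runif(10 * 10 * n), dim = c(10, 10, n))
y <- matrix(runif(10 * n), nrow = 10, ncol = n)
b <- runif(n)
f1 <- function(A, y, b) { sum(A %*% y) + b }

# Hypothetical helper: the preallocated for loop wrapped in a function.
loop_f1 <- function(A, y, b) {
  result <- numeric(length(b))
  for (i in seq_along(b)) result[i] <- f1(A[,,i], y[,i], b[i])
  result
}

# Turn the JIT off so that loop_f1 runs as plain (uncompiled) R code;
# enableJIT() returns the previous level so it can be restored afterwards.
old_level <- compiler::enableJIT(0)
time_plain <- system.time(r_plain <- loop_f1(A, y, b))

# Byte-compile the same function explicitly and time it again.
loop_f1_cmp <- compiler::cmpfun(loop_f1)
time_cmp <- system.time(r_cmp <- loop_f1_cmp(A, y, b))

compiler::enableJIT(old_level)  # restore the original JIT setting

# Elapsed seconds (machine-dependent).
c(plain = time_plain[["elapsed"]], compiled = time_cmp[["elapsed"]])
```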

You really need to know *why* code is slow before trying to make it
faster.  I profiled an example with a loop of 1e6 iterations, and 80%+
of the time was still spent inside f1().

set.seed(21)
nc <- 1e6
nr <- 10
A <- array( runif(5*nr*nc), dim=c(5,nr,nc) )
x <- matrix( runif(nr*nc), nrow=nr, ncol=nc )
b <- runif(nc)
f1 <- compiler::cmpfun(function( A, x, b ) { sum( A %*% x ) + b })
f2 <- compiler::cmpfun({
  function(A, x, b, FUN) {
    result <- numeric(length(b))
    for (i in seq_along(b)) {
      result[i] <- FUN( A[,,i], x[,i] , b[i] )
    }
    return(result)
  }
})
Rprof(interval=0.01)
result <- f2(A,x,b,f1)
Rprof(NULL)
summaryRprof()

$by.self
      self.time self.pct total.time total.pct
"FUN"      4.29    84.28       4.76     93.52
"%*%"      0.47     9.23       0.47      9.23
"f2"       0.33     6.48       5.09    100.00

$by.total
      total.time total.pct self.time self.pct
"f2"        5.09    100.00      0.33     6.48
"FUN"       4.76     93.52      4.29    84.28
"%*%"       0.47      9.23      0.47     9.23

$sample.interval
[1] 0.01

$sampling.time
[1] 5.09

In this case, almost all the time is spent evaluating f1() ("FUN"),
even after calling compiler::cmpfun() on f1() and on a function
containing the loop.  Making the looping construct faster is not going
to improve the performance of this code by a significant amount.
In other words, dropping to compiled code will only help if it avoids
calling the R function, but then it is no longer a general solution...
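In the same spirit of removing the per-slice R function call: for this particular f1() the loop can be vectorized away entirely. For one slice, sum(A %*% y) equals the sum over k of (the k-th column sum of A) times y[k], so colSums() on the 3-d array (which returns a k x n matrix of per-slice column sums) followed by a second colSums() gives all n values at once. This relies on the specific form of f1() and is not a general replacement for generalized.apply; the name vectorized_f1 and the problem size are hypothetical choices.

```r
set.seed(1)
n <- 50000; m <- 10; k <- 10
A <- array(runif(m * k * n), dim = c(m, k, n))
y <- matrix(runif(k * n), nrow = k, ncol = n)
b <- runif(n)
f1 <- function(A, y, b) { sum(A %*% y) + b }

# For one slice, sum(A[,,i] %*% y[,i]) = sum_k (sum_j A[j,k,i]) * y[k,i].
# colSums(A) sums over the first dimension, giving a k x n matrix of
# per-slice column sums; multiplying elementwise by y and taking colSums
# again yields all n inner sums at once, with no per-slice R function call.
vectorized_f1 <- function(A, y, b) colSums(colSums(A) * y) + b

time_vec <- system.time(r_vec <- vectorized_f1(A, y, b))
time_loop <- system.time({
  r_loop <- numeric(n)
  for (i in seq_len(n)) r_loop[i] <- f1(A[,,i], y[,i], b[i])
})

# Elapsed seconds (machine-dependent) and agreement with the loop.
c(vectorized = time_vec[["elapsed"]], loop = time_loop[["elapsed"]])
all.equal(r_vec, r_loop)
```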


-- 
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com
R/Finance 2016 | www.rinfinance.com
