Dear All,

I regularly want to "apply" some function to an array in a way that the arguments to the user function depend on the index on which the apply is working. A simple example is:

A <- array( runif(160), dim=c(5,4,8) )
x <- matrix( runif(32), nrow=4, ncol=8 )
b <- runif(8)
f1 <- function( A, x, b ) { sum( A %*% x ) + b }
result <- rep(0.0,8)
for (i in 1:8) {
  result[i] <- f1( A[,,i], x[,i], b[i] )
}

This works, but is slow. I'd like to be able to do something like:

generalized.apply( A, MARGIN=3, FUN=f1, list(x=x,MARGIN=2), list(b=b,MARGIN=1) )

where the lists tell generalized.apply to pass x[,i] and b[i] to FUN in addition to A[,,i].

Does such a generalized.apply already exist somewhere? While I can write a C function to do a particular case, it would be nice if there was a fast, general way to do this.

John

............................................................................................
John P. Nolan
Math/Stat Dept., American University
Gray Hall, 4400 Massachusetts Ave, NW
Washington, DC 20016-8050
Phone: 202-885-3140
E-mail: jpnolan at american.edu
Web: http://fs2.american.edu/jpnolan/www/
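A minimal sketch of one way such a generalized.apply could be written in plain R; the interface below, with each extra argument supplied as a list(data = , MARGIN = ) pair, is only an illustration of the idea, not an existing function:

generalized.apply <- function(X, MARGIN, FUN, ...) {
  extras <- list(...)  # each element assumed to be list(data = <object>, MARGIN = <integer>)
  n <- dim(X)[MARGIN]
  # take the i-th slice of 'obj' along 'margin'; plain vectors are indexed directly
  slice <- function(obj, margin, i) {
    if (is.null(dim(obj))) return(obj[i])
    idx <- rep(list(TRUE), length(dim(obj)))  # TRUE keeps a whole dimension
    idx[[margin]] <- i
    do.call(`[`, c(list(obj), idx))
  }
  sapply(seq_len(n), function(i) {
    args <- c(list(slice(X, MARGIN, i)),
              lapply(extras, function(e) slice(e$data, e$MARGIN, i)))
    do.call(FUN, args)
  })
}

# usage matching the example above ('x' and 'b' are matched to f1's arguments by name):
# result <- generalized.apply(A, MARGIN = 3, FUN = f1,
#                             x = list(data = x, MARGIN = 2),
#                             b = list(data = b, MARGIN = 1))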
> On Dec 8, 2016, at 12:09 PM, John P. Nolan <jpnolan at american.edu> wrote:
>
> Dear All,
>
> I regularly want to "apply" some function to an array in a way that the arguments to the user function depend on the index on which the apply is working. A simple example is:
>
> A <- array( runif(160), dim=c(5,4,8) )
> x <- matrix( runif(32), nrow=4, ncol=8 )
> b <- runif(8)
> f1 <- function( A, x, b ) { sum( A %*% x ) + b }
> result <- rep(0.0,8)
> for (i in 1:8) {
>   result[i] <- f1( A[,,i], x[,i], b[i] )
> }
>
> This works, but is slow. I'd like to be able to do something like:
> generalized.apply( A, MARGIN=3, FUN=f1, list(x=x,MARGIN=2), list(b=b,MARGIN=1) ), where the lists tell generalized.apply to pass x[,i] and b[i] to FUN in addition to A[,,i].
>
> Does such a generalized.apply already exist somewhere? While I can write a C function to do a particular case, it would be nice if there was a fast, general way to do this.

I would have thought that this would achieve the same result:

result <- sapply( seq_along(b), function(i) { f1( A[,,i], x[,i], b[i] ) } )

Or:

result <- sapply( seq.int( dim(A)[3] ), function(i) { f1( A[,,i], x[,i], b[i] ) } )

(I doubt it will be any faster, but if 'i' is large, parallelism might help. The inner function appears to be fairly efficient.)

--
David Winsemius
Alameda, CA, USA
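A sketch of the parallel variant hinted at above, using the parallel package that ships with R; mc.cores is machine-dependent, and mclapply relies on forking, so on Windows a cluster-based approach (e.g. parLapply) would be needed instead:

library(parallel)
result.par <- unlist(mclapply(seq_len(dim(A)[3]), function(i) {
  f1(A[,,i], x[,i], b[i])
}, mc.cores = 2))
all.equal(result.par, result)  # should be TRUE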
-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net]
Sent: Thursday, December 8, 2016 4:59 PM
To: John P. Nolan <jpnolan at american.edu>
Cc: Charles C. Berry <R-devel at r-project.org>
Subject: Re: [Rd] wish list: generalized apply

> On Dec 8, 2016, at 12:09 PM, John P. Nolan <jpnolan at american.edu> wrote:
>
> Dear All,
>
> I regularly want to "apply" some function to an array in a way that the arguments to the user function depend on the index on which the apply is working. A simple example is:
>
> A <- array( runif(160), dim=c(5,4,8) )
> x <- matrix( runif(32), nrow=4, ncol=8 )
> b <- runif(8)
> f1 <- function( A, x, b ) { sum( A %*% x ) + b }
> result <- rep(0.0,8)
> for (i in 1:8) { result[i] <- f1( A[,,i], x[,i], b[i] ) }
>
> This works, but is slow. I'd like to be able to do something like:
> generalized.apply( A, MARGIN=3, FUN=f1, list(x=x,MARGIN=2), list(b=b,MARGIN=1) ), where the lists tell generalized.apply to pass x[,i] and b[i] to FUN in addition to A[,,i].
>
> Does such a generalized.apply already exist somewhere? While I can write a C function to do a particular case, it would be nice if there was a fast, general way to do this.

I would have thought that this would achieve the same result:

result <- sapply( seq_along(b), function(i) { f1( A[,,i], x[,i], b[i] ) } )

Or:

result <- sapply( seq.int( dim(A)[3] ), function(i) { f1( A[,,i], x[,i], b[i] ) } )

(I doubt it will be any faster, but if 'i' is large, parallelism might help. The inner function appears to be fairly efficient.)

--
David Winsemius
Alameda, CA, USA

===================================================================================

Thanks for the response. I gave a toy example with 8 iterations to illustrate the point, so I thought I would bump it up to make my point about speed. But to my surprise, using a 'for' loop is FASTER than using 'sapply' as David suggests, or even 'apply' on a slightly simpler problem. Here is the example:

n <- 800000;  m <- 10;  k <- 10
A <- array( 1:(m*n*k), dim=c(m,k,n) )
y <- matrix( 1:(k*n), nrow=k, ncol=n )
b <- 1:n
f1 <- function( A, y, b ) { sum( A %*% y ) + b }

# use a for loop
time1 <- system.time( {
  result <- rep(0.0,n)
  for (i in 1:n) { result[i] <- f1( A[,,i], y[,i], b[i] ) }
  result } )

# use sapply
time2 <- system.time( result2 <- sapply( seq.int( dim(A)[3] ), function(i) { f1( A[,,i], y[,i], b[i] ) } ) )

# fix y and b, and use standard apply
time3 <- system.time( result3 <- apply( A, MARGIN=3, FUN=f1, y=y[,1], b=b[1] ) )

# user times, then ratios of user times
c( time1[1], time2[1], time3[1] );  c( time2[1]/time1[1], time3[1]/time1[1] )
#  4.84 5.22 5.32
#  1.078512 1.099174

So using a for loop saves 8-10% of the execution time as compared to sapply and apply!? Years ago I experimented and found I could speed things up noticeably by replacing loops with apply. That is no longer the case, at least in this simple experiment. Is this a result of the byte compiler? Can someone tell us when a for loop will be slower than using apply? Perhaps with a more complicated loop that computes multiple quantities?

John
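One way to probe the byte-compiler question directly is to time the same loop with the just-in-time compiler switched off and then on (a sketch reusing the objects defined above; the compiler package ships with base R, and timings will vary with machine and R version):

library(compiler)
enableJIT(0)  # disable just-in-time compilation
res <- numeric(n)
t.nojit <- system.time( for (i in 1:n) res[i] <- f1( A[,,i], y[,i], b[i] ) )
enableJIT(3)  # highest JIT level (the default in newer R versions)
t.jit <- system.time( for (i in 1:n) res[i] <- f1( A[,,i], y[,i], b[i] ) )
rbind(no.JIT = t.nojit, JIT = t.jit)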
On Thu, Dec 8, 2016 at 3:59 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
>> On Dec 8, 2016, at 12:09 PM, John P. Nolan <jpnolan at american.edu> wrote:
>>
>> Dear All,
>>
>> I regularly want to "apply" some function to an array in a way that the arguments to the user function depend on the index on which the apply is working. A simple example is:
>>
>> A <- array( runif(160), dim=c(5,4,8) )
>> x <- matrix( runif(32), nrow=4, ncol=8 )
>> b <- runif(8)
>> f1 <- function( A, x, b ) { sum( A %*% x ) + b }
>> result <- rep(0.0,8)
>> for (i in 1:8) {
>>   result[i] <- f1( A[,,i], x[,i], b[i] )
>> }
>>
>> This works, but is slow. I'd like to be able to do something like:
>> generalized.apply( A, MARGIN=3, FUN=f1, list(x=x,MARGIN=2), list(b=b,MARGIN=1) ), where the lists tell generalized.apply to pass x[,i] and b[i] to FUN in addition to A[,,i].
>>
>> Does such a generalized.apply already exist somewhere? While I can write a C function to do a particular case, it would be nice if there was a fast, general way to do this.
>
> I would have thought that this would achieve the same result:
>
> result <- sapply( seq_along(b), function(i) { f1( A[,,i], x[,i], b[i] ) } )
>
> Or:
>
> result <- sapply( seq.int( dim(A)[3] ), function(i) { f1( A[,,i], x[,i], b[i] ) } )
>
> (I doubt it will be any faster, but if 'i' is large, parallelism might help. The inner function appears to be fairly efficient.)

You're right, it's slower. Despite how often it's repeated that "loops in R are slow", they're not *that* slow. They're often faster than the *apply functions, especially if they have been "compiled" by compiler::cmpfun().

You really need to know *why* code is slow before trying to make it faster. I profiled an example that would have a loop with 1e6 iterations and 80%+ of the time was still spent inside f1().

set.seed(21)
nc <- 1e6
nr <- 10
A <- array( runif(5*nr*nc), dim=c(5,nr,nc) )
x <- matrix( runif(nr*nc), nrow=nr, ncol=nc )
b <- runif(nc)

f1 <- compiler::cmpfun(function( A, x, b ) { sum( A %*% x ) + b })
f2 <- compiler::cmpfun({
  function(A, x, b, FUN) {
    result <- numeric(length(b))
    for (i in seq_along(b)) {
      result[i] <- FUN( A[,,i], x[,i], b[i] )
    }
    return(result)
  }
})

Rprof(interval=0.01)
result <- f2(A,x,b,f1)
Rprof(NULL)
summaryRprof()

$by.self
      self.time self.pct total.time total.pct
"FUN"      4.29    84.28       4.76     93.52
"%*%"      0.47     9.23       0.47      9.23
"f2"       0.33     6.48       5.09    100.00

$by.total
      total.time total.pct self.time self.pct
"f2"        5.09    100.00      0.33     6.48
"FUN"       4.76     93.52      4.29    84.28
"%*%"       0.47      9.23      0.47     9.23

$sample.interval
[1] 0.01

$sampling.time
[1] 5.09

In this case, almost all the time is spent evaluating f1() ("FUN"), even after calling compiler::cmpfun() on f1() and on a function containing the loop. Making the looping construct faster is not going to improve the performance of this code by a significant amount. I.e., dropping to compiled code will only help if you avoid the R function call, but then that's not a general solution...

> --
>
> David Winsemius
> Alameda, CA, USA
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading   |  www.fosstrading.com
R/Finance 2016 |  www.rinfinance.com
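For this particular f1(), the per-slice R call can be avoided entirely, because sum(A %*% x) equals sum(colSums(A) * x). A sketch using the objects from the profiled example above (result.vec is just an illustrative name):

cs <- colSums(A, dims = 1)      # nr-by-nc matrix: column sums of each slice A[,,i]
result.vec <- colSums(cs * x) + b
all.equal(result.vec, result)   # should be TRUE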