Duncan Murdoch
2025-Dec-02 13:48 UTC
[Rd] Speed question: passing arguments vs environment
On 2025-12-02 7:41 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:
> I have a complex likelihood function f() to maximize, with lots of arguments (some of which set up indexes for derivatives, for instance).
> When using something like optim(), one can pass these arguments through via its ... arg, or could make the likelihood function f() live in the same environment as the main routine so they are found directly. Is there any advantage of one versus the other wrt speed? At the end of the day, f() may get called thousands of times in a Hamiltonian MCMC.
>
> Since R does not replicate arguments that are used in a read-only fashion, one might expect little to no penalty for having them on the call chain, unless the bookkeeping for copy-on-write is itself time consuming.

By the way, a nice way to put the args in the environment of the objective function is to use local() or a builder, e.g.

objective <- local({
  arg1 <- 1
  arg2 <- 2
  arg3 <- 3
  function(x) {
    # objective code here that can see arg1, arg2, arg3
  }
})

or

makeObjective <- function(arg1, arg2, arg3) {
  force(arg1) # evaluate the promises
  force(arg2)
  force(arg3)

  function(x) {
    # objective code here that can see arg1, arg2, arg3
  }
}

objective <- makeObjective(1,2,3)

Duncan Murdoch
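For anyone who wants to try both routes from the question side by side, here is a minimal sketch. The quadratic objective, the vectors `target` and `wts`, and the names `f_dots` and `make_f` are hypothetical placeholders, not anything from the original post, and the timing loop is only a crude check whose numbers will vary by machine and R version.

```r
# Hypothetical objective: weighted sum of squares around `target`.
# Variant 1: the extra data are ordinary arguments, passed via optim()'s `...`.
f_dots <- function(x, target, wts) sum(wts * (x - target)^2)

# Variant 2: a builder returns a closure that finds `target` and `wts`
# in its enclosing environment.
make_f <- function(target, wts) {
  force(target)  # evaluate the promises now
  force(wts)
  function(x) sum(wts * (x - target)^2)
}

target <- c(1, 2, 3)
wts    <- c(1, 2, 4)
f_closure <- make_f(target, wts)

# Same optimizer, same start; only the way the data reach f differs.
fit_dots    <- optim(c(0, 0, 0), f_dots, target = target, wts = wts,
                     method = "BFGS")
fit_closure <- optim(c(0, 0, 0), f_closure, method = "BFGS")

# Crude timing of many direct calls to each variant.
x0 <- c(0.5, 0.5, 0.5)
t_dots    <- system.time(for (i in 1:1e5) f_dots(x0, target, wts))
t_closure <- system.time(for (i in 1:1e5) f_closure(x0))
print(rbind(dots = t_dots[1:3], closure = t_closure[1:3]))
```

Both fits should land on the same parameters. Whether the timings differ enough to matter is something to measure on the real likelihood rather than on a toy like this one.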
Duncan's suggestion to time things is important, and would make a very useful short communication or blog! Timing differences of orders of magnitude are common.

I'll also suggest that some crude timings of different solvers are worthwhile. Problems vary enough that this won't decide definitively which solver is fastest, but you might eliminate one or two that are poor for your situation. Depending on the number of parameters, I'd guess ncg or its predecessor Rcgmin will be relatively good. LBFGS variants can be good, but sometimes seem to toss up disasters. Most of these can be accessed through the optimx package to save coding. By removing some checks and safeguards in optimx you could likely speed things up a bit too.

If a full optimum is not needed, some attention to early stopping might be worthwhile, but I've seen lots of silly mistakes made playing with tolerances. If you go that route, choose a custom termination rule that fits your particular problem or you'll get rubbish.

JN
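A crude solver timing along these lines might look like the sketch below. It is only an illustration under stated assumptions: it uses the methods built into base R's optim() rather than ncg, Rcgmin, or optimx, and the Rosenbrock test function stands in for a real likelihood. The names `fr`, `grr`, and `crude` are placeholders.

```r
# Stand-in test problem: the Rosenbrock function and its analytic gradient.
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
grr <- function(x) c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
                     200 * (x[2] - x[1]^2))

# Base-R optim() methods only; Nelder-Mead ignores the gradient.
methods <- c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B")

crude <- do.call(rbind, lapply(methods, function(m) {
  # system.time() evaluates the assignment in this function's frame,
  # so `fit` is available afterwards.
  elapsed <- system.time(
    fit <- optim(c(-1.2, 1), fr, grr, method = m)
  )[["elapsed"]]
  data.frame(method = m,
             value = fit$value,
             fn_calls = fit$counts[["function"]],
             convergence = fit$convergence,  # 0 means optim reported success
             elapsed = elapsed)
}))
print(crude)
```

On a problem this small the elapsed times are close to noise, so the function-call counts and convergence codes are the more informative columns. On a real likelihood, repeat each run several times before trusting the timings.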