Ross Boylan
2003-Nov-04
[R] Architecting an optimization with external calls

I have a likelihood I would like to compute using C++ and then optimize. It has data that need to persist across individual calls to the likelihood. I'd appreciate any recommendations about the best way to do this. There are several related issues.

1. Use the R optimizer or a C optimizer?

Because of the persistence problems (see below), using a C optimizer has a certain attraction. However, the C methods described in section 5.8 of "Writing R Extensions" include the caveat that "No function is provided for finite-differencing, nor for approximating the Hessian at the result." That's a big drawback, since I need that information. (Probably I will be doing this without analytic derivatives.)

2. How to persist the data?

My preferred approach would be to pass the data back to R (assuming the "optimize with R" approach above) and then pass it on to subsequent calls. The data are the top of an object graph (i.e., there are pointers to disconnected chunks of memory), and it is not clear to me how to do this. First, the documentation doesn't indicate any "opaque" data type; should I use character (STRSXP)? Second, I'm not sure how to protect it and the other chunks of memory. Does each one need to go inside a PROTECT call? And is it safe to have one invocation from R do PROTECT and another, much later one do UNPROTECT? (All the examples I saw had both calls within the same function invocation.)

My hope is that if I allocate an object outside of R and don't tell R about it, R will never touch it, so I only need PROTECT for something going back to R. True?

Also, the docs say not to protect too many items, and there may be a lot. So I'd probably end up having to write my own allocator drawing on pools that were protected, and that's just another layer of junk on top of the original problem.

Another approach would be to hang the data somewhere in the global space of the shared library. On general principles this is a poor approach ("don't use globals"), and it manifests in specific failings such as lack of thread safety. I also suspect the issues with getting that to work portably are considerable (as in, it may not be possible).

P.S. The zero-finding example (section 4.9.1 in "Writing R Extensions") is, unfortunately, the reverse of this case: there the function to be optimized is in R, while the optimizer is in C.

--
Ross Boylan                                wk:  (415) 502-4031
530 Parnassus Avenue (Library) rm 115-4    ross at biostat.ucsf.edu
Dept of Epidemiology and Biostatistics     fax: (415) 476-9856
University of California, San Francisco
San Francisco, CA 94143-0840               hm:  (415) 550-1062
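To make the "optimize with R" shape in point 1 concrete: the likelihood can be a .Call entry point that R's optim() drives, with the persistent state threaded through as an extra argument. The following is only a sketch under assumptions not in the original post — lik_state, C_loglik, and the toy Gaussian sum are hypothetical names standing in for the real object graph and likelihood:

    #include <R.h>
    #include <Rinternals.h>

    /* Hypothetical stand-in for the top of the persistent object graph. */
    typedef struct {
        int     n;
        double *obs;
    } lik_state;

    /* Called from R as .Call("C_loglik", par, handle); 'handle' is an
       external pointer carrying the persistent state (see the reply
       below for how it is created). */
    SEXP C_loglik(SEXP par, SEXP handle)
    {
        lik_state *st = (lik_state *) R_ExternalPtrAddr(handle);
        double     mu = REAL(par)[0];
        double     ll = 0.0;

        for (int i = 0; i < st->n; i++)
            ll += -0.5 * (st->obs[i] - mu) * (st->obs[i] - mu);

        /* The freshly allocated result is PROTECTed only until it is
           returned to R; both calls stay within this one invocation. */
        SEXP ans = PROTECT(allocVector(REALSXP, 1));
        REAL(ans)[0] = ll;
        UNPROTECT(1);
        return ans;
    }

On the R side, something like optim(0, function(p) -.Call("C_loglik", p, handle), hessian = TRUE) would drive it. Note that optim(..., hessian = TRUE) returns a finite-difference Hessian at the solution, which sidesteps the section 5.8 caveat about the C-level optimizers.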
Prof Brian Ripley
2003-Nov-04 21:12 UTC
[R] Architecting an optimization with external calls
Look into external pointers. That is how we have tackled this, e.g. in the ts package.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
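A minimal sketch of the external-pointer pattern, using the same hypothetical lik_state as above. R_MakeExternalPtr() wraps a raw C pointer in an SEXP that can be returned to R, stored in an ordinary R variable, and handed back to later .Call invocations. R never dereferences the address itself, so the disconnected chunks behind it are invisible to the garbage collector and need no PROTECT of their own; a registered finalizer reclaims the C memory when the handle is garbage-collected:

    #include <stdlib.h>
    #include <string.h>
    #include <R.h>
    #include <Rinternals.h>

    /* Hypothetical; same struct as in the likelihood sketch above. */
    typedef struct {
        int     n;
        double *obs;
    } lik_state;

    /* Run by the garbage collector once the R handle is unreachable. */
    static void lik_state_finalize(SEXP handle)
    {
        lik_state *st = (lik_state *) R_ExternalPtrAddr(handle);
        if (st != NULL) {
            free(st->obs);
            free(st);
            R_ClearExternalPtr(handle);  /* guard against a double free */
        }
    }

    /* Called once from R, e.g. handle <- .Call("C_make_state", y); the
       returned handle is then threaded through every likelihood call. */
    SEXP C_make_state(SEXP data)
    {
        lik_state *st = (lik_state *) malloc(sizeof(lik_state));
        st->n   = LENGTH(data);
        st->obs = (double *) malloc((size_t) st->n * sizeof(double));
        memcpy(st->obs, REAL(data), (size_t) st->n * sizeof(double));

        SEXP handle = PROTECT(R_MakeExternalPtr(st, R_NilValue, R_NilValue));
        R_RegisterCFinalizerEx(handle, lik_state_finalize, TRUE);
        UNPROTECT(1);
        return handle;
    }

Usage from R would be handle <- .Call("C_make_state", y) once, then repeated .Call("C_loglik", p, handle) inside the objective passed to optim(). Because R itself keeps the handle alive between calls, PROTECT never needs to span invocations — it is a per-call stack that must balance before each .Call returns. If C code ever does have to keep an R object alive across calls, R_PreserveObject()/R_ReleaseObject() are the mechanism for that, not a long-lived PROTECT.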