Ivan Krylov
2022-Feb-05 08:09 UTC
[R] Using .Fortran in R - how can I use file i/o flexibly OR arrays of variable length as arguments?
On Fri, 4 Feb 2022 10:21:11 +0000 Peter Green <mapjg at bristol.ac.uk> wrote:> the output I want to return to R is copious, and variable in length, > the length being computed within the Fortran code (it is a > variable-dimension simulation)Is there a way to separate the calculation of the output size from the rest of the subroutine? If yes, the simplest option would be to pass pre-allocated vectors to the .Fortran() call. Unfortunately, .Fortran() is not expressive enough for more complicated use cases. Memory that you intend to return to R must be allocated using its own allocator/garbage collector system, which (like any other call into R from user code) may longjmp() away from your functions on interrupt, error or allocation failure, abandoning any resources the garbage collector doesn't know about. In turn, this requires the use of C and the PROTECT() macro. In theory, you could use Fortran 2003 "C interoperability" to call C entry points, but I expect that to be very inconvenient (at least due to the lack of CPP macros). Since you mention CRAN checks, this might be a better question for <r-package-devel at r-project.org>, not R-help. -- Best regards, Ivan
Peter Green
2022-Feb-05 09:41 UTC
[R] Using .Fortran in R - how can I use file i/o flexibly OR arrays of variable length as arguments?
Thank you, Ivan, it's reassuring in a way to learn from your reply that I am not missing something very obvious. It's unfortunate that a computation that could be done cleanly, using dynamic allocation of memory, within either R or Fortran separately, cannot be split between the two in a way that is optimal for the capabilities of each language, because of the .Fortran bottleneck. I wondered whether you (or anyone) thought there was a way to do this using C wrapper routines in some way? Unfortunately, the ultimate size of the output is something the algorithm would only learn very late in the computation, so pre-calculating that size is not a practical option (else indeed I would do that within R before use of .Fortran). Any thought on the other route I suggested, that is, binary file i/o from with the Fortran code (e.g. full use of the connections functionality), again perhaps via C? Although I mentioned CRAN, that was really just to stress I wanted a platform-independent solution ideally. Peter Green On 05/02/2022 08:09, Ivan Krylov wrote:> On Fri, 4 Feb 2022 10:21:11 +0000 > Peter Green <mapjg at bristol.ac.uk> wrote: > >> the output I want to return to R is copious, and variable in length, >> the length being computed within the Fortran code (it is a >> variable-dimension simulation) > Is there a way to separate the calculation of the output size from the > rest of the subroutine? If yes, the simplest option would be to pass > pre-allocated vectors to the .Fortran() call. > > Unfortunately, .Fortran() is not expressive enough for more complicated > use cases. Memory that you intend to return to R must be allocated > using its own allocator/garbage collector system, which (like any other > call into R from user code) may longjmp() away from your functions on > interrupt, error or allocation failure, abandoning any resources the > garbage collector doesn't know about. In turn, this requires the use of > C and the PROTECT() macro. In theory, you could use Fortran 2003 "C > interoperability" to call C entry points, but I expect that to be very > inconvenient (at least due to the lack of CPP macros). > > Since you mention CRAN checks, this might be a better question for > <r-package-devel at r-project.org>, not R-help. >
Ivan Krylov
2022-Feb-05 11:38 UTC
[R] Using .Fortran in R - how can I use file i/o flexibly OR arrays of variable length as arguments?
On Sat, 5 Feb 2022 09:41:02 +0000 Peter Green <mapjg at bristol.ac.uk> wrote:> Any thought on the other route I suggested, that is, binary file i/o > from with the Fortran code (e.g. full use of the connections > functionality), again perhaps via C?I think that Fortran I/O is only forbidden when it comes to units * and 6. It should be allowed to open a unit, use unformatted I/O to write an array into it and readBin() back on the R side. On the other hand, the use of connections means that we're calling into R, which means that we have to interact with the garbage collector. It's easier to just return the values to R in that case.> I wondered whether you (or anyone) thought there was a way to do this > using C wrapper routines in some way?I've never done this myself, but I think that the following should work: /* * We'll need a single pointer to everything returned by the function, * so we'll store the outputs in a struct. If the function returns only * one allocated pointer, we don't need this structure. */ struct calculation_result { double *a, *b, *c; // ... }; /* See WRE 5.13 */ static void free_calculation_result(SEXP * ptr) { struct calculation_result * res = R_ExternalPtrAddr(ptr); if (!res) return; /* * Call the Fortran function to deallocate a, b, c here, using C * interoperability or the F77_CALL macro. */ Free(res); R_ClearExternalPtr(ptr); } /* See WRE 5.10.1 */ SEXP run_calculation(SEXP arg1, SEXP arg2, ...) { /* * Allocate the result store and make it known to R, including how to * dispose of it. */ SEXP sres = PROTECT(R_MakeExternalPtr(NULL, R_NilValue, R_NilValue)); R_RegisterCFinalizerEx(sres, free_calculation_result, TRUE); /* NOTE: if Calloc fails, it longjmp()s instead of returning NULL */ struct calculation_result * res = Calloc(1, struct calculation_result); R_SetExternalPtrAddr(sres, res); /* * Call the Fortran function to perform the calculation here, store the * pointers it returns inside the res structure. */ /* * It is now safe to call into R again. If an error happens or we're * interrupted, free_calculation_result will prevent a memory leak. * Allocate a result object (e.g. a named list) and store the data from * `res` in R vectors. */ UNPROTECT(as many as PROTECT() calls above); return the_result_list; } Alternatively, provide some place with a static lifetime for the pointers to the calculation results (put them in a COMMON block with the SAVE attribute?) and split the code into three functions: 1. Perform the calculations and remember the pointers in variables with static lifetime; return the sizes of the arrays. 2. Copy the data from the COMMON pointers to the provided arrays. 3. Deallocate the Fortran-owned COMMON arrays. Then the R wrapper should do the following: on.exit(.Fortran('free_fortran_arrays')) sizes <- .Fortran('perform_calculations', sizes = integer(k))$sizes result <- .Fortran( 'copy_fortran_arrays', a = double(sizes[1]), b = double(sizes[2]), ... ) If R is interrupted or has an allocation failure between perform_calculations and copy_fortran_arrays, the destructor still runs and prevents resource leaks. This makes the code non-reentrant, but so is R itself. A third solution would be to abstract the Fortran code behind a C++ class and use the magic of Rcpp to return the data as R objects while keeping the promise that the destructor eventually runs. This is more or less the same as the first solution, but most of the work is done by Rcpp. Now that we both had to mention Writing R Extensions, I think we're firmly in the <r-package-devel at r-project.org> territory. -- Best regards, Ivan