thr3ads.net - R devel - [Rd] Custom C finalizers for .Call [Nov 2015]

If this information is useful, please help other people find it:
Share via:

Jeroen Ooms

2015-Nov-23 23:10 UTC

[Rd] Custom C finalizers for .Call

WRE explains that R_alloc() can be used to allocate memory which
automatically gets released by R at the end of a .C, .Call or
.External, even in the case of an error or interruption. This is a
really great feature to prevent memory leaks. I was wondering if there
is a way to extend this mechanism to allow for automatically running
UNPROTECT and custom finalizers at the end of a .Call as well.

Currently it is all to easy for package authors to introduce a memory
leak or stack imbalance by calling Rf_error() or
R_CheckUserInterrupt() in a way that skips over the usual cleanup
steps. This holds especially for packages interfacing C libraries
(libcurl, libxml2, openssl, etc) which require xx_new() and xx_free()
functions to allocate/free various types of objects, handles and
contexts. Therefore we cannot use R_alloc() and we need to manually
clean up when returning, which is tricky for irregular exits.

Moreover package authors might benefit from an alternative of
allocVector() which automatically protects the SEXP until the .Call is
done. Perhaps I don't fully appreciate the complexity of the garbage
collector, but one could imagine a variant of PROTECT() which
automatically keeps a counter 'n' for the number of allocated objects
and makes R run UNPROTECT(n) when .Call exists, along with releasing
the R_alloc() memory. Yes, there are cases where it is useful to have
manual control over what can be collected earlier during the .Call
procedure, but these are rare. A lot of C code in packages might
become safer and cleaner if authors would have an option to let this
be automated.

The most general feature would a hook for adding custom C functions to
the .Call exit, similar to on.exit() in R:

  xmlNodePtr *node =  xmlNewNode(...);
  Rf_on_exit(xmlFreeNode, node);
  EVP_PKEY_CTX *ctx = EVP_PKEY_CTX_new(...);
  Rf_on_exit(EVP_PKEY_CTX_free, ctx);
  SEXP out = PROTECT(allocVector(...));
  Rf_on_exit(UNPROTECT, 1);

I don't know R's internals well enough to estimate if something like
this would be possible. I did put together a simple C example of a
linked list with object pointers and their corresponding free
functions, which can easily be free'd with a single call:
http://git.io/vBqRA . So basically what is mostly missing at this
point is a way to trigger this at the end of the .Call in a way that
works for regular returns, errors and interruptions...

Andrew Piskorski

2015-Nov-24 23:20 UTC

head link

[Rd] Custom C finalizers for .Call

On Tue, Nov 24, 2015 at 12:10:12AM +0100, Jeroen Ooms wrote:
> Currently it is all to easy for package authors to introduce a memory
> leak or stack imbalance by calling Rf_error() or
> R_CheckUserInterrupt() in a way that skips over the usual cleanup
> steps.
I have a more modest request:  Please improve the documentation of
exactly what Rf_error() does and how it should be used!  E.g., does
calling it terminate execution of the C function from which it is
called, or not?  The docs below don't actually say:

  https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Error-handling
  https://cran.r-project.org/doc/manuals/r-devel/R-ints.html#Warnings-and-errors

The docs imply that these are "C-level equivalents" to the R stop()
function, which of course DOES terminate execution.  I believe
Rf_error() does also, but I really wish it was explicitly stated one
way or the other.

Grepping the R source, I never did find where Rf_error() is actually
implemented.  What am I missing?

In src/include/Rinternals.h, the implementation of error_return()
first calls Rf_error(msg) and then does a separate "return R_NilValue"
in the next statement.  So you might think that Rf_error() must NOT
terminate execution, otherwise why the extra return statement?  But it
also has a comment:

  /* return(.) NOT reached : for -Wall */

Meaning that that return statement is just to silence compiler
warnings, and is not supposed to be reached because Rf_error()
terminates execution.  But this is a ridiculously roundabout way to
infer what the behavior of Rf_error() is supposed to be...

-- 
Andrew Piskorski <atp at piskorski.com>

Duncan Murdoch

2015-Nov-24 23:58 UTC

head link

[Rd] Custom C finalizers for .Call

On 24/11/2015 6:20 PM, Andrew Piskorski wrote:> On Tue, Nov 24, 2015 at 12:10:12AM +0100, Jeroen Ooms wrote:
>
>> Currently it is all to easy for package authors to introduce a memory
>> leak or stack imbalance by calling Rf_error() or
>> R_CheckUserInterrupt() in a way that skips over the usual cleanup
>> steps.
>
> I have a more modest request:  Please improve the documentation of
> exactly what Rf_error() does and how it should be used!  E.g., does
> calling it terminate execution of the C function from which it is
> called, or not?  The docs below don't actually say:
>
>   
https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Error-handling
>   
https://cran.r-project.org/doc/manuals/r-devel/R-ints.html#Warnings-and-errors
>
> The docs imply that these are "C-level equivalents" to the R
stop()
> function, which of course DOES terminate execution.  I believe
> Rf_error() does also, but I really wish it was explicitly stated one
> way or the other.
I'd imagine the author of that section thought it *was* explicit.  The 
documentation can't answer every question, or clarify every 
misunderstanding.  That's why R-devel exists.  fortune("source")
is also relevant.  But now that you know the truth, you could suggest 
rewording, as an svn patch against the trunk version of R-exts.texi.
> Grepping the R source, I never did find where Rf_error() is actually
> implemented.  What am I missing?
You probably forgot to grep the .h files:

include\R_ext\Error.h(48): #define error Rf_error

The implementation of the function calls it error(), and uses the define 
to remap that to Rf_error().  You can find the implementation in (where 
else?) src/main/errors.c.
>
> In src/include/Rinternals.h, the implementation of error_return()
> first calls Rf_error(msg) and then does a separate "return
R_NilValue"
> in the next statement.  So you might think that Rf_error() must NOT
> terminate execution, otherwise why the extra return statement?  But it
> also has a comment:
>
>    /* return(.) NOT reached : for -Wall */
>
> Meaning that that return statement is just to silence compiler
> warnings, and is not supposed to be reached because Rf_error()
> terminates execution.  But this is a ridiculously roundabout way to
> infer what the behavior of Rf_error() is supposed to be...
I won't disagree that you took a roundabout route to the right conclusion.

Duncan Murdoch

Simon Urbanek

2015-Nov-25 15:53 UTC

head link

[Rd] Custom C finalizers for .Call

Jeroen,

I agree that it may be useful to have some kind of "finally"-like
infrastructure. However, in the use cases you list there are already ways to do
that - the same way that R_alloc uses. First, you don't need to call
UNPROTECT - the whole point is that the protection stack is automatically popped
in case of an abnormal termination. That is precisely how memory leaks are
prevented - as long as you play by the rules it will be released for you. The
check for UNPROTECTs at the end of .Call is explicitly there so catch bugs in
*normal* termination. So no memory leaks there.

The real case where users may create leaks is if you allocate memory that you
don't tell R about. As long as you associate a finalizer with any non-R
memory you allocate, there will be no memory leaks -that is how you are supposed
to write R packages with external allocations. So the only difference to your
example is that you don't register a finalizer with the *function* but
rather with the *allocation* you make. That also seems IMHO less error prone.

So to take your example, the way would typically write that code safely is
something like

typedef struct {
  xmlNodePtr *node;
 EVP_PKEY_CTX *ctx;
} my_context_t;

// define how to dispose of all things you care about correctly
static void context_fin(SEXP what) {
    my_context_t *c = (my_context_t*) EXTPTR_PTR(what);
    if (!c) return;
    if (c->ctx) EVP_PKEY_CTX_free(c->ctx);
    if (c->node) xmlFreeNode(c->node);
}

[...]
// allocate the context and tell R to manage its protection and finalization
// (you could write a macro to make this one-liner)
my_context_t* c = (my_context_t*) R_Calloc(1, my_context_t);
SEXP res = PROTECT(R_MakeExternalPtr(c, R_NilValue, R_NilValue));
R_RegisterCFinalizer(res, context_fin);

// do all work here ... you safely abort at any point without memory leaks
c->node =  xmlNewNode(...);
c->ctx = EVP_PKEY_CTX_new(...);
[...]

The point of using a finalizer is that no matter what happens the memory is
always released. The structure with all allocations is protected until you
unprotect it or there is any interrupt/error. Since all regular R rules apply,
you can also assign it someplace to make the protection dependent on any other
object you care about. This is often useful because you don't need to
PROTECT things left and right, but instead you can just have one object that
holds references to random things you care about.

Of course, you could write a wrapper for the above with some syntactic sugar to
achieve the same - essentially limiting the finalizer to be just a function call
on the reference that you create. It may be a bit of overkill since you may end
up creating objects for every allocation, but certainly doable. I would argue
that in most cases you already tend to have a structure for the things you
allocate so the "normal" approach is typically more clear and readable
than inlining calls with side-effects, but that may be a matter of taste.

Cheers,
Simon

On Nov 23, 2015, at 6:10 PM, Jeroen Ooms <jeroen.ooms at stat.ucla.edu>
wrote:
> WRE explains that R_alloc() can be used to allocate memory which
> automatically gets released by R at the end of a .C, .Call or
> .External, even in the case of an error or interruption. This is a
> really great feature to prevent memory leaks. I was wondering if there
> is a way to extend this mechanism to allow for automatically running
> UNPROTECT and custom finalizers at the end of a .Call as well.
> 
> Currently it is all to easy for package authors to introduce a memory
> leak or stack imbalance by calling Rf_error() or
> R_CheckUserInterrupt() in a way that skips over the usual cleanup
> steps. This holds especially for packages interfacing C libraries
> (libcurl, libxml2, openssl, etc) which require xx_new() and xx_free()
> functions to allocate/free various types of objects, handles and
> contexts. Therefore we cannot use R_alloc() and we need to manually
> clean up when returning, which is tricky for irregular exits.
> 
> Moreover package authors might benefit from an alternative of
> allocVector() which automatically protects the SEXP until the .Call is
> done. Perhaps I don't fully appreciate the complexity of the garbage
> collector, but one could imagine a variant of PROTECT() which
> automatically keeps a counter 'n' for the number of allocated
objects
> and makes R run UNPROTECT(n) when .Call exists, along with releasing
> the R_alloc() memory. Yes, there are cases where it is useful to have
> manual control over what can be collected earlier during the .Call
> procedure, but these are rare. A lot of C code in packages might
> become safer and cleaner if authors would have an option to let this
> be automated.
> 
> The most general feature would a hook for adding custom C functions to
> the .Call exit, similar to on.exit() in R:
> 
>  xmlNodePtr *node =  xmlNewNode(...);
>  Rf_on_exit(xmlFreeNode, node);
>  EVP_PKEY_CTX *ctx = EVP_PKEY_CTX_new(...);
>  Rf_on_exit(EVP_PKEY_CTX_free, ctx);
>  SEXP out = PROTECT(allocVector(...));
>  Rf_on_exit(UNPROTECT, 1);
> 
> I don't know R's internals well enough to estimate if something
like
> this would be possible. I did put together a simple C example of a
> linked list with object pointers and their corresponding free
> functions, which can easily be free'd with a single call:
> http://git.io/vBqRA . So basically what is mostly missing at this
> point is a way to trigger this at the end of the .Call in a way that
> works for regular returns, errors and interruptions...
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

luke-tierney at uiowa.edu

2015-Nov-26 14:27 UTC

head link

[Rd] Custom C finalizers for .Call

We do have a mechanism in the base C code for ensuring that cleanup
code is run on longjmps; we wouldn't want to expose that in the API in
its current form, but could make a wrapper around it available -- I'll
make a note to look into it. But as Simon describes there are good
options available for packages now.

Best,

luke

On Wed, 25 Nov 2015, Simon Urbanek wrote:
> Jeroen,
>
> I agree that it may be useful to have some kind of "finally"-like
infrastructure. However, in the use cases you list there are already ways to do
that - the same way that R_alloc uses. First, you don't need to call
UNPROTECT - the whole point is that the protection stack is automatically popped
in case of an abnormal termination. That is precisely how memory leaks are
prevented - as long as you play by the rules it will be released for you. The
check for UNPROTECTs at the end of .Call is explicitly there so catch bugs in
*normal* termination. So no memory leaks there.
>
> The real case where users may create leaks is if you allocate memory that
you don't tell R about. As long as you associate a finalizer with any non-R
memory you allocate, there will be no memory leaks -that is how you are supposed
to write R packages with external allocations. So the only difference to your
example is that you don't register a finalizer with the *function* but
rather with the *allocation* you make. That also seems IMHO less error prone.
>
> So to take your example, the way would typically write that code safely is
something like
>
> typedef struct {
>  xmlNodePtr *node;
> EVP_PKEY_CTX *ctx;
> } my_context_t;
>
> // define how to dispose of all things you care about correctly
> static void context_fin(SEXP what) {
>    my_context_t *c = (my_context_t*) EXTPTR_PTR(what);
>    if (!c) return;
>    if (c->ctx) EVP_PKEY_CTX_free(c->ctx);
>    if (c->node) xmlFreeNode(c->node);
> }
>
> [...]
> // allocate the context and tell R to manage its protection and
finalization
> // (you could write a macro to make this one-liner)
> my_context_t* c = (my_context_t*) R_Calloc(1, my_context_t);
> SEXP res = PROTECT(R_MakeExternalPtr(c, R_NilValue, R_NilValue));
> R_RegisterCFinalizer(res, context_fin);
>
> // do all work here ... you safely abort at any point without memory leaks
> c->node =  xmlNewNode(...);
> c->ctx = EVP_PKEY_CTX_new(...);
> [...]
>
> The point of using a finalizer is that no matter what happens the memory is
always released. The structure with all allocations is protected until you
unprotect it or there is any interrupt/error. Since all regular R rules apply,
you can also assign it someplace to make the protection dependent on any other
object you care about. This is often useful because you don't need to
PROTECT things left and right, but instead you can just have one object that
holds references to random things you care about.
>
> Of course, you could write a wrapper for the above with some syntactic
sugar to achieve the same - essentially limiting the finalizer to be just a
function call on the reference that you create. It may be a bit of overkill
since you may end up creating objects for every allocation, but certainly
doable. I would argue that in most cases you already tend to have a structure
for the things you allocate so the "normal" approach is typically more
clear and readable than inlining calls with side-effects, but that may be a
matter of taste.
>
> Cheers,
> Simon
>
>
> On Nov 23, 2015, at 6:10 PM, Jeroen Ooms <jeroen.ooms at
stat.ucla.edu> wrote:
>
>> WRE explains that R_alloc() can be used to allocate memory which
>> automatically gets released by R at the end of a .C, .Call or
>> .External, even in the case of an error or interruption. This is a
>> really great feature to prevent memory leaks. I was wondering if there
>> is a way to extend this mechanism to allow for automatically running
>> UNPROTECT and custom finalizers at the end of a .Call as well.
>>
>> Currently it is all to easy for package authors to introduce a memory
>> leak or stack imbalance by calling Rf_error() or
>> R_CheckUserInterrupt() in a way that skips over the usual cleanup
>> steps. This holds especially for packages interfacing C libraries
>> (libcurl, libxml2, openssl, etc) which require xx_new() and xx_free()
>> functions to allocate/free various types of objects, handles and
>> contexts. Therefore we cannot use R_alloc() and we need to manually
>> clean up when returning, which is tricky for irregular exits.
>>
>> Moreover package authors might benefit from an alternative of
>> allocVector() which automatically protects the SEXP until the .Call is
>> done. Perhaps I don't fully appreciate the complexity of the
garbage
>> collector, but one could imagine a variant of PROTECT() which
>> automatically keeps a counter 'n' for the number of allocated
objects
>> and makes R run UNPROTECT(n) when .Call exists, along with releasing
>> the R_alloc() memory. Yes, there are cases where it is useful to have
>> manual control over what can be collected earlier during the .Call
>> procedure, but these are rare. A lot of C code in packages might
>> become safer and cleaner if authors would have an option to let this
>> be automated.
>>
>> The most general feature would a hook for adding custom C functions to
>> the .Call exit, similar to on.exit() in R:
>>
>>  xmlNodePtr *node =  xmlNewNode(...);
>>  Rf_on_exit(xmlFreeNode, node);
>>  EVP_PKEY_CTX *ctx = EVP_PKEY_CTX_new(...);
>>  Rf_on_exit(EVP_PKEY_CTX_free, ctx);
>>  SEXP out = PROTECT(allocVector(...));
>>  Rf_on_exit(UNPROTECT, 1);
>>
>> I don't know R's internals well enough to estimate if something
like
>> this would be possible. I did put together a simple C example of a
>> linked list with object pointers and their corresponding free
>> functions, which can easily be free'd with a single call:
>> http://git.io/vBqRA . So basically what is mostly missing at this
>> point is a way to trigger this at the end of the .Call in a way that
>> works for regular returns, errors and interruptions...
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

Seemingly Similar Threads

Search for more reasonably related threads

R devel - Nov 2015 - Custom C finalizers for .Call

[Rd] Custom C finalizers for .Call

[Rd] Custom C finalizers for .Call

[Rd] Custom C finalizers for .Call

[Rd] Custom C finalizers for .Call

[Rd] Custom C finalizers for .Call

Seemingly Similar Threads