First, thank you to Tomas for writing his recent post[0] on the R developer blog. It raised important issues in interfacing R's C API and C++ code.

However, I do _not_ think the conclusion reached in the post is helpful:

> don't use C++ to interface with R

There are now more than 1,600 packages on CRAN using C++; the time is long past when that type of warning is going to be useful to the R community.

These same issues will also occur with any newer language (such as Rust or Julia[1]) which uses RAII to manage resources and tries to interface with R. It doesn't seem a productive way forward for R to say it can't interface with these languages without first doing expensive copies into an intermediate heap.

The advice to avoid C++ is also antithetical to John Chambers' vision of first S and then R as an interface language (from Extending R [2]):

> The *interface* principle has always been central to R and to S before. An interface to subroutines was _the_ way to extend the first version of S. Subroutine interfaces have continued to be central to R.

The book also has extensive sections on both C++ (via Rcpp) and Julia, so clearly John thinks these are legitimate ways to extend R.

So if 'don't use C++' is not realistic and the current R API does not allow safe use of C++ exceptions, what are the alternatives?

One thing we could do is look at how this is handled in other languages written in C which also use longjmp for errors.

Lua is one example: it provides an alternative interface, lua_pcall[3] and lua_cpcall[4], which wrap a normal Lua call and return an error code rather than long jumping. These interfaces can then be safely wrapped by RAII- and exception-based languages.

This alternative error code interface is not just useful for C++, but also for resource cleanup in C. It is currently non-trivial to handle cleanup in all the possible cases where a longjmp can occur (interrupts, warnings, custom conditions, timeouts, any allocation, etc.), even with R finalizers.

It is past time for R to consider a non-jumpy C interface, so it can continue to be used as an effective interface to programming routines in the years to come.

[0]: https://developer.r-project.org/Blog/public/2019/03/28/use-of-c---in-packages/
[1]: https://github.com/JuliaLang/julia/issues/28606
[2]: https://doi.org/10.1201/9781315381305
[3]: http://www.lua.org/manual/5.1/manual.html#lua_pcall
[4]: http://www.lua.org/manual/5.1/manual.html#lua_cpcall

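The protected-call idea described in the message above can be illustrated with a small, self-contained sketch. It does not use the R or Lua APIs: `toy_error`, `toy_pcall`, `toy_checked_div` and `safe_div` are hypothetical names for a toy runtime that signals errors with longjmp, and the sketch assumes that no frame between the `setjmp` and the `longjmp` holds an object with a non-trivial destructor (jumping over such a frame is undefined behaviour in C++).

```cpp
#include <cassert>
#include <csetjmp>
#include <stdexcept>
#include <string>

// Toy stand-in for a C runtime that reports errors via longjmp.
// All names are hypothetical; this is not R's or Lua's real API.
std::jmp_buf* toy_handler = nullptr;  // innermost active jump target
const char* toy_message = "";         // message of the last error

// Raise an error the way a longjmp-based runtime would: never returns.
[[noreturn]] void toy_error(const char* msg) {
    toy_message = msg;
    std::longjmp(*toy_handler, 1);
}

// An "API call" that may fail by jumping instead of returning.
int toy_checked_div(int a, int b) {
    if (b == 0) toy_error("division by zero");
    return a / b;
}

typedef void (*toy_fn)(void* data);

// Protected call in the spirit of lua_pcall: returns 0 on success and
// 1 if fn raised an error. The jump never leaves this function.
int toy_pcall(toy_fn fn, void* data) {
    std::jmp_buf buf;
    std::jmp_buf* previous = toy_handler;  // allows nested protected calls
    toy_handler = &buf;
    if (setjmp(buf) == 0) {
        fn(data);
        toy_handler = previous;
        return 0;
    }
    toy_handler = previous;  // reached only after a longjmp
    return 1;
}

// Plain-data argument block passed through the void* parameter.
struct DivArgs { int a; int b; int result; };

void div_body(void* data) {
    DivArgs* args = static_cast<DivArgs*>(data);
    args->result = toy_checked_div(args->a, args->b);
}

// C++-facing wrapper: the error code is converted to an exception only
// after toy_pcall has returned, so no C++ frame is ever jumped over.
int safe_div(int a, int b) {
    DivArgs args = {a, b, 0};
    if (toy_pcall(div_body, &args) != 0) {
        throw std::runtime_error(toy_message);
    }
    return args.result;
}
```

With this shape, a caller of `safe_div` sees an ordinary return value or an ordinary C++ exception, and the jump itself stays confined to frames that hold only plain data.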
Jim,

I think the main point of Tomas' post was to alert R users to the fact that there are very serious issues that you have to understand when interfacing R from C++. Using C++ code from R is fine; in many cases you only want to access R data, use some library or compute in C++ and return results. Such use cases are completely fine in C++ as they don't need to trigger the issues mentioned, and it should be made clear that this was not what Tomas' blog was about.

I agree with Tomas that it is safer to advise against using C++ to call the R API, since C++ may give a false impression that you don't need to know what you're doing. Note that it is possible to avoid longjmps by using R_ExecWithCleanup(), which can catch any longjmps from the called function. So if you know what you're doing you can make things work. I think the issue here is not necessarily lack of tools, it is lack of knowledge - which is why I think Tomas' post is so important.

Cheers,
Simon

> On Mar 29, 2019, at 11:19 AM, Jim Hester <james.f.hester at gmail.com> wrote:
>
> [...]

I think it's also worth saying that some of these issues affect C code as well; e.g. this is not safe:

    FILE* f = fopen(...);
    Rf_eval(...);
    fclose(f);

whereas the C++ equivalent would likely handle closing of the file in the destructor. In other words, I think many users just may not be cognizant of the fact that most R APIs can longjmp, and what that implies for cleanup of allocated resources. R_alloc() may help solve the issue specifically for memory allocations, but for any library interface that has an 'open' and a 'close' step, the same sort of issue will arise.

What I believe we should do, and what Rcpp has made steps towards, is make it possible to interact with some subset of the R API safely from C++ contexts. This has always been possible with e.g. R_ToplevelExec() and R_ExecWithCleanup(), and now things are even better with R_UnwindProtect(). In theory, as a prototype, an R package could provide a 'safe' C++ interface to the R API using R_UnwindProtect() and friends as appropriate, and client packages could import and link to that package to gain access to the interface. Code generators (such as Rcpp Attributes) can handle some of the pain in these interfaces, so that users are mostly insulated from the nitty-gritty details.

I agree that the content of Tomas's post is very helpful, especially since I expect many R programmers who dip their toes into the C++ world are not aware of the caveats of talking to R from C++. However, I don't think it's helpful to recommend "don't use C++"; rather, I believe the question should be, "what can we do to make it possible to easily and safely interact with R from C++?". Because, as I understand it, all of the problems raised are solvable: either through a well-defined C++ interface, or through better education.

I'll add my own opinion: writing correct C code is an incredibly difficult task. C++, while obviously not perfect, makes things substantially easier with tools like RAII, the STL, smart pointers, and so on.
And I strongly believe that C++ (with Rcpp) is still a better choice than C for new users who want to interface with R from compiled code.

tl;dr: I (and I think most others) just wish the summary had a more positive outlook for the future of C++ with R.

Best,
Kevin

On Fri, Mar 29, 2019 at 10:16 AM Simon Urbanek <simon.urbanek at r-project.org> wrote:
> [...]

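The open/close hazard raised in the message above can be sketched without the R API. In the toy code below every name (`toy_pcall`, `toy_eval`, `toy_open`, `toy_close`, `Handle`, `leaky`, `guarded`) is hypothetical, and the longjmp-based runtime is a stand-in, not R. One caveat the sketch assumes throughout: a destructor is only guaranteed to run if the jump is contained before it reaches the C++ frame that owns the object, which is why the guard sits outside the protected call.

```cpp
#include <cassert>
#include <csetjmp>
#include <stdexcept>

// Toy longjmp-based runtime; hypothetical names, not R's API.
std::jmp_buf* toy_handler = nullptr;

[[noreturn]] void toy_error() { std::longjmp(*toy_handler, 1); }

// Protected call: returns 0 on success, 1 if fn jumped.
int toy_pcall(void (*fn)(void*), void* data) {
    std::jmp_buf buf;
    std::jmp_buf* previous = toy_handler;
    toy_handler = &buf;
    if (setjmp(buf) == 0) {
        fn(data);
        toy_handler = previous;
        return 0;
    }
    toy_handler = previous;
    return 1;
}

// Toy library with an open and a close step; the counter tracks leaks.
int open_handles = 0;
void toy_open()  { ++open_handles; }
void toy_close() { --open_handles; }

// Stand-in for an API call that may jump (data points to a bool flag).
void toy_eval(void* data) {
    if (*static_cast<bool*>(data)) toy_error();
}

// Same shape as the unsafe C snippet: when toy_eval jumps, the close
// step is skipped and the handle leaks.
void leaky(void* data) {
    toy_open();
    toy_eval(data);
    toy_close();  // not reached on error
}

// RAII guard: closes the handle at scope exit, including by exception.
struct Handle {
    Handle()  { toy_open(); }
    ~Handle() { toy_close(); }
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;
};

// Safe shape: the jump is caught inside toy_pcall, below the frame that
// owns the guard, and the failure is reported as a C++ exception.
void guarded(bool fail) {
    Handle h;
    if (toy_pcall(toy_eval, &fail) != 0) {
        throw std::runtime_error("toy_eval failed");
    }
}
```

Calling `leaky` through `toy_pcall` with a failing flag leaves `open_handles` at 1, while `guarded(true)` throws and leaves it at 0. In a real package, the protected call would be built on one of the R facilities named above rather than on a hand-written `setjmp`.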
Hi Jim (et al.),

Comments inline (and assume any offense was unintended; these kinds of things can be tricky to talk about).

On Fri, Mar 29, 2019 at 8:19 AM Jim Hester <james.f.hester at gmail.com> wrote:

> First, thank you to Tomas for writing his recent post[0] on the R
> developer blog. It raised important issues in interfacing R's C API
> and C++ code.
>
> However I do _not_ think the conclusion reached in the post is helpful
> > don't use C++ to interface with R
>

I was a bit surprised at the strength of this too, but it's understandable given the content/motivation of the post. My personal takeaway, without putting any words in Tomas' or R-core's mouths at all, is that the crux here is that using C++ in R packages safely is a LOT less trivial than people in the wider R community think it is these days. Or rather, there are things you can do safely quite easily when using C++ in an R package, and things you can't, but that distinction a) isn't really on many people's radar, b) isn't super trivial to identify at any given time, and c) depends on internal implementation details, so it isn't stable / safe to rely on across time anyway.

There are a lot of reasons for a), and none of them, nor anything else I'm about to say, constitutes a criticism of Rcpp or its developers. I've always thought that we as tool/software developers in this space should make things seem as easy and convenient to users as they can be/intrinsically are, *but not easier*. I don't know how popular that second part I put in there is generally, but personally I think it's true and pretty important not to leave off.

I read Tomas' post as suggesting that we, as a community, without pointing fingers or laying any individual blame, have unintentionally crossed the "as easy as it actually is/can be to do right" line when it comes to the impression we give to novice/journeyman package developers regarding using C++ to interact with the R internals.
I honestly claim little familiarity with C++, but it seems like Tomas is the relevant expert on both it and the hard-core details of how aspects of the R internals work, so if he tells us that this has happened, we should probably listen.

> There are now more than 1,600 packages on CRAN using C++, the time is
> long past when that type of warning is going to be useful to the R
> community.
>

Here I disagree pretty strongly. I think the warning is very useful - unless these issues were widely known before the post (my impression is that they weren't) - and ignoring its contents, or encouraging others to do so, as influential members of the R community would be irresponsible. I mean, the reality of the situation as it exists now is more or less (I'd assume a great deal 'more' than 'less', personally) what Tomas described, right?

Furthermore, regardless of what changes may come in the future, it seems very unlikely any of them will be in this coming release (since the grand feature freeze is like, today?), so we're talking a year out, at LEAST. Given that, this advice, or at least a more nuanced stance that gives the information from the post proper weight and is different from the prevailing sentiment now, basically has to be realistic in the short term.

At the very least I think the post tells us that we need to be really careful as a community with the "you want speed? throw some C++ in your package at it, you can learn how in a day and it's super easy and basically free" messaging. The reality is more nuanced than that, at best, even if ultimately in many situations that is a valid/reasonable approach.

>
> These same issues will also occur with any newer language (such as
> Rust or Julia[1]) which uses RAII to manage resources and tries to
> interface with R. It doesn't seem a productive way forward for R to
> say it can't interface with these languages without first doing
> expensive copies into an intermediate heap.
>
> The advice to avoid C++ is also antithetical to John Chambers vision
> of first S and R as a interface language (from Extending R [2])
>
> > The *interface* principle has always been central to R and to S
> before. An interface to subroutines was _the_ way to extend the first
> version of S. Subroutine interfaces have continued to be central to R.
>
> The book also has extensive sections on both C++ (via Rcpp) and Julia,
> so clearly John thinks these are legitimate ways to extend R.
>
> So if 'don't use C++' is not realistic and the current R API does not
> allow safe use of C++ exceptions what are the alternatives?
>

Again, nothing is going to change about this for a year, *at least* (AFAIK; I'm not on R-core), so we have to make it at least somewhat realistic; perhaps not the blanket moratorium that Tomas advocated - though IMHO statements from R-core about what is safe/supported when operating in the R arena should be granted *a lot* of weight - but certainly not the prevailing sentiment it was responding to, either. That is true even if we commit to also looking for ways to improve the situation in the longer term.

>
> One thing we could do is look how this is handled in other languages
> written in C which also use longjmp for errors.
>
> Lua is one example, they provide an alternative interface;
> lua_pcall[3] and lua_cpcall[4] which wrap a normal lua call and return
> an error code rather long jumping. These interfaces can then be safely
> wrapped by RAII - exception based languages.
>

So there's the function that Simon mentioned, which would work at least for evaluating R code, though it doesn't necessarily help when you want to hit the C API directly, I think. Because of ALTREP, a LOT of things can allocate, and thus error, now.
That was necessary to get what we needed without an amount of work/refactoring that would have tanked the whole project (I think), but it is a thing.

>
> This alternative error code interface is not just useful for C++, but
> also for resource cleanup in C, it is currently non-trivial to handle
> cleanup in all the possible cases a longjmp can occur (interrupts,
> warnings, custom conditions, timeouts any allocation etc.) even with R
> finalizers.
>
> It is past time for R to consider a non-jumpy C interface, so it can
> continue to be used as an effective interface to programming routines
> in the years to come.
>

I mean, I totally get this desire, and don't even necessarily disagree in principle, but that's a pretty easy thing to say, right? My impression, without really knowing the details of what all that would entail, is that it would/will be a seriously non-trivial amount of work for a group of people who are already very busy maintaining an extremely widely used, extremely complex piece of software.

Best,
~G

I appreciate the writing on this. However I definitely think there is a huge difference between "use with care" and "don't use". They just are not the same statement.

> On Mar 29, 2019, at 10:15 AM, Simon Urbanek <simon.urbanek at R-project.org> wrote:
>
> [...]

---------------
John Mount
http://www.win-vector.com/
Our book: Practical Data Science with R http://www.manning.com/zumel/