luke-tierney at uiowa.edu
2024-Jun-07 14:30 UTC
[Rd] [External] Re: clarifying and adjusting the C API for R
On Fri, 7 Jun 2024, Steven Dirkse wrote:

> Thanks for sharing this overview of an interesting and much-needed project.
> You mention that R exports about 1500 symbols (on platforms supporting
> visibility) but this subject isn't mentioned explicitly again in your note,
> so I'm wondering how things tie together. Un-exported symbols cannot be
> part of the API - how would people use them in this case? In a perfect
> world the set of exported symbols could define the API or match it exactly,
> but I guess that isn't the case at present. So I conclude that R exports
> extra (i.e. non-API) symbols. Is part of the goal to remove these extra
> exports?

No. We'll hide what we can, but base packages for one need access to
some entry points that should not be in the API, so those have to stay
un-hidden.

Best,

luke

> -Steve
>
> On Thu, Jun 6, 2024 at 10:47 AM luke-tierney--- via R-devel
> <r-devel at r-project.org> wrote:
>
>     This is an update on some current work on the C API for use in R
>     extensions.
>
>     The internal R implementation makes use of tens of thousands of C
>     entry points. On Linux and Windows, which support visibility
>     restrictions, most of these are visible only within the R
>     executable or shared library. About 1500 are not hidden and are
>     visible to dynamically loaded shared libraries, such as ones in
>     packages, and to embedding applications.
>
>     There are two main reasons for limiting access to entry points in
>     a software framework:
>
>     - Some entry points are very easy to use in ways that corrupt
>       internal data, leading to segfaults or, worse, incorrect
>       computations without segfaults.
>
>     - Some entry points expose internal structure and other
>       implementation details, which makes it hard to make improvements
>       without breaking client code that has come to depend on these
>       details.
>
>     The API of C entry points that can be used in R extensions, both
>     for packages and embedding, has evolved organically over many
>     years. The definition for the current release expressed in the
>     Writing R Extensions manual (WRE) is roughly:
>
>         An entry point can be used if (1) it is declared in a header
>         file in R.home("include"), and (2) if it is documented for use
>         in WRE.
>
>     Ideally, (1) would be necessary and sufficient, but for a variety
>     of reasons that isn't achievable, at least not in the near term.
>     (2) can be challenging to determine; in particular, it is not
>     amenable to a computational answer.
>
>     An experimental effort is underway to add annotations to the WRE
>     Texinfo source to allow (2) to be answered unambiguously. The
>     annotations so far mostly reflect my reading of WRE and may be
>     revised as they are reviewed by others. The annotated document can
>     be used for programmatically identifying what is currently
>     considered part of the C API. The result so far is an experimental
>     function tools:::funAPI():
>
>         > head(tools:::funAPI())
>                          name                    loc apitype
>         1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.h    eapi
>         2        alloc3DArray                    WRE     api
>         3          allocArray                    WRE     api
>         4           allocLang                    WRE     api
>         5           allocList                    WRE     api
>         6         allocMatrix                    WRE     api
>
>     The 'apitype' field has three possible levels
>
>         | api  | stable (ideally) API |
>         | eapi | experimental API     |
>         | emb  | embedding API        |
>
>     Entry points in the embedded API would typically only be used in
>     applications embedding R or providing new front ends, but might be
>     reasonable to use in packages that support embedding.
>
>     The 'loc' field indicates how the entry point is identified as
>     part of an API: explicit mention in WRE, or declaration in a
>     header file identified as fully part of an API.
>
>     [tools:::funAPI() may not be completely accurate as it relies on
>     regular expressions for examining header files considered part of
>     the API rather than proper parsing. But it seems to be pretty
>     close to what can be achieved with proper parsing. Proper parsing
>     would add dependencies on additional tools, which I would like to
>     avoid for now. One dependency already present is that a C compiler
>     has to be on the search path and cc -E has to run the C
>     pre-processor.]
>
>     Two additional experimental functions are available for analyzing
>     package compliance: tools:::checkPkgAPI and
>     tools:::checkAllPkgsAPI. These examine installed packages.
>
>     [These may produce some false positives on macOS; they may or may
>     not work on Windows at this point.]
>
>     Using these tools initially showed around 200 non-API entry points
>     used across packages on CRAN and BIOC. Ideally this number should
>     be reduced to zero. This will require a combination of additions
>     to the API and changes in packages.
>
>     Some entry points can safely be added to the API. Around 40 have
>     already been added to WRE with API annotations; another 40 or so
>     can probably be added after review.
>
>     The remainder mostly fall into two groups:
>
>     - Entry points that should never be used in packages, such as
>       SET_OBJECT or SETLENGTH (or any non-API SETXYZ functions for
>       that matter) that can create inconsistent or corrupt internal
>       state.
>
>     - Entry points that depend on the existence of internal structure
>       that might be subject to change, such as the existence of
>       promise objects or internal structure of environments.
>
>     Many, if not most, of these seem to be used in idioms that can
>     either be accomplished with existing higher-level functions
>     already in the API, or by new higher level functions that can be
>     created and added. Working through these will take some time and
>     coordination between R-core and maintainers of affected packages.
>
>     Once things have gelled a bit more I hope to turn this into a blog
>     post that will include some examples of moving non-API entry point
>     uses into compliance.
>
>     Best,
>
>     luke
>
>     --
>     Luke Tierney
>     Ralph E. Wareham Professor of Mathematical Sciences
>     University of Iowa                  Phone:             319-335-3386
>     Department of Statistics and        Fax:               319-335-3017
>         Actuarial Science
>     241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
>     Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
>
>     ______________________________________________
>     R-devel at r-project.org mailing list
>     https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
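
For readers unfamiliar with the visibility restrictions discussed in this message, here is a minimal sketch in C of how a shared library can keep one entry point hidden while leaving another visible to dynamically loaded clients. It assumes GCC/Clang-style visibility attributes on a non-Windows platform. All names (DEMO_HIDDEN, DEMO_EXPORT, demo_internal_set_length, demo_resize) are hypothetical and are not taken from the R sources. The hidden function stands in for the kind of setter that can corrupt state if called directly; the exported one validates its arguments first.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GCC/Clang visibility attributes. On other toolchains the
   macros expand to nothing and the code still compiles. */
#if defined(__GNUC__) && !defined(_WIN32)
# define DEMO_HIDDEN __attribute__((visibility("hidden")))
# define DEMO_EXPORT __attribute__((visibility("default")))
#else
# define DEMO_HIDDEN
# define DEMO_EXPORT
#endif

/* Hidden entry point (hypothetical): usable inside the library only.
   It performs no validation, so misuse could leave *len inconsistent
   with the real allocation. */
DEMO_HIDDEN int demo_internal_set_length(int *len, int new_len)
{
    *len = new_len;
    return *len;
}

/* Visible entry point (hypothetical): checks the request against the
   capacity before delegating to the hidden setter.
   Returns the new length, or -1 if the request is rejected. */
DEMO_EXPORT int demo_resize(int *len, int capacity, int new_len)
{
    assert(len != NULL);
    if (new_len < 0 || new_len > capacity)
        return -1;
    return demo_internal_set_length(len, new_len);
}
```

When such a file is built into a shared library, a client that loads the library should be able to resolve demo_resize but not demo_internal_set_length. As the message notes, some entry points have to stay un-hidden for other reasons, so visibility alone cannot define an API.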
Reed A. Cartwright
2024-Jun-08 00:06 UTC
[Rd] [External] Re: clarifying and adjusting the C API for R
Would it be reasonable to move the non-API stuff that cannot be hidden
into header files inside a "details" directory (or some other specific
naming scheme)?

That's what I use when I need to separate a public API from an internal API.

On Fri, Jun 7, 2024 at 7:30 AM luke-tierney--- via R-devel
<r-devel at r-project.org> wrote:
>
> On Fri, 7 Jun 2024, Steven Dirkse wrote:
>
> > Thanks for sharing this overview of an interesting and much-needed project.
> > You mention that R exports about 1500 symbols (on platforms supporting
> > visibility) but this subject isn't mentioned explicitly again in your note,
> > so I'm wondering how things tie together. Un-exported symbols cannot be
> > part of the API - how would people use them in this case? In a perfect
> > world the set of exported symbols could define the API or match it exactly,
> > but I guess that isn't the case at present. So I conclude that R exports
> > extra (i.e. non-API) symbols. Is part of the goal to remove these extra
> > exports?
>
> No. We'll hide what we can, but base packages for one need access to
> some entry points that should not be in the API, so those have to stay
> un-hidden.
>
> Best,
>
> luke
>
> > -Steve
> >
> > On Thu, Jun 6, 2024 at 10:47 AM luke-tierney--- via R-devel
> > <r-devel at r-project.org> wrote:
> >
> >     This is an update on some current work on the C API for use in R
> >     extensions.
> >
> >     The internal R implementation makes use of tens of thousands of C
> >     entry points. On Linux and Windows, which support visibility
> >     restrictions, most of these are visible only within the R
> >     executable or shared library. About 1500 are not hidden and are
> >     visible to dynamically loaded shared libraries, such as ones in
> >     packages, and to embedding applications.
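
The "details" directory scheme suggested above could look roughly like the following sketch. It is an illustration only: the paths in the comments and the names demo_vec and demo_vec_length are hypothetical, and nothing here reflects how the R headers are actually organized. The three parts are shown in one file so that the sketch compiles on its own; in practice each would live in the file named in its comment.

```c
#include <assert.h>
#include <stddef.h>

/* ---- would live in include/demo/api.h (public, stable) ---- */
/* Clients see only an opaque type and accessor functions. */
typedef struct demo_vec demo_vec;
size_t demo_vec_length(const demo_vec *v);

/* ---- would live in include/demo/details/vec_layout.h ---- */
/* Non-API: the concrete layout. Shipped because closely coupled code
   needs it, but the directory name signals that it may change. */
struct demo_vec {
    size_t length;
    size_t capacity;
    double *data;
};

/* ---- would live in src/vec.c ---- */
size_t demo_vec_length(const demo_vec *v)
{
    assert(v != NULL);
    return v->length;
}
```

Code that includes only the public header cannot touch the fields directly, so the layout can change without breaking it. Code that includes the details header opts in to that risk, and a checking tool could flag any such include by its path.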