thr3ads.net - R devel - [Rd] some questions about R internal SEXP types [Sep 2020]

Dan Kortschak

2020-Sep-06 01:44 UTC

[Rd] some questions about R internal SEXP types

Hello,

I am writing an R/Go interoperability tool[1] that works similarly to
Rcpp; the tool takes packages written in Go and performs the necessary
Go type analysis to wrap the Go code with C and R shims that allow the
Go code to then be called from R. The system is largely complete (with
the exception of having a clean approach to handling generalised
attributes in the easy case[2] - the less hand holding case does handle
these). Testing of some of the code is unfortunately lacking because of
the difficulties of testing across environments.

To make the system flexible I have provided an (intentionally
incomplete) Go API into the R internals which allows reasonably Go
type-safe interaction with SEXP values (Go does not have unions, so
this is uglier than it might be otherwise and unions are faked with Go
interface values). For efficiency reasons I've avoided using R internal
calls where possible (accessors are done with Go code directly, but
allocations are done in R's C code to avoid having to duplicate the
garbage collection mechanics in Go with the obvious risks of error and
possible behaviour skew in the future).

In doing this work I have some questions that I have not been able to
find answers for in the R-ints doc or hadley/r-internals.

   1. In R-ints, the LISTSXP SEXP type CDR is said to hold "usually"
      LISTSXP or NULL. What does the "usually" mean here? Is it possible
      for the CDR to hold values other than LISTSXP or NULL, and is
      this NULL NILSXP or C NULL? I assume that the CAR can hold any type
      of SEXP, is this correct?
   2. The LANGSXP and DOTSXP types are lists, but the R-ints comments on
      them do not say whether the CDR of one of these lists is the same as
      the head of the list or devolves to a LISTSXP. Looking through the
      code suggests to me that functions that allocate these two types
      allocate a LISTSXP and then change only the head of the list to be
      the LANGSXP or DOTSXP that's required, meaning that the tail of the
      list is all LISTSXP. Is this correct?
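
The layout that question 2 asks about can be sketched in plain Go. This is only a model under stated assumptions: Node, Kind, NewList, NewLang and Kinds are hypothetical names, the cell is not R's real memory layout, and a Go nil stands in for the list terminator. It shows what "retag only the head" would look like if the reading of the allocation code given above is right.

```go
package main

import "fmt"

// Kind mirrors the three pairlist-like SEXP type names from the questions.
type Kind int

const (
	LISTSXP Kind = iota
	LANGSXP
	DOTSXP
)

func (k Kind) String() string {
	return [...]string{"LISTSXP", "LANGSXP", "DOTSXP"}[k]
}

// Node is a hypothetical stand-in for a cons cell; it is not R's layout.
type Node struct {
	Kind Kind
	Car  interface{} // any value, as question 1 assumes for CAR
	Cdr  *Node       // nil plays the role of the list terminator
}

// NewList builds a chain of LISTSXP cells holding vals in order.
func NewList(vals ...interface{}) *Node {
	var head *Node
	for i := len(vals) - 1; i >= 0; i-- {
		head = &Node{Kind: LISTSXP, Car: vals[i], Cdr: head}
	}
	return head
}

// NewLang allocates a plain list and then retags only the head cell,
// which is the behaviour the question infers from the R sources.
func NewLang(vals ...interface{}) *Node {
	head := NewList(vals...)
	if head != nil {
		head.Kind = LANGSXP
	}
	return head
}

// Kinds returns the Kind of each cell from head to tail.
func Kinds(n *Node) []Kind {
	var ks []Kind
	for ; n != nil; n = n.Cdr {
		ks = append(ks, n.Kind)
	}
	return ks
}

func main() {
	call := NewLang("sum", 1, 2)
	fmt.Println(Kinds(call)) // prints [LANGSXP LISTSXP LISTSXP]
}
```

Under that reading, code that walks a call has to accept a LANGSXP head followed by LISTSXP cells rather than expect one kind throughout.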

The last question is more a question of interest in design strategy,
and the answer may have been lost to time. In order to reduce the need
to go through Go's interface assertions in a number of cases I have
decided to reinterpret R_NilValue to an untyped Go nil (this is
important for example in list traversal where the CDR can (hopefully)
be only one of two types LISTSXP or NILSXP; in Go this would require a
generalised SEXP return, but by doing this reinterpretation I can
return a *List pointer which may be nil, greatly simplifying the code
and improving the performance). My question here is why a singleton null
value was chosen to be represented as a fully allocated SEXP value
rather than just a C NULL. Also, whether C NULL is used to any great
extent within the internal code. Note that the Go API provides a
mechanism to easily reconvert the nils used back to an R_NilValue when
returning from a Go function[3].
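
The nil reinterpretation described in this paragraph can be illustrated with a small, self-contained Go sketch. Everything here is assumed for illustration: List, Next, Len, nilValue and export are hypothetical names, and export is not rgo's Value.Export, only a stand-in for the idea of mapping nil back to the singleton on the way out.

```go
package main

import "fmt"

// List is a hypothetical stand-in for a LISTSXP cell. A nil *List stands
// for R_NilValue inside Go code.
type List struct {
	Car interface{}
	Cdr *List
}

// Next returns the tail as a concrete *List, so callers need no
// interface assertion to keep walking.
func (l *List) Next() *List { return l.Cdr }

// Len walks the list until it reaches the nil terminator.
func Len(l *List) int {
	n := 0
	for ; l != nil; l = l.Next() {
		n++
	}
	return n
}

// nilValue is a hypothetical singleton standing in for R_NilValue.
var nilValue = &List{}

// export maps a nil *List back to the singleton before a value is handed
// back across the boundary. Illustrative only.
func export(l *List) *List {
	if l == nil {
		return nilValue
	}
	return l
}

func main() {
	l := &List{Car: "a", Cdr: &List{Car: "b"}}
	fmt.Println(Len(l), Len(nil))                    // prints 2 0
	fmt.Println(export(l.Next().Next()) == nilValue) // prints true
}
```

The trade-off is that Go code can no longer tell "R NULL" apart from "no value at all", which matters wherever the C side gives those two different meanings.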

thanks
Dan Kortschak

[1]https://github.com/rgonomic/rgo
[2]https://github.com/rgonomic/rgo/issues/1
[3]https://pkg.go.dev/github.com/rgonomic/rgo/sexp?tab=doc#Value.Export

Gabriel Becker

2020-Sep-07 21:38 UTC

Dan,

Sounds like a cool project! Response to one of your questions inline

On Mon, Sep 7, 2020 at 4:24 AM Dan Kortschak via R-devel <
r-devel at r-project.org> wrote:
>
> The last question is more a question of interest in design strategy,
> and the answer may have been lost to time. In order to reduce the need
> to go through Go's interface assertions in a number of cases I have
> decided to reinterpret R_NilValue to an untyped Go nil (this is
> important for example in list traversal where the CDR can (hopefully)
> be only one of two types LISTSXP or NILSXP; in Go this would require a
> generalised SEXP return, but by doing this reinterpretation I can
> return a *List pointer which may be nil, greatly simplifying the code
> and improving the performance). My question here is why a singleton null
> value was chosen to be represented as a fully allocated SEXP value
> rather than just a C NULL. Also, whether C NULL is used to any great
> extent within the internal code.

I cannot speak to initial intent, perhaps others can. I can say that there
is at least one place where the difference between R_NilValue and NULL is
very important as of right now. The current design of the ALTREP framework
contract expects ALTREP methods that return a SEXP to return C NULL when
they fail (or decline) to do the requested computation and the
non-altclass-specific machinery should be run as a fallback. The places
where ALTREP methods are plugged into the existing, general internals then
check for C-NULL after attempting to fast-path the computation via ALTREP.
Any non-C-NULL SEXP, including R_NilValue, will be taken as an indication
that the altrep-method succeeded and that SEXP is the resulting value,
causing the fall-back machinery to be skipped.

IIUC the system you described, this means that it would be impossible to
implement (a fully general) ALTREP class in Go using your framework (at
least for the method types that return SEXP and for which R_NilValue is a
valid return value) because your code is unable to distinguish safely
between the two. In practice in most currently existing methods, you
wouldn't ever need to return R_NilValue, I wouldn't think.
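
The fallback contract described above, and what happens when R_NilValue is folded into nil, can be modelled in plain Go. This is a sketch under assumptions, not ALTREP's actual code: Obj, RNilValue, method, dispatch and collapse are hypothetical names, a Go nil models C NULL, and a distinct allocated value models R_NilValue.

```go
package main

import "fmt"

// Obj is a hypothetical stand-in for an R object behind a SEXP pointer.
type Obj struct{ Name string }

// RNilValue models the allocated R_NilValue singleton. A nil *Obj models
// C NULL.
var RNilValue = &Obj{Name: "R_NilValue"}

// method models an ALTREP method that returns a SEXP.
type method func(x *Obj) *Obj

// dispatch tries the class method first. Only nil (C NULL) means "declined";
// any other value, including RNilValue, is taken as the result.
func dispatch(m, fallback method, x *Obj) *Obj {
	if r := m(x); r != nil {
		return r
	}
	return fallback(x)
}

// collapse wraps a method the way a binding would if it represented
// R_NilValue as nil: the two cases become indistinguishable.
func collapse(m method) method {
	return func(x *Obj) *Obj {
		if r := m(x); r != RNilValue {
			return r
		}
		return nil
	}
}

// returnsRNull is a method whose legitimate answer is R NULL.
func returnsRNull(*Obj) *Obj { return RNilValue }

// fallback models the general, non-ALTREP machinery.
func fallback(*Obj) *Obj { return &Obj{Name: "fallback result"} }

func main() {
	fmt.Println(dispatch(returnsRNull, fallback, nil).Name)           // prints R_NilValue
	fmt.Println(dispatch(collapse(returnsRNull), fallback, nil).Name) // prints fallback result
}
```

In the second call the method's answer is silently replaced by the fallback's, which is the failure mode this reply is pointing at.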

The problem that jumps out at me is Extract_subset. Now I'd need to do some
digging to be certain but there, for some types in some situations, it DOES
*seem* like you might need to return the R-NULL and find yourself unable to
do so.

It's also possible more methods will be added to the table in the future
that would be problematic in light of that restriction.

In particular, if ALTREP list/environment implementations were to ever be
supported I would expect you to be dead in the water entirely in terms of
building those as you'd find yourself entirely unable to implement the
Basic Single-element getter machinery, I think.

Beyond that, a quick grep of the sources tells me there are definitely a
few times SEXP objects are tested with <var> == NULL though not
overwhelmingly many. Most such tests are for non-SEXP pointers.

Best,
~G

> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

Bertram, Alexander

2020-Sep-08 07:33 UTC

Hi Dan,

For what it's worth, Renjin requires LISTSXPs to hold either a LISTSXP or a
NULL, and this appears to be largely the case in practice based on running
tests for thousands of packages (including cross compiled C code). I can
only remember it being briefly an issue with the rlang package, but Lionel
graciously changed it:
https://github.com/r-lib/rlang/pull/579

Best,
Alex

-- 
Alexander Bertram
Technical Director
*BeDataDriven BV*

Web: http://bedatadriven.com
Email: alex at bedatadriven.com
Tel. Nederlands: +31(0)647205388
Skype: akbertram


Tomas Kalibera

2020-Sep-08 09:07 UTC

The general principle is that R packages are only allowed to use what is 
documented in the R help (? command) and in Writing R Extensions. The 
former covers what is allowed from R code in extensions, the latter 
mostly what is allowed from C code in extensions (with some references 
to Fortran).

If you are implementing a Go interface for writing R packages, such a Go
interface should thus only use what is in the R help and in Writing R
Extensions. Otherwise, packages would not be able to use such an interface.

What is described in R Internals is for understanding the internal
structure of the R implementation itself, so it is for development of R
itself; it can also help with debugging R and, in some cases, with
debugging or performance analysis of extensions. R Internals can help in
giving an intuition, but when people are implementing R itself, they also
need to check the code. R Internals does not describe any interface for
external code. If it states any constraints about, say, pairlists, take
them as an intuition for what has been intended and probably holds or held
at some level of abstraction, but you need to check the source code for
the details anyway (e.g., at some very low level CAR and CDR can be any
SEXP or R_NilValue, locally in some functions even C NULL). Internally,
some C code uses C NULL SEXPs, but it is rare and local, and again, only
the interface described in Writing R Extensions is for external use.

WRE speaks about "R NULL", "R NULL object" or "C NULL" in some cases to
avoid confusion, e.g. for values typed as "void *". SEXPs that packages
obtain using the interface in WRE should not be C NULL, only R NULL
(R_NilValue). External pointers can become C NULL and this is documented
in WRE 5.13.

Best
Tomas

Dan Kortschak

2020-Sep-08 09:47 UTC

Thanks, Tomas.

This is unfortunate. Calling between Go and C is not cheap; the gc
implementation of the Go compiler (as opposed to gccgo) uses different
calling conventions from C and there are checks to ensure that Go
allocated memory pointers do not leak into C code. For this reason I
wanted to avoid these if at all possible (I cannot for allocations
since I don't want to keep tracking changes in how R implements its GC
and allocation).

However, if the SEXP behaviour of the standard types and the way
attributes are handled are not highly mobile, I think that what I'm
doing will be OK - at worst the Go code will panic and result in an R
error. The necessary interface to R for allocations is only eight
functions[1].

Note that there is a lot in WRE that's beyond what I want rgo to be
able to do (calling into R from Go for example). In fact, there's just
a lot in WRE (it's almost 3 times the length of the Go language spec
and memory model reference combined). The issues around weak references
and external pointers are not something that I want to deal with;
working with that kind of object is not idiomatic for Go (in fact
without using C.malloc, R external pointers from Go would be forbidden
by the Go runtime) and I would not expect that they are likely to be
used by people writing extensions for R in Go.

Dan

[1] https://github.com/rgonomic/rgo/blob/2ce7717c85516bbfb94d0b5c7ef1d9749dd1f817/sexp/r_internal.go#L86-L118

Tomas Kalibera

2020-Sep-08 10:08 UTC

On 9/8/20 11:47 AM, Dan Kortschak wrote:
> Thanks, Tomas.
>
> This is unfortunate. Calling between Go and C is not cheap; the gc
> implementation of the Go compiler (as opposed to gccgo) uses different
> calling conventions from C and there are checks to ensure that Go
> allocated memory pointers do not leak into C code. For this reason I
> wanted to avoid these if at all possible (I cannot for allocations
> since I don't want to keep tracking changes in how R implements its GC
> and allocation).
>
> However, if SEXP type behaviour of the standard types, and how
> attributes are handled are not highly mobile, I think that what I'm
> doing will be OK - at worst the Go code will panic and result in an R
> error. The necessary interface to R for allocations is only eight
> functions[1].
I am not sure I understand correctly, but if you are accessing the
memory of SEXPs directly from the Go implementation, instead of calling
the exported access functions documented in WRE, that would be a really
bad idea. It is of course fine for research and experimentation, but the
internal structure can and does change at any time; otherwise we would
not be able to develop or maintain R. Such direct access bypassing WRE
would likely be a clear case for rejection by CRAN of this interface and
any packages using it, and I hope by other package repositories as well.

However, I believe the overhead of calling the C-level access functions
R exports should be minimal compared to other overheads. In any case,
you can't hope to call tiny functions frequently between Go and R
efficiently. This can only work for bigger functions, and then the Go-C
overhead should not be important.
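
The point about preferring bigger functions can be illustrated by counting boundary crossings in a small simulation. This is a sketch only: the foreign type and its At and CopyAll methods are hypothetical, the "crossing" is just a counter, and no Cgo or R call is made.

```go
package main

import "fmt"

// foreign simulates data owned by another runtime. crossings counts how
// often the simulated language boundary is crossed. The type and its
// methods are hypothetical; nothing here calls Cgo or R.
type foreign struct {
	data      []float64
	crossings int
}

// At models a fine-grained accessor: one crossing per element.
func (f *foreign) At(i int) float64 {
	f.crossings++
	return f.data[i]
}

// CopyAll models a coarse-grained call: one crossing for the whole vector.
func (f *foreign) CopyAll() []float64 {
	f.crossings++
	return append([]float64(nil), f.data...)
}

// sumFine crosses the boundary once per element. For simplicity the
// length is read directly rather than through another crossing.
func sumFine(f *foreign) float64 {
	var s float64
	for i := range f.data {
		s += f.At(i)
	}
	return s
}

// sumCoarse crosses the boundary once and then works on a local copy.
func sumCoarse(f *foreign) float64 {
	var s float64
	for _, v := range f.CopyAll() {
		s += v
	}
	return s
}

func main() {
	a := &foreign{data: []float64{1, 2, 3, 4}}
	b := &foreign{data: []float64{1, 2, 3, 4}}
	fmt.Println(sumFine(a), a.crossings)
	fmt.Println(sumCoarse(b), b.crossings)
}
```

Both functions compute the same sum; the fine-grained version pays the per-call cost once per element, while the coarse version pays it once.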
> Note that there is a lot in WRE that's beyond what I want rgo to be
> able to do (calling in to R from Go for example). In fact, there's just
> a lot in WRE (it's almost 3 times the length of the Go language spec
> and memory model reference combined). The issues around weak references
> and external pointers are not something that I want to deal with;
> working with that kind of object is not idiomatic for Go (in fact
> without using C.malloc, R external pointers from Go would be forbidden
> by the Go runtime) and I would not expect that they are likely to be
> used by people writing extensions for R in Go.
Sure, I think it is perfectly fine to cover only a subset, if that is
already useful for writing some extensions in Go. Maintenance would be
easiest if Go programs didn't call back into the R runtime at all, so
the fewer calls, the better for maintenance.

Best
Tomas
>
> Dan
>
> [1]https://github.com/rgonomic/rgo/blob/2ce7717c85516bbfb94d0b5c7ef1d9749dd1f817/sexp/r_internal.go#L86-L118
>
> On Tue, 2020-09-08 at 11:07 +0200, Tomas Kalibera wrote:
>> The general principle is that R packages are only allowed to use what
>> is documented in the R help (? command) and in Writing R Extensions.
>> The former covers what is allowed from R code in extensions, the
>> latter mostly what is allowed from C code in extensions (with some
>> references to Fortran).
>>
>> If you are implementing a Go interface for writing R packages, such Go
>> interface should thus only use what is in the R help and in Writing R
>> Extensions. Otherwise, packages would not be able to use such
>> interface.
>>
>> What is described in R Internals is for understanding the internal
>> structure of R implementation itself, so for development of R itself,
>> it could help indeed also debugging of R itself and in some cases
>> debugging or performance analysis of extensions. R Internals can help
>> in giving an intuition, but when people are implementing R itself,
>> they also need to check the code. R Internals does not describe any
>> interface for external code, if it states any constraints about say
>> pairlists, etc, take it as an intuition for what has been intended and
>> probably holds or held at some level of abstraction, but you need to
>> check the source code for the details, anyway (e.g., at some very low
>> level CAR and CDR can be any SEXP or R_NilValue, locally in some
>> functions even C NULL). Internally, some C code uses C NULL SEXPs, but
>> it is rare and local, and again, only the interface described in
>> Writing R Extensions is for external use.
>>
>> WRE speaks about "R NULL", "R NULL object" or "C NULL" in some cases
>> to avoid confusion, e.g. for values types as "void *". SEXPs that
>> packages obtain using the interface in WRE should not be C NULL, only
>> R NULL (R_NilValue). External pointers can become C NULL and this is
>> documented in WRE 5.13.
>>
>> Best
>> Tomas
>>
>> On 9/6/20 3:44 AM, Dan Kortschak via R-devel wrote:
>>> Hello,
>>>
>>> I am writing an R/Go interoperability tool[1] that works similarly to
>>> Rcpp; the tool takes packages written in Go and performs the
>>> necessary Go type analysis to wrap the Go code with C and R shims
>>> that allow the Go code to then be called from R. The system is
>>> largely complete (with the exception of having a clean approach to
>>> handling generalised attributes in the easy case[2] - the less hand
>>> holding case does handle these). Testing of some of the code is
>>> unfortunately lacking because of the difficulties of testing across
>>> environments.
>>>
>>> To make the system flexible I have provided an (intentionally
>>> incomplete) Go API into the R internals which allows reasonably Go
>>> type-safe interaction with SEXP values (Go does not have unions, so
>>> this is uglier than it might be otherwise and unions are faked with
>>> Go interface values). For efficiency reasons I've avoided using R
>>> internal calls where possible (accessors are done with Go code
>>> directly, but allocations are done in R's C code to avoid having to
>>> duplicate the garbage collection mechanics in Go with the obvious
>>> risks of error and possible behaviour skew in the future).
>>>
>>> In doing this work I have some questions that I have not been able
>>> to
>>> find answers for in the R-ints doc or hadley/r-internals.
>>>
>>>      1. In R-ints, the LISTSXP SEXP type CDR is said to hold "usually"
>>>         LISTSXP or NULL. What does the "usually" mean here? Is it
>>>         possible for the CDR to hold values other than LISTSXP or
>>>         NULL, and is this NULL NILSXP or C NULL? I assume that the CAR
>>>         can hold any type of SEXP, is this correct?
>>>      2. The LANGSXP and DOTSXP types are lists, but the R-ints
>>>         comments on them do not say whether the CDR of one of these
>>>         lists is the same as the head of the list or devolves to a
>>>         LISTSXP. Looking through the code suggests to me that
>>>         functions that allocate these two types allocate a LISTSXP
>>>         and then change only the head of the list to be the LANGSXP
>>>         or DOTSXP that's required, meaning that the tail of the list
>>>         is all LISTSXP. Is this correct?
>>>
>>> The last question is more a question of interest in design strategy,
>>> and the answer may have been lost to time. In order to reduce the
>>> need to go through Go's interface assertions in a number of cases I
>>> have decided to reinterpret R_NilValue to an untyped Go nil (this is
>>> important for example in list traversal where the CDR can (hopefully)
>>> be only one of two types LISTSXP or NILSXP; in Go this would require
>>> a generalised SEXP return, but by doing this reinterpretation I can
>>> return a *List pointer which may be nil, greatly simplifying the code
>>> and improving the performance). My question here is why a singleton
>>> null value was chosen to be represented as a fully allocated SEXP
>>> value rather than just a C NULL. Also, whether C NULL is used to any
>>> great extent within the internal code. Note that the Go API provides
>>> a mechanism to easily reconvert the nil's used back to a R_NilValue
>>> when returning from a Go function[3].
>>>
>>> thanks
>>> Dan Kortschak
>>>
>>> [1]https://github.com/rgonomic/rgo
>>> [2]https://github.com/rgonomic/rgo/issues/1
>>> [3]https://pkg.go.dev/github.com/rgonomic/rgo/sexp?tab=doc#Value.Export
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>

Dan Kortschak

2020-Sep-08 11:13 UTC

[Rd] some questions about R internal SEXP types

On Tue, 2020-09-08 at 12:08 +0200, Tomas Kalibera wrote:
> I am not sure if I understand correctly, but if you were accessing
> directly the memory of SEXPs from Go implementation instead of
> calling
> through exported access functions documented in WRE, that would be a
> really bad idea. Of course fine for research and experimentation, but
> the internal structure can and does change at any time, otherwise we
> would not be able to develop nor maintain R. Such direct access
> bypassing WRE would likely be a clear case for rejection in CRAN for
> this interface and any packages using it, and I hope in other package
> repositories as well.
Sorry, I'm coming from a language that has strong backwards
compatibility guarantees and (generally) machine level data types, so
it is surprising to me that basic data types are that fluid.
> However, I believe the overhead of calling the C-level access
> functions
> R exports should be minimal compared to other overheads. You can't
> hope,
> anyway, for being able to efficiently call tiny functions frequently
> between Go and R. This can only work for bigger functions, anyway,
> and
> then the Go-C overhead should not be important.
This really depends on the complexity and structure of the data being
handed to Go. The whole purpose of the tool is to allow interchange of
data between Go and R. For atomic vectors the cost is very low, whether
with direct access or via Cgo calls. However, each name access or
attribute access (both of which are necessary for struct population, and
structs may come in slices) is a Cgo call; these lookups go from
~nanosecond to ~hundred nanoseconds per lookup.
> > Note that there is a lot in WRE that's beyond what I want rgo to be
> > able to do (calling in to R from Go for example). In fact, there's
> > just
> > a lot in WRE (it's almost 3 times the length of the Go language
> > spec
> > and memory model reference combined). The issues around weak
> > references
> > and external pointers are not something that I want to deal with;
> > working with that kind of object is not idiomatic for Go (in fact
> > without using C.malloc, R external pointers from Go would be
> > forbidden
> > by the Go runtime) and I would not expect that they are likely to
> > be
> > used by people writing extensions for R in Go.
> 
> Sure, I think it is perfectly fine to cover only a subset, if that is
> already useful to write some extensions in Go. Maintenance would be
> easiest if Go programs didn't call back into the R runtime at all, so
> fewer calls the better for maintenance.
This is apparently unavoidable though from what I read here.
> Best
> Tomas

thanks
Dan

Hadley Wickham

2020-Sep-08 12:25 UTC

[Rd] some questions about R internal SEXP types

On Tue, Sep 8, 2020 at 4:12 AM Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
>
> The general principle is that R packages are only allowed to use what is
> documented in the R help (? command) and in Writing R Extensions. The
> former covers what is allowed from R code in extensions, the latter
> mostly what is allowed from C code in extensions (with some references
> to Fortran).
Could you clarify what you mean by "documented"? For example,
Rf_allocVector() is mentioned several times in R-exts, but I don't see
anywhere where the inputs and output are precisely described (which is
what I would consider to be documented). Is Rf_allocVector() part of
the API?

Hadley

-- 
http://hadley.nz
