Wang Jiefei
2019-Sep-23 19:17 UTC
[Rd] What is the best way to loop over an ALTREP vector?
Sorry for post a lot of things, for the first part of code, I copied my C++
iter macro by mistake(and you can see an explicit type casting). Here is
the macro definition from R_exts/Itermacros.h
#define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype, \
strt, nfull, expr) do { \
* const** etype *px = DATAPTR_OR_NULL(sx); * \
if (px != NULL) { \
R_xlen_t __ibr_n__ = strt + nfull; \
R_xlen_t nb = __ibr_n__; \
for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) { \
expr \
} \
} \
else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype,
\
strt, nfull, expr); \
} while (0)
Best,
Jiefei
On Mon, Sep 23, 2019 at 3:12 PM Wang Jiefei <szwjf08 at gmail.com> wrote:
> Hi Gabriel,
>
> I have tried the macro and found a small issue, it seems like the macro is
> written in C and does an implicit type conversion(const void * to const int
> *), see below. While it is allowed in C, C++ seems not happy with it. Is it
> possible to add an explicit type casting so that it can be compatible with
> both language?
>
>
> #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype, \
>
> strt, nfull, expr) do { \
>
> *const etype *px = (const** etype *)DATAPTR_OR_NULL(sx); *
> \
>
> if (px != NULL) { \
>
> R_xlen_t __ibr_n__ = strt + nfull; \
>
> R_xlen_t nb = __ibr_n__; \
>
> for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) { \
>
> expr \
>
> } \
>
> } \
>
> else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype,
> vtype, \
>
> strt, nfull, expr); \
>
> } while (0)
>
>
> Also, I notice that the element type(etype) and vector type(vtype) has
> to be specified in the macro. Since the SEXP is the first argument in the
> macro, it seems redundant to define etype and vtype for they have to match
> the type of the SEXP. I'm wondering if this is intentional? Will there
be a
> type-free macro in R in the future? Here is a simple type-free macro
I'm
> using.
>
> #define type_free_iter(sx, ptr, ind, nbatch,expr)\
>
> switch(TYPEOF(sx)){\
>
> case INTSXP:\
>
> ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, INTEGER, expr);\
>
> break; \
>
> case REALSXP:\
>
> ITERATE_BY_REGION(sx, ptr, ind, nbatch, double, REAL, expr);\
>
> break; \
>
> case LGLSXP:\
>
> ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, LOGICAL, expr);\
>
> break; \
>
> default:\
>
> Rf_error("Unknow data type\n"); \
>
> break; \
>
> }
>
>
>
> // [[Rcpp::export]]
>
> double sillysum(SEXP x) {
>
> double s = 0.0;
>
> type_free_iter(x, ptr, ind, nbatch,
>
> {
>
> for (int i = 0; i < nbatch; i++) { s = s + ptr[i];
}
>
> });
>
> return s;
>
> }
>
>
>
>
> Best,
>
> Jiefei
>
> On Wed, Aug 28, 2019 at 2:32 PM Wang Jiefei <szwjf08 at gmail.com>
wrote:
>
>> Thank you, Gabriel. The loop macro is very helpful. It is also exciting
>> to see that there are lots of changes in ALTREP in R devel version. I
>> really appreciate your help!
>>
>> Best,
>> Jiefei
>>
>> On Wed, Aug 28, 2019 at 7:37 AM Gabriel Becker <gabembecker at
gmail.com>
>> wrote:
>>
>>> Jiefei,
>>>
>>> I've been meaning to write up something about this so hopefully
this
>>> will be an impetus for me to actually do that, but until then,
responses
>>> inline.
>>>
>>>
>>> On Tue, Aug 27, 2019, 7:22 PM Wang Jiefei <szwjf08 at
gmail.com> wrote:
>>>
>>>> Hi devel team,
>>>>
>>>> I'm working on C/C++ level ALTREP compatibility for a
package. The
>>>> package
>>>> previously used pointers to access the data of a SEXP, so it
would not
>>>> work
>>>> for some ALTREP objects which do not have a pointer. I plan to
rewrite
>>>> the
>>>> code and use functions like get_elt, get_region, and get_subset
to
>>>> access
>>>> the values of a vector, so I have a few questions for ALTREP:
>>>>
>>>> 1. Since an ALTREP do not have to define all of the above
>>>> functions(element, region, subset), is there any way to check
which
>>>> function has been defined for an ALTREP class? I did a search
on
>>>> RInternal.h and altrep.c but did not find a solution for it. If
not,
>>>> will
>>>> it be added in the future?
>>>>
>>>
>>> Element and region are guaranteed to always be defined and work
(for
>>> altrep and non-altrep INTSXP, REALSXP, LGLSXPs, etc, we currently
don't
>>> have region for STRSXP or VECSXP, I believe). If the altrep class
does not
>>> provide them then default methods will be used, which may be
inefficient in
>>> some cases but will work. Subset is currently a forward looking
stub, but
>>> once implimented, that will also be guaranteed to work for all
valid ALTREP
>>> classes.
>>>
>>>
>>>>
>>>> 2. Given the diversity of ALTREP classes, what is the best way
to loop
>>>> over
>>>> an ALTREP object? I hope there can be an all-in-one function
which can
>>>> get
>>>> the values from a vector as long as at least one of the above
functions
>>>> has
>>>> been defined, so package developers would not be bothered by
tons of
>>>> `if-else` statement if they want their package to work with
ALTREP.
>>>> Since
>>>> it seems like there is no such function exist, what could be
the best
>>>> way
>>>> to do the loop under the current R version?
>>>>
>>>
>>> The best way to loop over all SEXPs, which supports both altrep and
>>> nonaltrep objects is, with the ITERATE_BY_REGION (which has been in
R for a
>>> number of released versions, at least since 3.5.0 I think) and the
much
>>> newer (devel only) ITERATE_BY_REGION_PARTIAL macros defined in
>>> R_exts/Itermacros.h
>>>
>>> The meaning of the arguments is as follows for
ITERATE_BY_REGION_PARTIAL
>>> are as follows (ITERATE_BY_REGION is the same except no strt, and
nfull).
>>>
>>>
>>> - sx - C level variable name of the SEXP to iterate over
>>> - px - variable name to use for the pointer populated with data
from
>>> a region of sx
>>> - idx - variable name to use for the "outer", batch
counter in the
>>> for loop. This will contain the 0-indexed start position of the
batch
>>> you're currently processing
>>> - nb - variable name to use for the current batch size. This
will
>>> always either be GET_REGION_BUFFSIZE (512), or the number of
elements
>>> remaining in the vector, whichever is smaller
>>> - etype - element (C) type, e.g., int, double, of the data
>>> - vtype - vector (access API) type, e.g, INTEGER, REAL
>>> - strt - the 0-indexed position in the vector to start iterating
>>> - nfull - the total number oif elements to iterate over from the
>>> vector
>>> - expr - the code to process a single batch (Which will do
things to
>>> px, typically)
>>>
>>>
>>> So code to perform badly implemented not good idea summing of
REALSXP
>>> data might look like
>>>
>>> double sillysum(SEXP x) {
>>>
>>> double s = 0.0;
>>>
>>> ITERATE_BY_REGION(x, ptr, ind, nbatch, double, REAL,
>>> {
>>>
>>> for(int i = 0; i < nbatch; i++) { s = s + ptr[i];}
>>> })
>>>
>>> return s;
>>> }
>>>
>>> For meatier examples of ITERATE_BY_REGION's use in practice you
can grep
>>> the R sources. I know it is used in the implementations of the
various
>>> C-level summaries (summary.c), print and formatting functions, and
anyNA.
>>>
>>> Some things to remember
>>>
>>> - If you have an inner loop like the one above, your total
position
>>> in the original vector is ind + i
>>> - ITERATE_BY_REGION always processes the whole vector, if you
need
>>> to only do part of it yo'll either need custom breaking for
both inner and
>>> outer loopsl, or in R-devel you can use
ITERATE_BY_REGION_PARTIAL
>>> - Don't use the variants ending in 0, all they do is skip
over
>>> things that are a good idea in the case of non-altreps (and some
very
>>> specific altreps).
>>>
>>> Hope that helps.
>>>
>>> Best,
>>> ~G
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>> Best,
>>>> Jiefei
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>
>>>
[[alternative HTML version deleted]]
Not sure if you're using just C++ or Rcpp for C++ access but https://purrple.cat/blog/2018/10/14/altrep-and-cpp/ has some tips on using C++ w/ALTREP.> On Sep 23, 2019, at 3:17 PM, Wang Jiefei <szwjf08 at gmail.com> wrote: > > Sorry for post a lot of things, for the first part of code, I copied my C++ > iter macro by mistake(and you can see an explicit type casting). Here is > the macro definition from R_exts/Itermacros.h > > #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype, \ > > strt, nfull, expr) do { \ > > * const** etype *px = DATAPTR_OR_NULL(sx); * \ > > if (px != NULL) { \ > > R_xlen_t __ibr_n__ = strt + nfull; \ > > R_xlen_t nb = __ibr_n__; \ > > for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) { \ > > expr \ > > } \ > > } \ > > else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype, > \ > > strt, nfull, expr); \ > > } while (0) > > > Best, > > Jiefei > > On Mon, Sep 23, 2019 at 3:12 PM Wang Jiefei <szwjf08 at gmail.com> wrote: > >> Hi Gabriel, >> >> I have tried the macro and found a small issue, it seems like the macro is >> written in C and does an implicit type conversion(const void * to const int >> *), see below. While it is allowed in C, C++ seems not happy with it. Is it >> possible to add an explicit type casting so that it can be compatible with >> both language? >> >> >> #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype, \ >> >> strt, nfull, expr) do { \ >> >> *const etype *px = (const** etype *)DATAPTR_OR_NULL(sx); * >> \ >> >> if (px != NULL) { \ >> >> R_xlen_t __ibr_n__ = strt + nfull; \ >> >> R_xlen_t nb = __ibr_n__; \ >> >> for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) { \ >> >> expr \ >> >> } \ >> >> } \ >> >> else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, >> vtype, \ >> >> strt, nfull, expr); \ >> >> } while (0) >> >> >> Also, I notice that the element type(etype) and vector type(vtype) has >> to be specified in the macro. Since the SEXP is the first argument in the >> macro, it seems redundant to define etype and vtype for they have to match >> the type of the SEXP. I'm wondering if this is intentional? Will there be a >> type-free macro in R in the future? Here is a simple type-free macro I'm >> using. >> >> #define type_free_iter(sx, ptr, ind, nbatch,expr)\ >> >> switch(TYPEOF(sx)){\ >> >> case INTSXP:\ >> >> ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, INTEGER, expr);\ >> >> break; \ >> >> case REALSXP:\ >> >> ITERATE_BY_REGION(sx, ptr, ind, nbatch, double, REAL, expr);\ >> >> break; \ >> >> case LGLSXP:\ >> >> ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, LOGICAL, expr);\ >> >> break; \ >> >> default:\ >> >> Rf_error("Unknow data type\n"); \ >> >> break; \ >> >> } >> >> >> >> // [[Rcpp::export]] >> >> double sillysum(SEXP x) { >> >> double s = 0.0; >> >> type_free_iter(x, ptr, ind, nbatch, >> >> { >> >> for (int i = 0; i < nbatch; i++) { s = s + ptr[i]; } >> >> }); >> >> return s; >> >> } >> >> >> >> >> Best, >> >> Jiefei >> >> On Wed, Aug 28, 2019 at 2:32 PM Wang Jiefei <szwjf08 at gmail.com> wrote: >> >>> Thank you, Gabriel. The loop macro is very helpful. It is also exciting >>> to see that there are lots of changes in ALTREP in R devel version. I >>> really appreciate your help! >>> >>> Best, >>> Jiefei >>> >>> On Wed, Aug 28, 2019 at 7:37 AM Gabriel Becker <gabembecker at gmail.com> >>> wrote: >>> >>>> Jiefei, >>>> >>>> I've been meaning to write up something about this so hopefully this >>>> will be an impetus for me to actually do that, but until then, responses >>>> inline. >>>> >>>> >>>> On Tue, Aug 27, 2019, 7:22 PM Wang Jiefei <szwjf08 at gmail.com> wrote: >>>> >>>>> Hi devel team, >>>>> >>>>> I'm working on C/C++ level ALTREP compatibility for a package. The >>>>> package >>>>> previously used pointers to access the data of a SEXP, so it would not >>>>> work >>>>> for some ALTREP objects which do not have a pointer. I plan to rewrite >>>>> the >>>>> code and use functions like get_elt, get_region, and get_subset to >>>>> access >>>>> the values of a vector, so I have a few questions for ALTREP: >>>>> >>>>> 1. Since an ALTREP do not have to define all of the above >>>>> functions(element, region, subset), is there any way to check which >>>>> function has been defined for an ALTREP class? I did a search on >>>>> RInternal.h and altrep.c but did not find a solution for it. If not, >>>>> will >>>>> it be added in the future? >>>>> >>>> >>>> Element and region are guaranteed to always be defined and work (for >>>> altrep and non-altrep INTSXP, REALSXP, LGLSXPs, etc, we currently don't >>>> have region for STRSXP or VECSXP, I believe). If the altrep class does not >>>> provide them then default methods will be used, which may be inefficient in >>>> some cases but will work. Subset is currently a forward looking stub, but >>>> once implimented, that will also be guaranteed to work for all valid ALTREP >>>> classes. >>>> >>>> >>>>> >>>>> 2. Given the diversity of ALTREP classes, what is the best way to loop >>>>> over >>>>> an ALTREP object? I hope there can be an all-in-one function which can >>>>> get >>>>> the values from a vector as long as at least one of the above functions >>>>> has >>>>> been defined, so package developers would not be bothered by tons of >>>>> `if-else` statement if they want their package to work with ALTREP. >>>>> Since >>>>> it seems like there is no such function exist, what could be the best >>>>> way >>>>> to do the loop under the current R version? >>>>> >>>> >>>> The best way to loop over all SEXPs, which supports both altrep and >>>> nonaltrep objects is, with the ITERATE_BY_REGION (which has been in R for a >>>> number of released versions, at least since 3.5.0 I think) and the much >>>> newer (devel only) ITERATE_BY_REGION_PARTIAL macros defined in >>>> R_exts/Itermacros.h >>>> >>>> The meaning of the arguments is as follows for ITERATE_BY_REGION_PARTIAL >>>> are as follows (ITERATE_BY_REGION is the same except no strt, and nfull). >>>> >>>> >>>> - sx - C level variable name of the SEXP to iterate over >>>> - px - variable name to use for the pointer populated with data from >>>> a region of sx >>>> - idx - variable name to use for the "outer", batch counter in the >>>> for loop. This will contain the 0-indexed start position of the batch >>>> you're currently processing >>>> - nb - variable name to use for the current batch size. This will >>>> always either be GET_REGION_BUFFSIZE (512), or the number of elements >>>> remaining in the vector, whichever is smaller >>>> - etype - element (C) type, e.g., int, double, of the data >>>> - vtype - vector (access API) type, e.g, INTEGER, REAL >>>> - strt - the 0-indexed position in the vector to start iterating >>>> - nfull - the total number oif elements to iterate over from the >>>> vector >>>> - expr - the code to process a single batch (Which will do things to >>>> px, typically) >>>> >>>> >>>> So code to perform badly implemented not good idea summing of REALSXP >>>> data might look like >>>> >>>> double sillysum(SEXP x) { >>>> >>>> double s = 0.0; >>>> >>>> ITERATE_BY_REGION(x, ptr, ind, nbatch, double, REAL, >>>> { >>>> >>>> for(int i = 0; i < nbatch; i++) { s = s + ptr[i];} >>>> }) >>>> >>>> return s; >>>> } >>>> >>>> For meatier examples of ITERATE_BY_REGION's use in practice you can grep >>>> the R sources. I know it is used in the implementations of the various >>>> C-level summaries (summary.c), print and formatting functions, and anyNA. >>>> >>>> Some things to remember >>>> >>>> - If you have an inner loop like the one above, your total position >>>> in the original vector is ind + i >>>> - ITERATE_BY_REGION always processes the whole vector, if you need >>>> to only do part of it yo'll either need custom breaking for both inner and >>>> outer loopsl, or in R-devel you can use ITERATE_BY_REGION_PARTIAL >>>> - Don't use the variants ending in 0, all they do is skip over >>>> things that are a good idea in the case of non-altreps (and some very >>>> specific altreps). >>>> >>>> Hope that helps. >>>> >>>> Best, >>>> ~G >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>>> Best, >>>>> Jiefei >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> ______________________________________________ >>>>> R-devel at r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/r-devel >>>>> >>>> > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel
Gabriel Becker
2019-Sep-24 05:48 UTC
[Rd] What is the best way to loop over an ALTREP vector?
Hi Bob, Thanks for sending around the link to that. It looks mostly right and looks like a useful onramp. There are a few things to watch out for though (I've cc'ed Romain so he's aware of these comments). @romain I hope you taake the following comments as they are intended, as help rather than attacks. The largest issue I see is that the contract for Get_region is that it *populates the provided buffer with a copy of the data. *That buffer is expected to be safe to destructively modify, shuffle, etc though I don't know if we are actually doing that anywhere. As such, if I understand his C++ correctly, that Get_region method is not safe and shouldn't be used. The other point is that Dataptr_or_null is not actually *guaranteed *not to allocate. The default method returns NULL, but we have no way of preventing an allocation in a user-defined method, and probably (?) no easy way of detecting that it is occurring before it causes a bug. That said, Romain is correct that when you are writing Dataptr_or_null methods you should write them so that they don't allocate, generally. Basically your methods for Dataptr_or_null shouldn't allocate, but you also should not write code that relies on hard assumptions that no one's ever will. Also, a small nitpick, R's internal mean function doesn't hit Dataptr, it hits either INTEGER_ELT (which really should probably be a ITERATE_BY_REGION) or ITERATE_BY_REGION. Anyway, I hope that helps. ~G On Mon, Sep 23, 2019 at 6:12 PM Bob Rudis <bob at rud.is> wrote:> Not sure if you're using just C++ or Rcpp for C++ access but > https://purrple.cat/blog/2018/10/14/altrep-and-cpp/ has some tips on > using C++ w/ALTREP. > > > On Sep 23, 2019, at 3:17 PM, Wang Jiefei <szwjf08 at gmail.com> wrote: > > > > Sorry for post a lot of things, for the first part of code, I copied my > C++ > > iter macro by mistake(and you can see an explicit type casting). Here is > > the macro definition from R_exts/Itermacros.h > > > > #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype, \ > > > > strt, nfull, expr) do { \ > > > > * const** etype *px = DATAPTR_OR_NULL(sx); * > \ > > > > if (px != NULL) { \ > > > > R_xlen_t __ibr_n__ = strt + nfull; \ > > > > R_xlen_t nb = __ibr_n__; \ > > > > for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) { \ > > > > expr \ > > > > } \ > > > > } \ > > > > else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype, > > \ > > > > strt, nfull, expr); \ > > > > } while (0) > > > > > > Best, > > > > Jiefei > > > > On Mon, Sep 23, 2019 at 3:12 PM Wang Jiefei <szwjf08 at gmail.com> wrote: > > > >> Hi Gabriel, > >> > >> I have tried the macro and found a small issue, it seems like the macro > is > >> written in C and does an implicit type conversion(const void * to const > int > >> *), see below. While it is allowed in C, C++ seems not happy with it. > Is it > >> possible to add an explicit type casting so that it can be compatible > with > >> both language? > >> > >> > >> #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype, \ > >> > >> strt, nfull, expr) do { \ > >> > >> *const etype *px = (const** etype *)DATAPTR_OR_NULL(sx); * > >> \ > >> > >> if (px != NULL) { \ > >> > >> R_xlen_t __ibr_n__ = strt + nfull; \ > >> > >> R_xlen_t nb = __ibr_n__; \ > >> > >> for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) { \ > >> > >> expr \ > >> > >> } \ > >> > >> } \ > >> > >> else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, > >> vtype, \ > >> > >> strt, nfull, expr); \ > >> > >> } while (0) > >> > >> > >> Also, I notice that the element type(etype) and vector type(vtype) has > >> to be specified in the macro. Since the SEXP is the first argument in > the > >> macro, it seems redundant to define etype and vtype for they have to > match > >> the type of the SEXP. I'm wondering if this is intentional? Will there > be a > >> type-free macro in R in the future? Here is a simple type-free macro I'm > >> using. > >> > >> #define type_free_iter(sx, ptr, ind, nbatch,expr)\ > >> > >> switch(TYPEOF(sx)){\ > >> > >> case INTSXP:\ > >> > >> ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, INTEGER, expr);\ > >> > >> break; \ > >> > >> case REALSXP:\ > >> > >> ITERATE_BY_REGION(sx, ptr, ind, nbatch, double, REAL, expr);\ > >> > >> break; \ > >> > >> case LGLSXP:\ > >> > >> ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, LOGICAL, expr);\ > >> > >> break; \ > >> > >> default:\ > >> > >> Rf_error("Unknow data type\n"); \ > >> > >> break; \ > >> > >> } > >> > >> > >> > >> // [[Rcpp::export]] > >> > >> double sillysum(SEXP x) { > >> > >> double s = 0.0; > >> > >> type_free_iter(x, ptr, ind, nbatch, > >> > >> { > >> > >> for (int i = 0; i < nbatch; i++) { s = s + ptr[i]; } > >> > >> }); > >> > >> return s; > >> > >> } > >> > >> > >> > >> > >> Best, > >> > >> Jiefei > >> > >> On Wed, Aug 28, 2019 at 2:32 PM Wang Jiefei <szwjf08 at gmail.com> wrote: > >> > >>> Thank you, Gabriel. The loop macro is very helpful. It is also exciting > >>> to see that there are lots of changes in ALTREP in R devel version. I > >>> really appreciate your help! > >>> > >>> Best, > >>> Jiefei > >>> > >>> On Wed, Aug 28, 2019 at 7:37 AM Gabriel Becker <gabembecker at gmail.com> > >>> wrote: > >>> > >>>> Jiefei, > >>>> > >>>> I've been meaning to write up something about this so hopefully this > >>>> will be an impetus for me to actually do that, but until then, > responses > >>>> inline. > >>>> > >>>> > >>>> On Tue, Aug 27, 2019, 7:22 PM Wang Jiefei <szwjf08 at gmail.com> wrote: > >>>> > >>>>> Hi devel team, > >>>>> > >>>>> I'm working on C/C++ level ALTREP compatibility for a package. The > >>>>> package > >>>>> previously used pointers to access the data of a SEXP, so it would > not > >>>>> work > >>>>> for some ALTREP objects which do not have a pointer. I plan to > rewrite > >>>>> the > >>>>> code and use functions like get_elt, get_region, and get_subset to > >>>>> access > >>>>> the values of a vector, so I have a few questions for ALTREP: > >>>>> > >>>>> 1. Since an ALTREP do not have to define all of the above > >>>>> functions(element, region, subset), is there any way to check which > >>>>> function has been defined for an ALTREP class? I did a search on > >>>>> RInternal.h and altrep.c but did not find a solution for it. If not, > >>>>> will > >>>>> it be added in the future? > >>>>> > >>>> > >>>> Element and region are guaranteed to always be defined and work (for > >>>> altrep and non-altrep INTSXP, REALSXP, LGLSXPs, etc, we currently > don't > >>>> have region for STRSXP or VECSXP, I believe). If the altrep class > does not > >>>> provide them then default methods will be used, which may be > inefficient in > >>>> some cases but will work. Subset is currently a forward looking stub, > but > >>>> once implimented, that will also be guaranteed to work for all valid > ALTREP > >>>> classes. > >>>> > >>>> > >>>>> > >>>>> 2. Given the diversity of ALTREP classes, what is the best way to > loop > >>>>> over > >>>>> an ALTREP object? I hope there can be an all-in-one function which > can > >>>>> get > >>>>> the values from a vector as long as at least one of the above > functions > >>>>> has > >>>>> been defined, so package developers would not be bothered by tons of > >>>>> `if-else` statement if they want their package to work with ALTREP. > >>>>> Since > >>>>> it seems like there is no such function exist, what could be the best > >>>>> way > >>>>> to do the loop under the current R version? > >>>>> > >>>> > >>>> The best way to loop over all SEXPs, which supports both altrep and > >>>> nonaltrep objects is, with the ITERATE_BY_REGION (which has been in R > for a > >>>> number of released versions, at least since 3.5.0 I think) and the > much > >>>> newer (devel only) ITERATE_BY_REGION_PARTIAL macros defined in > >>>> R_exts/Itermacros.h > >>>> > >>>> The meaning of the arguments is as follows for > ITERATE_BY_REGION_PARTIAL > >>>> are as follows (ITERATE_BY_REGION is the same except no strt, and > nfull). > >>>> > >>>> > >>>> - sx - C level variable name of the SEXP to iterate over > >>>> - px - variable name to use for the pointer populated with data from > >>>> a region of sx > >>>> - idx - variable name to use for the "outer", batch counter in the > >>>> for loop. This will contain the 0-indexed start position of the > batch > >>>> you're currently processing > >>>> - nb - variable name to use for the current batch size. This will > >>>> always either be GET_REGION_BUFFSIZE (512), or the number of > elements > >>>> remaining in the vector, whichever is smaller > >>>> - etype - element (C) type, e.g., int, double, of the data > >>>> - vtype - vector (access API) type, e.g, INTEGER, REAL > >>>> - strt - the 0-indexed position in the vector to start iterating > >>>> - nfull - the total number oif elements to iterate over from the > >>>> vector > >>>> - expr - the code to process a single batch (Which will do things to > >>>> px, typically) > >>>> > >>>> > >>>> So code to perform badly implemented not good idea summing of REALSXP > >>>> data might look like > >>>> > >>>> double sillysum(SEXP x) { > >>>> > >>>> double s = 0.0; > >>>> > >>>> ITERATE_BY_REGION(x, ptr, ind, nbatch, double, REAL, > >>>> { > >>>> > >>>> for(int i = 0; i < nbatch; i++) { s = s + ptr[i];} > >>>> }) > >>>> > >>>> return s; > >>>> } > >>>> > >>>> For meatier examples of ITERATE_BY_REGION's use in practice you can > grep > >>>> the R sources. I know it is used in the implementations of the various > >>>> C-level summaries (summary.c), print and formatting functions, and > anyNA. > >>>> > >>>> Some things to remember > >>>> > >>>> - If you have an inner loop like the one above, your total position > >>>> in the original vector is ind + i > >>>> - ITERATE_BY_REGION always processes the whole vector, if you need > >>>> to only do part of it yo'll either need custom breaking for both > inner and > >>>> outer loopsl, or in R-devel you can use ITERATE_BY_REGION_PARTIAL > >>>> - Don't use the variants ending in 0, all they do is skip over > >>>> things that are a good idea in the case of non-altreps (and some > very > >>>> specific altreps). > >>>> > >>>> Hope that helps. > >>>> > >>>> Best, > >>>> ~G > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>>> Best, > >>>>> Jiefei > >>>>> > >>>>> [[alternative HTML version deleted]] > >>>>> > >>>>> ______________________________________________ > >>>>> R-devel at r-project.org mailing list > >>>>> https://stat.ethz.ch/mailman/listinfo/r-devel > >>>>> > >>>> > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-devel at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > >[[alternative HTML version deleted]]
Reasonably Related Threads
- What is the best way to loop over an ALTREP vector?
- What is the best way to loop over an ALTREP vector?
- What is the best way to loop over an ALTREP vector?
- Unexpected behavior when using macro to loop over vector
- Unexpected behavior when using macro to loop over vector