Tomas Kalibera
2018-Sep-03 09:49 UTC
[Rd] True length - length(unclass(x)) - without having to call unclass()?
Please don't do this to get the underlying vector length (or to achieve anything else). Setting/deleting attributes of an R object without checking the reference count violates R semantics, which in turn can have unpredictable results on R programs (essentially undebuggable segfaults now or, more likely, later when new optimizations or features are added to the language). Setting attributes on objects with a reference count (currently the NAMED value) greater than 0 (in some special cases 1 is OK) is cheating - please see Writing R Extensions - and getting speedups via cheating leads to fragile, unmaintainable and buggy code. Doing so in packages is particularly unhelpful to the whole community - packages should only use the public API as documented.

Similarly, getting the physical address of an object to hack around whether R has copied it or not should certainly not be done in packages, and R code should never work with or even obtain the physical address of an object. This is also why one cannot obtain such an address using base R (apart from, in textual form, certain diagnostic messages, where it can indeed be useful for low-level debugging).

Tomas

On 09/02/2018 01:19 AM, Dénes Tóth wrote:
> The solution below introduces a dependency on data.table, but
> otherwise it does what you need:
>
> ---
>
> # special method for Foo objects
> length.Foo <- function(x) {
>   length(unlist(x, recursive = TRUE, use.names = FALSE))
> }
>
> # an instance of a Foo object
> x <- structure(list(a = 1, b = list(b1 = 1, b2 = 2)), class = "Foo")
>
> # its length
> stopifnot(length(x) == 3L)
>
> # get its length as if it were a standard list
> .length <- function(x) {
>   cls <- class(x)
>   # setattr() does not make a copy, but modifies by reference
>   data.table::setattr(x, "class", NULL)
>   # get the length
>   len <- base::length(x)
>   # re-set original classes
>   data.table::setattr(x, "class", cls)
>   # return the unclassed length
>   len
> }
>
> # to check that we do not make unwanted changes
> orig_class <- class(x)
>
> # check that the address in RAM does not change
> a1 <- data.table::address(x)
>
> # 'unclassed' length
> stopifnot(.length(x) == 2L)
>
> # check that address is the same
> stopifnot(a1 == data.table::address(x))
>
> # check against original class
> stopifnot(identical(orig_class, class(x)))
>
> ---
>
>
> On 08/24/2018 07:55 PM, Henrik Bengtsson wrote:
>> Is there a low-level function that returns the length of an object 'x'
>> - the length that for instance .subset(x) and .subset2(x) see? An
>> obvious candidate would be to use:
>>
>> .length <- function(x) length(unclass(x))
>>
>> However, I'm concerned that calling unclass(x) may trigger an
>> expensive copy internally in some cases. Is that concern unfounded?
>>
>> Thxs,
>>
>> Henrik
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
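To make the semantics concrete, here is a minimal base-R sketch (reusing the Foo object from the quoted example) of what R guarantees when two bindings share a value: dropping the class from one binding through the documented `class<-` replacement function must not affect the other. Since `y <- x` does not copy the object, an in-place setter that skips the reference-count check (the quoted code describes data.table::setattr as one that "modifies by reference") would be expected to change `y` as well.

```r
# The Foo object from the quoted example
x <- structure(list(a = 1, b = list(b1 = 1, b2 = 2)), class = "Foo")

# A second binding to the same value; y <- x does not copy, so x and y
# may share the underlying object until one of them is modified
y <- x

# class<- is a documented replacement function: R makes sure that a
# shared object is not modified behind y's back (copying if needed)
class(x) <- NULL

print(class(x))  # "list"
print(class(y))  # "Foo": y keeps its original value
```

Whether R copies here is an internal decision that may change between versions; the guarantee that `y` is unaffected is what code is allowed to rely on.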
Dénes Tóth
2018-Sep-03 13:59 UTC
[Rd] True length - length(unclass(x)) - without having to call unclass()?
Hi Tomas,

On 09/03/2018 11:49 AM, Tomas Kalibera wrote:
> Please don't do this to get the underlying vector length (or to achieve
> anything else). Setting/deleting attributes of an R object without
> checking the reference count violates R semantics, which in turn can
> have unpredictable results on R programs (essentially undebuggable
> segfaults now or more likely later when new optimizations or features
> are added to the language). Setting attributes on objects with reference
> count (currently NAMED value) greater than 0 (in some special cases 1 is
> ok) is cheating - please see Writing R Extensions - and getting speedups
> via cheating leads to fragile, unmaintainable and buggy code.

Please note that data.table::setattr is an exported function of a widely used package (available from CRAN), which also explains in ?data.table::setattr why it might be useful. Of course, one has to use the set* functions from data.table with extreme care, but if one uses them the right way, they can help a lot. For example, there is no real danger in using them in internal functions where one controls what gets passed to the function or created within it (so one knows that the refcount == 0 condition holds).

(Notwithstanding the above, but also supporting your argument, it took me hours to debug a particular problem in one of my internal packages, see https://github.com/Rdatatable/data.table/issues/1281)

In the present case, an important and unanswered question is (cited from Henrik):

>>> However, I'm concerned that calling unclass(x) may trigger an
>>> expensive copy internally in some cases. Is that concern unfounded?

If no copy is made, length(unclass(x)) beats length(setattr(..)) in all scenarios.

> Doing so
> in packages is particularly unhelpful to the whole community - packages
> should only use the public API as documented.
>
> Similarly, getting a physical address of an object to hack around
> whether R has copied it or not should certainly not be done in packages
> and R code should never be working with or even obtaining physical
> address of an object. This is also why one cannot obtain such address
> using base R (apart in textual form from certain diagnostic messages
> where it can indeed be useful for low-level debugging).

Getting the physical address of the object was done exclusively for demonstration purposes. I totally agree that it should not be used for the purpose you described, and I have never ever done so.

Regards,
Denes
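On Henrik's question itself, a minimal sketch of a helper that sticks to the public API: it calls unclass() only when `x` actually carries a class attribute (`is.object(x)`), and it never modifies the caller's object. As far as we know, current R duplicates the object inside unclass() only when it may be referenced, and then only shallowly (for a list, the top-level vector of element pointers, not the elements themselves); treat that as an implementation detail that can change. The helper name `.length` and the `length.Foo` method are taken from the thread. Note that unclass() refuses environments and external pointers, which this sketch does not handle.

```r
# Length of the underlying vector, bypassing any length() method that
# would be dispatched on the class attribute (helper name from the thread)
.length <- function(x) {
  if (is.object(x)) {
    # unclass() returns the value without its class attribute; the
    # caller's x keeps its class either way
    length(unclass(x))
  } else {
    # No class attribute: length() does not dispatch, so call it directly
    length(x)
  }
}

# Usage with the Foo example from the thread
length.Foo <- function(x) {
  length(unlist(x, recursive = TRUE, use.names = FALSE))
}
x <- structure(list(a = 1, b = list(b1 = 1, b2 = 2)), class = "Foo")

length(x)   # 3, via length.Foo
.length(x)  # 2, the length .subset2() sees
class(x)    # "Foo", unchanged
```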
Tomas Kalibera
2018-Sep-03 14:49 UTC
[Rd] True length - length(unclass(x)) - without having to call unclass()?
On 09/03/2018 03:59 PM, Dénes Tóth wrote:
> Hi Tomas,
>
> On 09/03/2018 11:49 AM, Tomas Kalibera wrote:
>> Please don't do this to get the underlying vector length (or to
>> achieve anything else). Setting/deleting attributes of an R object
>> without checking the reference count violates R semantics, which in
>> turn can have unpredictable results on R programs (essentially
>> undebuggable segfaults now or more likely later when new
>> optimizations or features are added to the language). Setting
>> attributes on objects with reference count (currently NAMED value)
>> greater than 0 (in some special cases 1 is ok) is cheating - please
>> see Writing R Extensions - and getting speedups via cheating leads to
>> fragile, unmaintainable and buggy code.
>

Hi Denes,

> Please note that data.table::setattr is an exported function of a
> widely used package (available from CRAN), which also has a
> description in ?data.table::setattr why it might be useful.

Indeed, and this is not your fault, but the function is cheating, and the fact that it is in a widely used package, even exported from it, does not make it any safer. The related optimization in base R (shallow copying) mentioned in the documentation of data.table::setattr, on the other hand, is sound: it does not break the semantics.

> Of course one has to use set* functions from data.table with extreme
> care, but if one does it in the right way, they can help a lot. For
> example there is no real danger of using them in internal functions
> where one can control what is get passed to the function or created
> within the function (so when one knows that the refcount==0 condition
> is true).

Extreme care is not enough, as the internals can and do change (and, within the limits given by the documentation, they are likely to change soon with respect to NAMED/reference counting), not to mention that they are very complicated. The approach of "modify in place because we know the reference count is 0" is particularly error-prone and unnecessary.

It is unnecessary because there is a documented C API for legitimate use in packages to find out whether an object may be referenced/shared (it indirectly checks the reference count). If it is not, the object can be modified in place without cheating, and some packages do this. It is error-prone because the reference count can change due to many things package developers cannot be expected to know about (and again, these things change): in the set* functions, for example, it will never be 0 (!), so these functions, with their current API, can never be implemented in current R without breaking the semantics.

In principle, one can do similar things legitimately by wrapping objects in an environment, passing such an environment (environments can legitimately be modified in place), checking that the contained objects have a reference count of 1 (not shared) and, if so, modifying them in place. But indeed, as soon as such objects become shared, there is no way out: one has to copy (in current R).

Best
Tomas
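The environment-based approach described above can only partly be shown from R code. A minimal sketch, with hypothetical helper names `make_box()` and `append_one()`: the environment itself is not copied when passed or modified, so a callee's update is visible to the caller. The second half of the idea, checking that the contained object is not shared before modifying it in place, needs the documented C API (for example the MAYBE_SHARED macro described in Writing R Extensions) and is not shown here; in R code, the assignment below simply binds a new vector inside the environment.

```r
# Hypothetical helper: wrap a value in an environment ("box")
make_box <- function(value) {
  box <- new.env(parent = emptyenv())
  box$value <- value
  box
}

# Hypothetical callee: environments have reference semantics, so this
# assignment is visible to every holder of the same box
append_one <- function(box) {
  box$value <- c(box$value, 1)
  invisible(box)
}

b <- make_box(c(10, 20))
append_one(b)
print(b$value)  # 10 20 1
```

The design choice here is that only the environment is shared and mutated; the vector inside it follows ordinary value semantics, so no code holding a plain reference to that vector can observe a change.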