Henrik Bengtsson
2018-Aug-24 17:55 UTC
[Rd] True length - length(unclass(x)) - without having to call unclass()?
Is there a low-level function that returns the length of an object 'x' - the length that, for instance, .subset(x) and .subset2(x) see? An obvious candidate would be to use:

  .length <- function(x) length(unclass(x))

However, I'm concerned that calling unclass(x) may trigger an expensive copy internally in some cases. Is that concern unfounded?

Thxs,

Henrik
Dénes Tóth
2018-Sep-01 23:19 UTC
[Rd] True length - length(unclass(x)) - without having to call unclass()?
The solution below introduces a dependency on data.table, but otherwise it does what you need:

---

# special method for Foo objects
length.Foo <- function(x) {
  length(unlist(x, recursive = TRUE, use.names = FALSE))
}

# an instance of a Foo object
x <- structure(list(a = 1, b = list(b1 = 1, b2 = 2)), class = "Foo")

# its length
stopifnot(length(x) == 3L)

# get its length as if it were a standard list
.length <- function(x) {
  cls <- class(x)
  # setattr() does not make a copy, but modifies by reference
  data.table::setattr(x, "class", NULL)
  # get the length
  len <- base::length(x)
  # re-set original classes
  data.table::setattr(x, "class", cls)
  # return the unclassed length
  len
}

# to check that we do not make unwanted changes
orig_class <- class(x)

# check that the address in RAM does not change
a1 <- data.table::address(x)

# 'unclassed' length
stopifnot(.length(x) == 2L)

# check that address is the same
stopifnot(a1 == data.table::address(x))

# check against original class
stopifnot(identical(orig_class, class(x)))

---
Hadley Wickham
2018-Sep-02 13:08 UTC
[Rd] True length - length(unclass(x)) - without having to call unclass()?
For the new vctrs::records class, I implemented length, names, [[, and [[<- myself in https://github.com/r-lib/vctrs/blob/master/src/fields.c. That lets me override the default S3 methods while still being able to access the underlying data that I'm interested in.

Another option that avoids the copy (one that you should never discuss in public) is temporarily setting the object bit to FALSE.

In the long run, I think an ALTREP vector that exposes the underlying data of an S3 object (i.e. sans attributes apart from names) is probably the way forward.

Hadley

--
http://hadley.nz
Tomas Kalibera
2018-Sep-03 09:49 UTC
[Rd] True length - length(unclass(x)) - without having to call unclass()?
Please don't do this to get the underlying vector length (or to achieve anything else). Setting/deleting attributes of an R object without checking the reference count violates R semantics, which in turn can have unpredictable results on R programs (essentially undebuggable segfaults now or, more likely, later when new optimizations or features are added to the language). Setting attributes on objects with a reference count (currently the NAMED value) greater than 0 (in some special cases 1 is ok) is cheating - please see Writing R Extensions - and getting speedups via cheating leads to fragile, unmaintainable and buggy code. Doing so in packages is particularly unhelpful to the whole community - packages should only use the public API as documented.

Similarly, getting the physical address of an object to hack around whether R has copied it or not should certainly not be done in packages, and R code should never be working with or even obtaining the physical address of an object. This is also why one cannot obtain such an address using base R (apart from in textual form in certain diagnostic messages, where it can indeed be useful for low-level debugging).

Tomas
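The hazard described in this message can be illustrated with a minimal sketch. It assumes data.table is installed (the by-reference part is skipped otherwise), and the variable names are purely illustrative. It shows that stripping an attribute in place is visible through every binding that shares the object, whereas length(unclass(x)) leaves all bindings untouched.

```r
# Sketch: by-reference attribute changes are visible through every
# binding that shares the object, which breaks R's value semantics.
x <- structure(list(a = 1, b = 2), class = "Foo")
y <- x  # no copy is made here; 'x' and 'y' share one object

if (requireNamespace("data.table", quietly = TRUE)) {
  # Strip the class from 'x' in place, without checking whether the
  # object is referenced elsewhere.
  data.table::setattr(x, "class", NULL)

  # 'y' was never modified by user code, yet it is affected as well.
  y_lost_class <- !inherits(y, "Foo")
  print(y_lost_class)
}

# The documented alternative works on a copy and leaves 'z' and 'w' alone.
z <- structure(list(a = 1, b = 2), class = "Foo")
w <- z
n <- length(unclass(z))
```

If an error or interrupt occurs between removing and restoring the class in a helper such as the quoted .length(), the object stays in the modified state for every variable that refers to it.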
Tomas Kalibera
2018-Sep-05 08:09 UTC
[Rd] True length - length(unclass(x)) - without having to call unclass()?
On 08/24/2018 07:55 PM, Henrik Bengtsson wrote:
> Is there a low-level function that returns the length of an object 'x'
> - the length that for instance .subset(x) and .subset2(x) see? An
> obvious candidate would be to use:
>
> .length <- function(x) length(unclass(x))
>
> However, I'm concerned that calling unclass(x) may trigger an
> expensive copy internally in some cases. Is that concern unfounded?

unclass() will always copy when "x" is really a variable, because the value in "x" will be referenced; whether it is prohibitively expensive or not depends only on the workload - if "x" is a very long list and this function is called often then it could be, but at least to me this sounds unlikely. Unless you have a strong reason to believe it is the case I would just use length(unclass(x)).

If the copying is really a problem, I would think about why the underlying vector length is needed at R level - whether you really need to know the length without actually having the unclassed vector anyway for something else, so whether you are not paying for the copy anyway. Or, from the other end, if you need to do more without copying, and it is possible without breaking the value semantics, then you might need to switch to C anyway and for a bigger piece of code.

If it were still just .length() you needed and it were performance critical, you could just switch to C and call Rf_length. That does not violate the semantics; it is just not elegant, as you are switching to C.

If you stick to R and can live with the overhead of length(unclass(x)) then there is a chance the overhead will decrease as R is optimized internally. This is possible in principle when the runtime knows that the unclassed vector is only needed to compute something that does not modify the vector. The current R cannot optimize this out, but it should be possible with ALTREP at some point (and as Radford mentioned pqR does it differently). Even with such internal optimizations it is often necessary to make guesses about realistic workloads, so if you have a realistic workload where, say, length(unclass(x)) is critical, you are more than welcome to donate it as a benchmark.

Obviously, if you use a C version calling Rf_length, after such an R optimization your code would be unnecessarily non-elegant, but it would still work and probably without overhead, because R can't do much less than Rf_length. In more complicated cases, though, hand-optimized C code implementing, say, 2 operations in sequence could be slower than what a better optimizing runtime could do by joining the effect of possibly more operations, which is in principle another danger of switching from R to C. But as long as the semantics are followed, there is no other danger.

The temptation should be small anyway in this case, when Rf_length() would be the simplest, but as I made more than clear in the previous email, one should never violate the value semantics by temporarily modifying the object (temporarily removing the class attribute or temporarily removing the object bit). Violating semantics causes bugs, if not with the present then with future versions of R (where a version may be an svn revision). A concrete recent example: modifying objects in place in violation of the semantics caused a lot of bugs with the introduction of unification of constants in the byte-code compiler.

Best
Tomas
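As a rough way to check whether the copy matters for a given workload, the following sketch times length(unclass(x)) on a long list. The class name Foo, its length method, and the sizes are assumptions chosen only for illustration; actual timings depend on the machine and the R version.

```r
# Hypothetical S3 class whose length() method hides the underlying length.
length.Foo <- function(x) 1L

# The straightforward helper from the original question.
.length <- function(x) length(unclass(x))

# A long list, to make any copying cost visible.
x <- structure(as.list(seq_len(1e6)), class = "Foo")

stopifnot(length(x) == 1L)   # dispatches to length.Foo
underlying <- .length(x)     # length of the underlying list

# Time repeated calls; compare this against the rest of the workload.
timing <- system.time(for (i in seq_len(100)) .length(x))
print(timing[["elapsed"]])

# If the unclassed vector is needed anyway, unclass once and reuse it,
# so the copy is paid only once.
ux <- unclass(x)
n <- length(ux)
first <- ux[[1L]]
```

If the measured time is negligible next to the surrounding computation, the plain R helper is probably sufficient.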
Iñaki Ucar
2018-Sep-05 09:18 UTC
[Rd] True length - length(unclass(x)) - without having to call unclass()?
The bottom line here is that one can always call a base method, inexpensively and without modifying the object, in, let's say, *formal* OOP languages. In R, this is not possible in general. It would be possible if there were always a foo.default, but primitives use internal dispatch.

I was wondering whether it would be possible to provide a super(x, n) function which simply causes the dispatching system to skip "n" classes in the hierarchy, so that:

> x <- structure(list(), class=c("foo", "bar"))
> length(super(x, 0)) # looks for a length.foo
> length(super(x, 1)) # looks for a length.bar
> length(super(x, 2)) # calls the default
> length(super(x, Inf)) # calls the default

Iñaki
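The intended dispatch behaviour of the proposed super(x, n) can be sketched in plain R. This is only an emulation under stated assumptions: super() does not exist in base R, the methods length.foo and length.bar are hypothetical, and this version works by returning a modified copy of the object, so it illustrates the semantics but not the copy-free behaviour the proposal asks for.

```r
# Hypothetical emulation of super(x, n): drop the first n classes so that
# dispatch starts further down the class vector.
super <- function(x, n) {
  cls <- oldClass(x)
  # Skipping every class means falling through to the default method.
  if (n >= length(cls)) return(unclass(x))
  oldClass(x) <- cls[seq.int(n + 1L, length(cls))]
  x  # a modified copy; the caller's object is unchanged
}

# Hypothetical methods that make it visible which one was dispatched.
length.foo <- function(x) 10L
length.bar <- function(x) 20L

x <- structure(list(1, 2, 3), class = c("foo", "bar"))

res <- c(
  length(super(x, 0)),    # length.foo
  length(super(x, 1)),    # length.bar
  length(super(x, 2)),    # default
  length(super(x, Inf))   # default
)
print(res)
```

A real implementation would need support from the internal dispatch mechanism to avoid the copy made here.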