thr3ads.net - R devel - [Rd] True length - length(unclass(x)) - without having to call unclass()? [Sep 2018]

Henrik Bengtsson

2018-Aug-24 17:55 UTC

[Rd] True length - length(unclass(x)) - without having to call unclass()?

Is there a low-level function that returns the length of an object 'x'
- the length that for instance .subset(x) and .subset2(x) see? An
obvious candidate would be to use:

.length <- function(x) length(unclass(x))

However, I'm concerned that calling unclass(x) may trigger an
expensive copy internally in some cases.  Is that concern unfounded?
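
As a minimal sketch of the distinction being asked about (the class
"Foo" and its length method are hypothetical, chosen only for
illustration): length() dispatches on the class, whereas unclass() and
.subset2() work on the underlying list.

```r
# Hypothetical S3 class whose length() method does not report the
# number of top-level elements.
length.Foo <- function(x) 99L

x <- structure(list(a = 1, b = 2), class = "Foo")

n_method <- length(x)           # dispatches to length.Foo
n_true   <- length(unclass(x))  # length of the underlying list
second   <- .subset2(x, 2L)     # element access that ignores the class
```

Here n_method comes from the S3 method, while n_true is the length
that .subset() and .subset2() operate on.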

Thxs,

Henrik

Dénes Tóth

2018-Sep-01 23:19 UTC

The solution below introduces a dependency on data.table, but otherwise 
it does what you need:

---

# special method for Foo objects
length.Foo <- function(x) {
   length(unlist(x, recursive = TRUE, use.names = FALSE))
}

# an instance of a Foo object
x <- structure(list(a = 1, b = list(b1 = 1, b2 = 2)), class = "Foo")

# its length
stopifnot(length(x) == 3L)

# get its length as if it were a standard list
.length <- function(x) {
   cls <- class(x)
   # setattr() does not make a copy, but modifies by reference
   data.table::setattr(x, "class", NULL)
   # get the length
   len <- base::length(x)
   # re-set original classes
   data.table::setattr(x, "class", cls)
   # return the unclassed length
   len
}

# to check that we do not make unwanted changes
orig_class <- class(x)

# check that the address in RAM does not change
a1 <- data.table::address(x)

# 'unclassed' length
stopifnot(.length(x) == 2L)

# check that address is the same
stopifnot(a1 == data.table::address(x))

# check against original class
stopifnot(identical(orig_class, class(x)))

---


Hadley Wickham

2018-Sep-02 13:08 UTC

For the new vctrs::records class, I implemented length, names, [[, and
[[<- myself in https://github.com/r-lib/vctrs/blob/master/src/fields.c.
That lets me override the default S3 methods while still being able to
access the underlying data that I'm interested in.

Another option that avoids the copy (and that you should never discuss
in public) is temporarily setting the object bit to FALSE.

In the long run, I think an ALTREP vector that exposes the underlying
data of an S3 object (i.e. sans attributes apart from names) is
probably the way forward.

Hadley

-- 
http://hadley.nz

Tomas Kalibera

2018-Sep-03 09:49 UTC

Please don't do this to get the underlying vector length (or to achieve 
anything else). Setting/deleting attributes of an R object without 
checking the reference count violates R semantics, which in turn can 
have unpredictable results on R programs (essentially undebuggable 
segfaults now or more likely later when new optimizations or features 
are added to the language). Setting attributes on objects with reference 
count (currently NAMED value) greater than 0 (in some special cases 1 is 
ok) is cheating - please see Writing R Extensions - and getting speedups 
via cheating leads to fragile, unmaintainable and buggy code. Doing so 
in packages is particularly unhelpful to the whole community - packages 
should only use the public API as documented.
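
As a small illustration of the value semantics referred to above (a
sketch only; the class name "Foo" is made up): when an attribute is
changed through the regular R API, R duplicates a shared object first,
so other references to it are unaffected. Modifying by reference skips
exactly this step.

```r
x <- structure(list(a = 1), class = "Foo")
y <- x            # x and y may share one underlying object here
class(y) <- NULL  # R duplicates before modifying, so x is left alone

x_still_foo <- inherits(x, "Foo")
y_unclassed <- is.null(oldClass(y))
```

Both x_still_foo and y_unclassed are TRUE: the change to y is not
visible through x.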

Similarly, getting the physical address of an object to hack around 
whether R has copied it or not should certainly not be done in packages, 
and R code should never be working with, or even obtaining, the physical 
address of an object. This is also why one cannot obtain such an address 
using base R (apart from in textual form from certain diagnostic 
messages, where it can indeed be useful for low-level debugging).

Tomas

Tomas Kalibera

2018-Sep-05 08:09 UTC

On 08/24/2018 07:55 PM, Henrik Bengtsson wrote:
> Is there a low-level function that returns the length of an object 'x'
> - the length that for instance .subset(x) and .subset2(x) see? An
> obvious candidate would be to use:
>
> .length <- function(x) length(unclass(x))
>
> However, I'm concerned that calling unclass(x) may trigger an
> expensive copy internally in some cases.  Is that concern unfounded?

unclass() will always copy when "x" is really a variable, because the 
value in "x" will be referenced; whether it is prohibitively expensive 
or not depends only on the workload - if "x" is a very long list and 
this function is called often then it could be, but at least to me this 
sounds unlikely. Unless you have a strong reason to believe it is the 
case I would just use length(unclass(x)).
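
One way to observe this copy, sketched under the assumption that the R
build has memory profiling enabled (hence the capabilities() check), is
tracemem(). The class name "Foo" is hypothetical. To my understanding
the duplicate of a list is shallow in recent versions of R, meaning the
elements themselves are not copied, but that is not something this
sketch verifies.

```r
x <- structure(as.list(seq_len(1e5)), class = "Foo")

# tracemem() is only available when R was built with memory profiling.
traced <- capabilities("profmem")
if (traced) invisible(tracemem(x))

# With tracing on, R reports here if x is duplicated.
n <- length(unclass(x))

if (traced) untracemem(x)
```

Whether or not a duplication is reported, x itself keeps its class and
n is the length of the underlying list.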

If the copying is really a problem, I would think about why the 
underlying vector length is needed at the R level: do you really need 
the length without also needing the unclassed vector for something 
else, in which case you would be paying for the copy anyway? Or, from 
the other end, if you need to do more without copying, and it is 
possible without breaking the value semantics, then you might need to 
switch to C anyway, and for a bigger piece of code.

If it were still just .length() you needed and it were performance 
critical, you could just switch to C and call Rf_length. That does not 
violate the semantics; it is just not elegant, as you are switching 
to C.

If you stick to R and can live with the overhead of length(unclass(x)) 
then there is a chance the overhead will decrease as R is optimized 
internally. This is possible in principle when the runtime knows that 
the unclassed vector is only needed to compute something that does not 
modify the vector. The current R cannot optimize this out, but it should 
be possible with ALTREP at some point (and as Radford mentioned pqR does 
it differently). Even with such internal optimizations it is often 
necessary to make guesses about realistic workloads, so if you have a 
realistic workload where, say, length(unclass(x)) is critical, you are 
more than welcome to donate it as a benchmark.

Obviously, if you use a C version calling Rf_length, after such an R 
optimization your code would be unnecessarily inelegant, but it would 
still work, and probably without overhead, because R can't do much less 
than Rf_length. In more complicated cases, though, hand-optimized C code 
implementing, say, 2 operations in sequence could be slower than what a 
better optimizing runtime could do by combining the effects of possibly 
more operations, which is in principle another danger of switching from 
R to C. But as long as the semantics are followed, there is no other 
danger.

The temptation should be small anyway in this case, when Rf_length() 
would be the simplest option, but as I made more than clear in the 
previous email, one should never violate the value semantics by 
temporarily modifying the object (temporarily removing the class 
attribute or temporarily removing the object bit). Violating semantics 
causes bugs, if not with the present then with future versions of R 
(where a version may be an svn revision). A concrete recent example: 
modifying objects in place in violation of the semantics caused a lot of 
bugs with the introduction of unification of constants in the byte-code 
compiler.

Best
Tomas

Iñaki Ucar

2018-Sep-05 09:18 UTC

The bottom line here is that one can always call a base method,
inexpensively and without modifying the object, in, let's say,
*formal* OOP languages. In R, this is not possible in general. It
would be possible if there were always a foo.default, but primitives
use internal dispatch.

I was wondering whether it would be possible to provide a super(x, n)
function which simply causes the dispatching system to avoid "n"
classes in the hierarchy, so that:
> x <- structure(list(), class=c("foo", "bar"))
> length(super(x, 0)) # looks for a length.foo
> length(super(x, 1)) # looks for a length.bar
> length(super(x, 2)) # calls the default
> length(super(x, Inf)) # calls the default
Iñaki
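
To make the intended semantics of the proposed super(x, n) concrete,
here is a rough pure-R emulation. It is only a sketch: no such function
exists in base R, the implementation and the length.foo / length.bar
methods are made up for illustration, and because it goes through
oldClass<- and unclass() it still copies the object, so it does not
address the cost concern.

```r
# Hypothetical emulation: drop the first n classes before dispatch.
super <- function(x, n = 1) {
  cls <- oldClass(x)
  keep <- cls[seq_along(cls) > n]   # classes left after skipping n
  if (length(keep) == 0L) return(unclass(x))
  oldClass(x) <- keep
  x
}

# Made-up methods so that dispatch is visible in the result.
length.foo <- function(x) 10L
length.bar <- function(x) 20L

x <- structure(list(1, 2, 3), class = c("foo", "bar"))

res <- c(length(super(x, 0)),    # length.foo
         length(super(x, 1)),    # length.bar
         length(super(x, 2)),    # default
         length(super(x, Inf)))  # default
# res is c(10L, 20L, 3L, 3L)
```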


-- 
Iñaki Ucar
