thr3ads.net - R devel - [Rd] True length - length(unclass(x)) - without having to call unclass()? [Sep 2018]

Iñaki Ucar

2018-Sep-05 09:18 UTC

[Rd] True length - length(unclass(x)) - without having to call unclass()?

The bottom line here is that one can always call a base method,
inexpensively and without modifying the object, in, let's say,
*formal* OOP languages. In R, this is not possible in general. It
would be possible if there were always a foo.default, but primitives
use internal dispatch.

I was wondering whether it would be possible to provide a super(x, n)
function which simply makes the dispatching system skip the first "n"
classes in the hierarchy, so that:
> x <- structure(list(), class=c("foo", "bar"))
> length(super(x, 0)) # looks for a length.foo
> length(super(x, 1)) # looks for a length.bar
> length(super(x, 2)) # calls the default
> length(super(x, Inf)) # calls the default
Iñaki
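
The intended dispatch results can be imitated in plain R. The following is only a sketch: super_n() is a hypothetical helper, not an existing R function, and unlike the proposed super() it works on a modified copy of the object, so it does not remove the cost under discussion. The two length methods are toy definitions added for illustration.

```r
# Hypothetical helper: drop the first n classes so that dispatch starts
# further down the hierarchy. This modifies a copy of x, not x itself.
super_n <- function(x, n) {
  cls <- class(x)
  if (n >= length(cls)) return(unclass(x))  # no class left: internal default
  class(x) <- cls[(n + 1):length(cls)]
  x
}

x <- structure(list(1, 2, 3), class = c("foo", "bar"))
length.foo <- function(x) 100L  # toy method, for illustration only
length.bar <- function(x) 200L  # toy method, for illustration only

length(super_n(x, 0))    # 100, from length.foo
length(super_n(x, 1))    # 200, from length.bar
length(super_n(x, 2))    # 3, internal default
length(super_n(x, Inf))  # 3, internal default
```

The object x keeps its class vector throughout, because every change happens on the copy inside super_n().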

On Wed, 5 Sept 2018 at 10:09, Tomas Kalibera
(<tomas.kalibera at gmail.com>) wrote:
>
> On 08/24/2018 07:55 PM, Henrik Bengtsson wrote:
> > Is there a low-level function that returns the length of an object 'x'
> > - the length that for instance .subset(x) and .subset2(x) see? An
> > obvious candidate would be to use:
> >
> > .length <- function(x) length(unclass(x))
> >
> > However, I'm concerned that calling unclass(x) may trigger an
> > expensive copy internally in some cases.  Is that concern unfounded?
> unclass() will always copy when "x" is really a variable, because the
> value in "x" will be referenced; whether it is prohibitively expensive
> or not depends only on the workload - if "x" is a very long list and
> this function is called often then it could be, but at least to me this
> sounds unlikely. Unless you have a strong reason to believe it is the
> case I would just use length(unclass(x)).
>
> If the copying is really a problem, I would think about why the
> underlying vector length is needed at R level - whether you really need
> to know the length without actually having the unclassed vector anyway
> for something else, so whether you are not paying for the copy anyway.
> Or, from the other end, if you need to do more without copying, and it
> is possible without breaking the value semantics, then you might need to
> switch to C anyway and for a bigger piece of code.
>
> If it were still just .length() you needed and it were performance
> critical, you could just switch to C and call Rf_length. That does not
> violate the semantics; it is just not elegant, as you are
> switching to C.
>
> If you stick to R and can live with the overhead of length(unclass(x))
> then there is a chance the overhead will decrease as R is optimized
> internally. This is possible in principle when the runtime knows that
> the unclassed vector is only needed to compute something that does not
> modify the vector. The current R cannot optimize this out, but it should
> be possible with ALTREP at some point (and as Radford mentioned pqR does
> it differently). Even with such internal optimizations it is
> often necessary to make guesses about realistic workloads, so if you
> have a realistic workload where, say, length(unclass(x)) is critical, you
> are more than welcome to donate it as a benchmark.
>
> Obviously, if you use a C version calling Rf_length, after such R
> optimization your code would be unnecessarily inelegant, but would
> still work and probably without overhead, because R can't do much less
> than Rf_length. In more complicated cases, though, hand-optimized C code
> to implement, say, 2 operations in sequence could be slower than what a
> better optimizing runtime could do by joining the effect of possibly
> more operations, which is in principle another danger of switching from
> R to C. But as long as the semantics are followed, there is no other danger.
>
> The temptation should be small anyway in this case, when Rf_length()
> would be the simplest, but as I made more than clear in the previous
> email, one should never violate the value semantics by temporarily
> modifying the object (temporarily removing the class attribute or
> temporarily removing the object bit). Violating semantics causes bugs, if
> not with the present then with future versions of R (where a version may
> be an svn revision). A concrete recent example: modifying objects in
> place in violation of the semantics caused a lot of bugs with the
> introduction of unification of constants in the byte-code compiler.
>
> Best
> Tomas
>
> >
> > Thxs,
> >
> > Henrik
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
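
To make the quoted question concrete, here is a minimal sketch of the helper Henrik describes. The class "foo" and its length method are toy assumptions, chosen so that the dispatched length differs from the length of the underlying list.

```r
# Toy class whose length method deliberately disagrees with the number
# of underlying list elements (illustration only).
x <- structure(list(a = 1, b = 2, c = 3), class = c("foo", "bar"))
length.foo <- function(x) 99L

# The helper from the thread: unclass() yields a classless copy, so
# length() falls through to the internal default.
.length <- function(x) length(unclass(x))

length(x)        # 99, dispatched to length.foo
.length(x)       # 3, the length that .subset2() sees
.subset2(x, 3L)  # 3, the third underlying element, fetched without dispatch
```

Whether the copy made by unclass() matters depends on the workload, as the quoted reply says; this sketch does not measure it.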


-- 
Iñaki Ucar

Kevin Ushey

2018-Sep-05 16:30 UTC

More generally, I think one of the issues is that R is not yet able to
decrement a reference count (or mark a 'shared' data object as
'unshared' after it knows only one binding to it exists). This means
passing variables to R closures will mark that object as shared:

    x <- list()
    .Internal(inspect(x))  # NAM(1)
    identity(x)
    .Internal(inspect(x))  # NAM(3)

I think for this reason users often resort to 'hacks' that involve
directly setting attributes on the object, since they 'know' only one
reference to a particular object exists. I'm not sure if this really
is 'safe', though -- likely not given potential future optimizations
to R, as Tomas has alluded to.

I think true reference counting has been implemented in the R sources,
but the switch has not yet been flipped to enable that by default.
Hopefully having that will make cases like the above work as expected?

Thanks,
Kevin
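
For contrast with the in-place 'hacks' mentioned above, the following sketch shows the value semantics they bypass. strip_class() is a hypothetical helper and "foo" is a toy class; the point is only that an R-level modification inside a function acts on the function's own copy.

```r
x <- structure(list(1, 2, 3), class = "foo")

# Hypothetical helper: the assignment below changes the local binding
# `obj` only (copy-on-modify); the caller's object is left alone.
strip_class <- function(obj) {
  class(obj) <- NULL
  obj
}

y <- strip_class(x)
class(x)  # "foo": the original is untouched
class(y)  # "list": only the returned copy lost its class
```

Code that instead alters attributes in place from C gives up this guarantee, which is the risk Kevin and Tomas describe.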


luke-tierney at uiowa.edu

2018-Sep-05 21:38 UTC

On Wed, 5 Sep 2018, Kevin Ushey wrote:
> More generally, I think one of the issues is that R is not yet able to
> decrement a reference count (or mark a 'shared' data object as
> 'unshared' after it knows only one binding to it exists). This means
> passing variables to R closures will mark that object as shared:
>
>    x <- list()
>    .Internal(inspect(x))  # NAM(1)
>    identity(x)
>    .Internal(inspect(x))  # NAM(3)
>
> I think for this reason users often resort to 'hacks' that involve
> directly setting attributes on the object, since they 'know' only one
> reference to a particular object exists. I'm not sure if this really
> is 'safe', though -- likely not given potential future optimizations
> to R, as Tomas has alluded to.
>
> I think true reference counting has been implemented in the R sources,
> but the switch has not yet been flipped to enable that by default.
> Hopefully having that will make cases like the above work as expected?
Current R-devel built with reference counting by setting

CFLAGS="-O3 -g -Wall -pedantic -DSWITCH_TO_REFCNT"

gives


x <- list()
.Internal(inspect(x))
## @55ad788e3b28 19 VECSXP g0c0 [REF(1)] (len=0, tl=0)
identity(x)
## list()
.Internal(inspect(x))
## @55ad788e3b28 19 VECSXP g0c0 [REF(1)] (len=0, tl=0)

I'm moderately hopeful we'll be able to switch to this for 3.6.0, but
that depends on finding enough time to sort out some loose ends.

Best,

luke
-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

Tomas Kalibera

2018-Sep-10 12:18 UTC

On 09/05/2018 11:18 AM, Iñaki Ucar wrote:
> The bottom line here is that one can always call a base method,
> inexpensively and without modifying the object, in, let's say,
> *formal* OOP languages. In R, this is not possible in general. It
> would be possible if there were always a foo.default, but primitives
> use internal dispatch.
>
> I was wondering whether it would be possible to provide a super(x, n)
> function which simply causes the dispatching system to avoid "n"
> classes in the hierarchy, so that:
>
> > x <- structure(list(), class=c("foo", "bar"))
> > length(super(x, 0)) # looks for a length.foo
> > length(super(x, 1)) # looks for a length.bar
> > length(super(x, 2)) # calls the default
> > length(super(x, Inf)) # calls the default

I think that a cast should always be for a specific class, defined by
the name of the class. Identifying classes by their inheritance index
might be unnecessarily brittle - it would break if someone introduced a
new ancestor class. Apart from the syntax - supporting fast casts for S3
dispatch in the current implementation would be quite a bit of work,
probably not worth it; it would also probably slow down the internal
dispatch in primitives. But a partial solution could be implemented at
some point with ALTREP wrappers, when one could create a wrapper object
with a modified class attribute without copying.

Tomas
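
The brittleness of index-based casts can be shown with a short sketch. super_at() is a hypothetical name-based helper (it works on a copy, so it only illustrates the lookup rule, not a copy-free cast), and the classes and the length method are toy assumptions.

```r
# Hypothetical helper: start dispatch at the class called `from`.
super_at <- function(x, from) {
  cls <- class(x)
  k <- match(from, cls)
  if (is.na(k)) stop("object does not inherit from class ", from)
  class(x) <- cls[k:length(cls)]
  x
}

length.bar <- function(x) 200L  # toy method, for illustration only

x_old <- structure(list(1, 2, 3), class = c("foo", "bar"))
# Same object after a new ancestor class "baz" has been introduced:
x_new <- structure(list(1, 2, 3), class = c("foo", "baz", "bar"))

length(super_at(x_old, "bar"))  # 200, from length.bar
length(super_at(x_new, "bar"))  # 200, still from length.bar
class(x_old)[1 + 1]             # "bar": what "skip one class" used to reach
class(x_new)[1 + 1]             # "baz": the same index now lands elsewhere
```

A cast by name keeps selecting length.bar after the hierarchy changes, whereas a cast by index silently selects a different class.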

Iñaki Ucar

2018-Sep-10 12:30 UTC

On Mon, 10 Sept 2018 at 14:18, Tomas Kalibera
(<tomas.kalibera at gmail.com>) wrote:
>
> On 09/05/2018 11:18 AM, Iñaki Ucar wrote:
> > The bottomline here is that one can always call a base method,
> > inexpensively and without modifying the object, in, let's say,
> > *formal* OOP languages. In R, this is not possible in general. It
> > would be possible if there was always a foo.default, but primitives
> > use internal dispatch.
> >
> > I was wondering whether it would be possible to provide a super(x, n)
> > function which simply causes the dispatching system to avoid "n"
> > classes in the hierarchy, so that:
> >
> > > x <- structure(list(), class=c("foo", "bar"))
> > > length(super(x, 0)) # looks for a length.foo
> > > length(super(x, 1)) # looks for a length.bar
> > > length(super(x, 2)) # calls the default
> > > length(super(x, Inf)) # calls the default
>
> I think that a cast should always be for a specific class, defined by
> the name of the class. Identifying classes by their inheritance index
> might be unnecessarily brittle - it would break if someone introduced a
> new ancestor class.

Agreed. I just wanted to point out that, in that case, something like
super(x, "default") should always reach the default method,
even if the method is internal and there is no foo.default defined.
Otherwise, we would have the same problem.

Iñaki
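
The asymmetry behind this request can be sketched as follows. area() and its methods are hypothetical, defined here only to stand for an ordinary closure generic with an explicit default; length.square is likewise a toy method.

```r
# A closure generic with an explicit default method (hypothetical).
area <- function(shape, ...) UseMethod("area")
area.default <- function(shape, ...) "default"
area.square  <- function(shape, ...) "square"

s <- structure(list(side = 2), class = "square")
area(s)          # "square", found by S3 dispatch
area.default(s)  # "default", called directly on the unmodified object

# length() is a primitive with internal dispatch, so the default code is
# not an R function that could be called the same way.
length.square <- function(x) 4L  # toy method, for illustration only
is.primitive(length)  # TRUE
length(s)             # 4, from length.square
length(unclass(s))    # 1, the internal default, reached via a copy
```

For area() the default is reachable by name at no cost; for length() the only R-level route shown here goes through unclass(), which is the gap a super(x, "default") would have to close.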
