thr3ads.net - R devel - [Rd] R (development) changes in arith, logic, relop with (0-extent) arrays [Sep 2016]

Radford Neal

2016-Sep-08 21:11 UTC

[Rd] R (development) changes in arith, logic, relop with (0-extent) arrays

Regarding Martin Maechler's proposal:

   Arithmetic between length-1 arrays and longer non-arrays had
   silently dropped the array attributes and recycled.  This now gives
   a warning and will signal an error in the future, as it has always
   for logic and comparison operations

For example, matrix(1,1,1) + (1:2) would give a warning/error.

I think this might be a mistake.

The potential benefits of this would be detection of some programming
errors, and increased consistency.  The downsides are breaking
existing working programs, and decreased consistency.

Regarding consistency, the overall R philosophy is that attaching an
attribute to a vector doesn't change what you can do with it, or what
the result is, except that the result (often) gets the attributes
carried forward.  By this logic, adding a 'dim' attribute shouldn't
stop you from doing arithmetic (or comparisons) that you otherwise
could.

But maybe 'dim' attributes are special?  Well, they are in some
circumstances, and have to be when they are intended to change the
behaviour, such as when a matrix is used as an index with [.

But in many cases at present, 'dim' attributes DON'T stop you from
treating the object as a plain vector - for example, one is allowed 
to do matrix(1:4,2,2)[3], and a<-numeric(10); a[2:5]<-matrix(1,2,2).
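
A minimal base-R sketch of the two behaviours described above (the variable names are illustrative only): a matrix used as an index is read as (row, column) pairs, while the same kind of object can still be indexed and assigned as a plain vector.

```r
m <- matrix(1:4, 2, 2)        # filled column-wise, so m[1, 2] is 3

# A two-column matrix index is read as (row, column) pairs ...
by_pair <- m[cbind(1, 2)]     # the element in row 1, column 2
# ... whereas a plain vector index is read as linear positions.
by_pos <- m[c(1, 2)]          # the first and second elements

# The 'dim' attribute does not prevent linear indexing:
third <- m[3]

# A matrix on the right-hand side is treated as a plain vector here:
a <- numeric(10)
a[2:5] <- matrix(1, 2, 2)
```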

So it may make more sense to move towards consistency in the
permissive direction, rather than the restrictive direction.  That
would mean allowing matrix(1,1,1)<(1:2), and maybe also things
like matrix(1,2,2)+(1:8).

Obviously, a change that removes error conditions is much less likely
to produce backwards-compatibility problems than a change that gives
errors for previously-allowed operations.

And I think there would be some significant problems. In addition to
the 10-20+ packages that Martin expects to break, there could be quite
a bit of user code that would no longer work - scripts for analysing
data sets that used to work, but now don't with the latest version.

There are reasons to expect such problems.  Some operations such as
vector dot products using %*% produce results that are always scalar,
but are formed as 1x1 matrices.  One can anticipate that many people
have not been getting rid of the 'dim' attribute in such cases, when
doing so hasn't been necessary in the past.
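
To make the point concrete, here is a small sketch in base R (the vectors a and b are made-up values) showing that a dot product via %*% comes back as a 1x1 matrix, and two common ways of dropping the 'dim' attribute:

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

p <- a %*% b          # inner product, returned as a 1x1 matrix
p_dim <- dim(p)

# Either of these removes 'dim', leaving a plain length-1 vector:
s1 <- c(p)
s2 <- drop(p)
```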

Regarding the 0-length vector issue, I agree with other posters that
after a<-numeric(0), it has to be allowable to write a<1.  To not
allow this would be highly destructive of code reliability.  And for
similar reason, after a<-c(), a<1 needs to be allowed, which means
NULL<1 should be allowed (giving logical(0)), since c() is NULL.
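
A short sketch of the zero-length cases, assuming the behaviour of released versions of R (the development version under discussion may have differed):

```r
a <- numeric(0)
r1 <- a < 1            # zero-length comparison, no error

b <- c()               # c() with no arguments returns NULL
r2 <- b < 1            # so this is the same as NULL < 1
```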

   Radford Neal

Martin Maechler

2016-Sep-09 08:35 UTC

>>>>> Radford Neal <radford at cs.toronto.edu>
>>>>>     on Thu, 8 Sep 2016 17:11:18 -0400 writes:
    > Regarding Martin Maechler's proposal:
    > Arithmetic between length-1 arrays and longer non-arrays had
    > silently dropped the array attributes and recycled.  This now gives
    > a warning and will signal an error in the future, as it has always
    > for logic and comparison operations

    > For example, matrix(1,1,1) + (1:2) would give a warning/error.

    > I think this might be a mistake.

    > The potential benefits of this would be detection of some programming
    > errors, and increased consistency.  The downsides are breaking
    > existing working programs, and decreased consistency.

    > Regarding consistency, the overall R philosophy is that attaching an
    > attribute to a vector doesn't change what you can do with it, or what
    > the result is, except that the result (often) gets the attributes
    > carried forward.  By this logic, adding a 'dim' attribute shouldn't
    > stop you from doing arithmetic (or comparisons) that you otherwise
    > could.

Thank you, Radford, for joining in.
The above is a good line of reasoning.

    > But maybe 'dim' attributes are special?  Well, they are in some
    > circumstances, and have to be when they are intended to change the
    > behaviour, such as when a matrix is used as an index with [.

indeed.

    > But in many cases at present, 'dim' attributes DON'T stop you from
    > treating the object as a plain vector - for example, one is allowed
    > to do matrix(1:4,2,2)[3], and a<-numeric(10); a[2:5]<-matrix(1,2,2).

agreed.

    > So it may make more sense to move towards consistency in the
    > permissive direction, rather than the restrictive direction.

    > That would mean allowing matrix(1,1,1) < (1:2), and maybe also things
    > like matrix(1,2,2)+(1:8).

That is an interesting idea.  Yes, in my view that would
definitely also have to allow the latter, by the above argument
of not treating the dim/dimnames attributes special.  For
non-arrays length-1 is not treated much special apart from the
fact that length-1 can always be recycled (without warning).

    > Obviously, a change that removes error conditions is much less likely
    > to produce backwards-compatibility problems than a change that gives
    > errors for previously-allowed operations.

Of course that is true... and that has also been the reason for
my amendment.

    > And I think there would be some significant problems. In addition to
    > the 10-20+ packages that Martin expects to break, there could be quite
    > a bit of user code that would no longer work - scripts for analysing
    > data sets that used to work, but now don't with the latest version.

That's not true (at least for the cases above): They would give
a strong warning, "strong" because of its wording:

   > matrix(1,1) + 1:2
   [1] 2 3
   Warning message:
   In matrix(1, 1) + 1:2 :
     dropping dim() of array of length one.  Will become ERROR
   > 

*and* the  logic and relop versions of this, e.g.,
   matrix(TRUE,1) | c(TRUE,FALSE) ;  matrix(1,1) > 1:2,
have always been an  error; so nothing would break there.
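
Since the outcome of these expressions depends on the R version, one hedged way to see what a given installation does is to capture the condition rather than assume it. The helper name probe below is hypothetical, not part of R:

```r
# Hypothetical helper: report whether an expression runs cleanly,
# warns, or errors, without assuming which R version is in use.
probe <- function(expr) {
  tryCatch({
    force(expr)          # the promise is evaluated here, inside tryCatch
    "ok"
  },
  warning = function(w) paste("warning:", conditionMessage(w)),
  error   = function(e) paste("error:", conditionMessage(e)))
}

arith <- probe(matrix(1, 1) + 1:2)
relop <- probe(matrix(1, 1) > 1:2)
logic <- probe(matrix(TRUE, 1) | c(TRUE, FALSE))
```

Printing arith, relop and logic then shows, for the running version, which of the three operator families is accepted, warned about, or rejected.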

    > There are reasons to expect such problems.  Some operations such as
    > vector dot products using %*% produce results that are always scalar,
    > but are formed as 1x1 matrices.

Of course; that *was* the reason the very special treatment for arithmetic
length-1 arrays had been introduced.  It is convenient.

However, *some* of the conveniences in S (and hence R) functions
have been dangerous {and much more used, hence close to
impossible to abolish, e.g., sample(x) when x  is numeric of length 1,
and several others, you'll find in the "R Inferno"}, or at least
quirky for *programming* with R (as opposed to pure interactive use).

    > One can anticipate that many people
    > have not been getting rid of the 'dim' attribute in such cases, when
    > doing so hasn't been necessary in the past.

If it remains at 10-20 CRAN packages (out of 9000), each with
just very few instances, that would indicate, I think, that such
use is not very widespread.
Note that they only did not have to get rid of the dim() in the
length-1 case (and only for arithmetic): as soon as they had
another dimension, they would have got an error.

Still, I agree about the validity of your line of thought, and
that in order to get consistency we also could go into the
direction of being more permissive rather than restrictive.

I'm interested to hear other opinions, notably as in recent years
some famous R teachers have typically criticized R as being
not strict enough ...

    > Regarding the 0-length vector issue, I agree with other posters that
    > after a<-numeric(0), it has to be allowable to write a<1.  To not
    > allow this would be highly destructive of code reliability.  And for
    > similar reason, after a<-c(), a<1 needs to be allowed, which means
    > NULL<1 should be allowed (giving logical(0)), since c() is
    > NULL.

Yes, indeed, treating NULL the same as a length-0 atomic
vector here is also correct in my view, and maybe the fact you
mention that c() is NULL  does help to convince others.

Martin

Radford Neal

2016-Sep-09 14:29 UTC

>     Radford Neal:
>     > So it may make more sense to move towards consistency in the
>     > permissive direction, rather than the restrictive direction.
> 
>     > That would mean allowing matrix(1,1,1) < (1:2), and maybe also things
>     > like matrix(1,2,2)+(1:8).
> 
> Martin Maechler:
> That is an interesting idea.  Yes, in my view that would
> definitely also have to allow the latter, by the above argument
> of not treating the dim/dimnames attributes special.  For
> non-arrays length-1 is not treated much special apart from the
> fact that length-1 can always be recycled (without warning).

I think one could argue for allowing matrix(1,1,1)+(1:8) but not
matrix(1,2,2)+(1:8).  Length-1 vectors certainly are special in some
circumstances, being R's only way of representing a scalar.  For
instance, if (c(T,F)) gives a warning.
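
A small sketch of that special case. The exact condition depends on the R version: it was a warning at the time of writing, and as far as I know newer releases signal an error instead, so the code captures whichever occurs rather than assuming one:

```r
# Capture the outcome instead of assuming it, since it differs by R version.
outcome <- tryCatch({
  if (c(TRUE, FALSE)) "ran" else "skipped"
},
warning = function(w) "warning",
error   = function(e) "error")

# A length-1 condition is the ordinary, accepted case:
scalar_ok <- if (TRUE) "ran" else "skipped"
```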

This really goes back to what I think may have been a basic mistake in
the design of S, in deciding that everything is a vector, then halfway
modifying this with dim attributes, but it's too late to totally undo
that (though allowing a 0-length dim attribute to explicitly mark a
length-1 vector as a scalar might help).

>     > And I think there would be some significant problems. In addition to
>     > the 10-20+ packages that Martin expects to break, there could be quite
>     > a bit of user code that would no longer work - scripts for analysing
>     > data sets that used to work, but now don't with the latest version.
> 
> That's not true (at least for the cases above): They would give
> a strong warning

But isn't the intent to make it an error later?  So I assume we're
debating making it an error, not just a warning.  (Though I'm
generally opposed to such warnings anyway, unless they could somehow
be restricted to come up only for interactive uses, not from deep in a
program the user didn't write, making them totally mysterious...)

> *and* the  logic and relop versions of this, e.g.,
>    matrix(TRUE,1) | c(TRUE,FALSE) ;  matrix(1,1) > 1:2,
> have always been an  error; so nothing would break there.

Yes, that wouldn't change the behaviour of old code, but if we're
aiming for consistency, it might make sense to get rid of that error,
allowing code like sum(a%*%b<c(10,20,30)) with a and b being vectors,
rather than forcing the programmer to write sum(c(a%*%b)<c(10,20,30)).
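
As a sketch of the two spellings, with made-up values for a and b: the c() form works by dropping 'dim' first, while the direct form is wrapped in tryCatch because whether it is accepted depends on the R version.

```r
a <- c(1, 2, 3)
b <- c(1, 2, 3)       # a %*% b is the 1x1 matrix holding 14

# Dropping 'dim' first lets the comparison recycle the value like a scalar:
n_below <- sum(c(a %*% b) < c(10, 20, 30))   # 14 < 10, 14 < 20, 14 < 30

# Without c(), capture an error message if this R version rejects it:
direct <- tryCatch(sum(a %*% b < c(10, 20, 30)),
                   error = function(e) conditionMessage(e))
```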

> Of course; that *was* the reason the very special treatment for arithmetic
> length-1 arrays had been introduced.  It is convenient.
> 
> However, *some* of the conveniences in S (and hence R) functions
> have been dangerous {and much more used, hence close to
> impossible to abolish, e.g., sample(x) when x  is numeric of length 1,

There's a difference between these two.  Giving an error when using a
1x1 matrix as a scalar may detect some programming bugs, but not
giving an error doesn't introduce a bug.  Whereas sample(2:n) behaving
differently when n is 2 than when n is greater than 2 is itself a bug,
that the programmer has to consciously avoid by being aware of the quirk.
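
A brief sketch of the sample() quirk and one common workaround, indexing by a permutation of positions (the variable names are illustrative):

```r
n <- 2
x <- 2:n                    # a single value, 2

# Given a single number >= 1, sample() draws from 1:x rather than from x:
s <- sample(x)              # a permutation of 1:2, not just the value 2

# Indexing by a permutation of positions avoids the special case:
safe <- x[sample.int(length(x))]
```

With n greater than 2 both forms permute the values of x, which is why the problem only shows up at the boundary.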

   Radford Neal
