Radford Neal
2016-Sep-08 21:11 UTC
[Rd] R (development) changes in arith, logic, relop with (0-extent) arrays
Regarding Martin Maechler's proposal:

    Arithmetic between length-1 arrays and longer non-arrays had
    silently dropped the array attributes and recycled.  This now gives
    a warning and will signal an error in the future, as it has always
    for logic and comparison operations

For example, matrix(1,1,1) + (1:2) would give a warning/error.

I think this might be a mistake.

The potential benefits of this would be detection of some programming errors, and increased consistency. The downsides are breaking existing working programs, and decreased consistency.

Regarding consistency, the overall R philosophy is that attaching an attribute to a vector doesn't change what you can do with it, or what the result is, except that the result (often) gets the attributes carried forward. By this logic, adding a 'dim' attribute shouldn't stop you from doing arithmetic (or comparisons) that you otherwise could.

But maybe 'dim' attributes are special? Well, they are in some circumstances, and have to be when they are intended to change the behaviour, such as when a matrix is used as an index with [.

But in many cases at present, 'dim' attributes DON'T stop you from treating the object as a plain vector - for example, one is allowed to do matrix(1:4,2,2)[3], and a<-numeric(10); a[2:5]<-matrix(1,2,2).

So it may make more sense to move towards consistency in the permissive direction, rather than the restrictive direction.

That would mean allowing matrix(1,1,1)<(1:2), and maybe also things like matrix(1,2,2)+(1:8).

Obviously, a change that removes error conditions is much less likely to produce backwards-compatibility problems than a change that gives errors for previously-allowed operations.

And I think there would be some significant problems. In addition to the 10-20+ packages that Martin expects to break, there could be quite a bit of user code that would no longer work - scripts for analysing data sets that used to work, but now don't with the latest version.
There are reasons to expect such problems. Some operations such as vector dot products using %*% produce results that are always scalar, but are formed as 1x1 matrices. One can anticipate that many people have not been getting rid of the 'dim' attribute in such cases, when doing so hasn't been necessary in the past.

Regarding the 0-length vector issue, I agree with other posters that after a<-numeric(0), it has to be allowable to write a<1. To not allow this would be highly destructive of code reliability. And for similar reason, after a<-c(), a<1 needs to be allowed, which means NULL<1 should be allowed (giving logical(0)), since c() is NULL.

    Radford Neal
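The two observations in this message, that a 'dim' attribute often does not stop an object from being used as a plain vector and that %*% on two vectors returns a 1x1 matrix, can be checked with a minimal sketch such as the one below. The variable names are illustrative and do not come from the post. Arithmetic that mixes a length-1 array with a longer vector is deliberately left out, since its behaviour is exactly what the thread is debating and may differ between R versions.

```r
# Illustrative names (m, v, x, y, p, s); none of them come from the post.

# Linear indexing ignores the matrix shape: elements are stored column by column.
m <- matrix(1:4, 2, 2)
third <- m[3]

# Assigning a matrix into a slice of a plain vector uses only its values.
v <- numeric(10)
v[2:5] <- matrix(1, 2, 2)

# A dot product written with %*% comes back as a 1x1 matrix, not a bare number.
x <- c(1, 2)
y <- c(3, 4)
p <- x %*% y
print(dim(p))

# c() (or drop()) removes the 'dim' attribute, leaving a length-1 vector
# that recycles against a longer vector like any other scalar.
s <- c(p)
below <- s < c(10, 20, 30)
print(below)
```

Stripping the attribute explicitly with c() or drop() is the form that works regardless of how the length-1 array question is settled.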
Martin Maechler
2016-Sep-09 08:35 UTC
[Rd] R (development) changes in arith, logic, relop with (0-extent) arrays
>>>>> Radford Neal <radford at cs.toronto.edu>
>>>>>     on Thu, 8 Sep 2016 17:11:18 -0400 writes:

> Regarding Martin Maechler's proposal:
>
>     Arithmetic between length-1 arrays and longer non-arrays had
>     silently dropped the array attributes and recycled.  This now gives
>     a warning and will signal an error in the future, as it has always
>     for logic and comparison operations
>
> For example, matrix(1,1,1) + (1:2) would give a warning/error.
>
> I think this might be a mistake.
>
> The potential benefits of this would be detection of some programming
> errors, and increased consistency.  The downsides are breaking
> existing working programs, and decreased consistency.
>
> Regarding consistency, the overall R philosophy is that attaching an
> attribute to a vector doesn't change what you can do with it, or what
> the result is, except that the result (often) gets the attributes
> carried forward.  By this logic, adding a 'dim' attribute shouldn't
> stop you from doing arithmetic (or comparisons) that you otherwise
> could.

Thank you, Radford, for joining in. The above is a good line of reasoning.

> But maybe 'dim' attributes are special?  Well, they are in some
> circumstances, and have to be when they are intended to change the
> behaviour, such as when a matrix is used as an index with [.

indeed.

> But in many cases at present, 'dim' attributes DON'T stop you from
> treating the object as a plain vector - for example, one is allowed
> to do matrix(1:4,2,2)[3], and a<-numeric(10); a[2:5]<-matrix(1,2,2).

agreed.

> So it may make more sense to move towards consistency in the
> permissive direction, rather than the restrictive direction.
>
> That would mean allowing matrix(1,1,1) < (1:2), and maybe also things
> like matrix(1,2,2)+(1:8).

That is an interesting idea. Yes, in my view that would definitely also have to allow the latter, by the above argument of not treating the dim/dimnames attributes special.
For non-arrays length-1 is not treated much special apart from the fact that length-1 can always be recycled (without warning).

> Obviously, a change that removes error conditions is much less likely
> to produce backwards-compatibility problems than a change that gives
> errors for previously-allowed operations.

Of course that is true... and that has also been the reason for my amendment

> And I think there would be some significant problems.  In addition to
> the 10-20+ packages that Martin expects to break, there could be quite
> a bit of user code that would no longer work - scripts for analysing
> data sets that used to work, but now don't with the latest version.

That's not true (at least for the cases above): They would give a strong warning, "strong" because it is

    > matrix(1,1) + 1:2
    [1] 2 3
    Warning message:
    In matrix(1, 1) + 1:2 :
      dropping dim() of array of length one.  Will become ERROR
    >

*and* the logic and relop versions of this, e.g.,
    matrix(TRUE,1) | c(TRUE,FALSE) ;  matrix(1,1) > 1:2,
have always been an error; so nothing would break there.

> There are reasons to expect such problems.  Some operations such as
> vector dot products using %*% produce results that are always scalar,
> but are formed as 1x1 matrices.

Of course; that *was* the reason the very special treatment for arithmetic length-1 arrays had been introduced. It is convenient.

However, *some* of the conveniences in S (and hence R) functions have been dangerous {and much more used, hence close to impossible to abolish, e.g., sample(x) when x is numeric of length 1, and several others, you'll find in the "R Inferno"}, or at least quirky for *programming* with R (as opposed to pure interactive use).

> One can anticipate that many people
> have not been getting rid of the 'dim' attribute in such cases, when
> doing so hasn't been necessary in the past.

If it remains at 10-20 CRAN packages (out of 9000), each with just very few instances, that would indicate, I think, not so widespread use.
Note that they only did not have to get rid of the dim() in the length-1 case (and only for arithmetic): as soon as they had another dimension, they would have got an error.

Still, I agree about the validity of your line of thought, and that in order to get consistency we also could go into the direction of being more permissive rather than restrictive.

I'm interested to hear other opinions, notably as in recent years some famous R teachers have typically criticized R as being not strict enough ...

> Regarding the 0-length vector issue, I agree with other posters that
> after a<-numeric(0), it has to be allowable to write a<1.  To not
> allow this would be highly destructive of code reliability.  And for
> similar reason, after a<-c(), a<1 needs to be allowed, which means
> NULL<1 should be allowed (giving logical(0)), since c() is
> NULL.

Yes, indeed, treating NULL the same as a length-0 atomic vector here is also correct in my view, and maybe the fact you mention that c() is NULL does help to convince others.

Martin
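The zero-length point that both posters agree on can be illustrated with a short sketch. It reflects how released versions of R are understood to behave, not necessarily the development version under discussion. The helper keep_small and the variable names are hypothetical; only the expressions a<1, c() and NULL<1 come from the thread.

```r
# Illustrative names; keep_small is a hypothetical helper, not from the thread.

# Comparing a zero-length numeric vector with a scalar gives a zero-length logical.
a <- numeric(0)
r_empty <- a < 1

# c() with no arguments is NULL, and the same comparison still works.
b <- c()
r_null <- b < 1

# A typical pattern that relies on this: filtering input that may be empty.
keep_small <- function(x) x[x < 1]

print(keep_small(c(0.5, 2, -1)))
print(keep_small(numeric(0)))
print(keep_small(c()))
```

Code that builds a result incrementally, starting from c() or numeric(0), depends on these comparisons not signalling an error in the empty case.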
Radford Neal
2016-Sep-09 14:29 UTC
[Rd] R (development) changes in arith, logic, relop with (0-extent) arrays
> Radford Neal:
>
> > So it may make more sense to move towards consistency in the
> > permissive direction, rather than the restrictive direction.
> >
> > That would mean allowing matrix(1,1,1) < (1:2), and maybe also things
> > like matrix(1,2,2)+(1:8).
>
> Martin Maechler:
>
> That is an interesting idea.  Yes, in my view that would
> definitely also have to allow the latter, by the above argument
> of not treating the dim/dimnames attributes special.  For
> non-arrays length-1 is not treated much special apart from the
> fact that length-1 can always be recycled (without warning).

I think one could argue for allowing matrix(1,1,1)+(1:8) but not matrix(1,2,2)+(1:8). Length-1 vectors certainly are special in some circumstances, being R's only way of representing a scalar. For instance, if (c(T,F)) gives a warning.

This really goes back to what I think may have been a basic mistake in the design of S, in deciding that everything is a vector, then halfway modifying this with dim attributes, but it's too late to totally undo that (though allowing a 0-length dim attribute to explicitly mark a length-1 vector as a scalar might help).

> > And I think there would be some significant problems.  In addition to
> > the 10-20+ packages that Martin expects to break, there could be quite
> > a bit of user code that would no longer work - scripts for analysing
> > data sets that used to work, but now don't with the latest version.
>
> That's not true (at least for the cases above): They would give
> a strong warning

But isn't the intent to make it an error later? So I assume we're debating making it an error, not just a warning.
(Though I'm generally opposed to such warnings anyway, unless they could somehow be restricted to come up only for interactive uses, not from deep in a program the user didn't write, making them totally mysterious...)

> *and* the logic and relop versions of this, e.g.,
>    matrix(TRUE,1) | c(TRUE,FALSE) ;  matrix(1,1) > 1:2,
> have always been an error; so nothing would break there.

Yes, that wouldn't change the behaviour of old code, but if we're aiming for consistency, it might make sense to get rid of that error, allowing code like sum(a%*%b<c(10,20,30)) with a and b being vectors, rather than forcing the programmer to write sum(c(a%*%b)<c(10,20,30)).

> Of course; that *was* the reason the very special treatment for arithmetic
> length-1 arrays had been introduced.  It is convenient.
>
> However, *some* of the conveniences in S (and hence R) functions
> have been dangerous {and much more used, hence close to
> impossible to abolish, e.g., sample(x) when x is numeric of length 1,

There's a difference between these two. Giving an error when using a 1x1 matrix as a scalar may detect some programming bugs, but not giving an error doesn't introduce a bug. Whereas sample(2:n) behaving differently when n is 2 than when n is greater than 2 is itself a bug, that the programmer has to consciously avoid by being aware of the quirk.

    Radford Neal
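The sample(2:n) quirk mentioned at the end of this message can be reproduced with a small sketch. The helper name sample_values is hypothetical; it indexes with sample.int(), which is one common way to avoid the length-1 special case and is offered here as an assumption about what a careful programmer might write, not as something proposed in the thread.

```r
# Illustrative helper; the name sample_values is hypothetical, not from the thread.
# Indexing with sample.int() permutes the elements of x even when length(x) is 1.
sample_values <- function(x) x[sample.int(length(x))]

n <- 4
perm4 <- sample(2:n)          # a permutation of 2, 3, 4

n <- 2
quirk <- sample(2:n)          # 2:n is just 2, so this permutes 1:2 instead
safe <- sample_values(2:n)    # the single value 2

print(sort(perm4))
print(sort(quirk))
print(safe)
```

With n equal to 2 the plain call returns two values drawn from 1:2, while the indexed form returns the one element that was actually supplied.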