thr3ads.net - R devel - [Rd] Shouldn't vector indexing with negative out-of-range index give an error? [May 2015]

Henrik Bengtsson

2015-May-04 19:20 UTC

[Rd] Shouldn't vector indexing with negative out-of-range index give an error?

In Section 'Indexing by vectors' of 'R Language Definition'
(http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Indexing-by-vectors)
it says:

"Integer. All elements of i must have the same sign. If they are
positive, the elements of x with those index numbers are selected. If
i contains negative elements, all elements except those indicated are
selected.

If i is positive and exceeds length(x) then the corresponding
selection is NA. A negative out of bounds value for i causes an error.

A special case is the zero index, which has null effects: x[0] is an
empty vector and otherwise including zeros among positive or negative
indices has the same effect as if they were omitted."
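
The undisputed parts of the passage quoted above can be checked at the prompt. This is only a minimal sketch; the vector x and the variable names are arbitrary illustrations, not taken from the manual:

```r
x <- c(10L, 20L, 30L, 40L)

# Positive index beyond length(x): the corresponding selection is NA.
oob_positive <- x[6L]

# Zero index on its own: an empty vector of the same type as x.
zero_only <- x[0L]

# Zeros mixed with positive or negative indices behave as if omitted.
zero_with_positive <- x[c(0L, 2L)]
zero_with_negative <- x[c(0L, -2L)]
```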

However, that "A negative out of bounds value for i causes an error"
in the second paragraph does not seem to apply.  Instead, R silently
ignores negative indices that are out of range.  For example:

> x <- 1:4
> x[-9L]
[1] 1 2 3 4
> x[-c(1:9)]
integer(0)
> x[-c(3:9)]
[1] 1 2

> y <- as.list(1:4)
> y[-c(1:9)]
list()

Is the observed non-error the correct behavior, so that the
documentation is incorrect, or is it vice versa?  (...or am I
missing something?)

I get the above on R devel, R 3.2.0, and as far back as R 2.11.0
(haven't checked earlier versions).

Thank you,

Henrik

Martin Maechler

2015-May-05 14:01 UTC

Thank you, Henrik!

I've checked further back: The change happened between R 2.5.1 and R 2.6.0.

The previous behavior was

  > (1:3)[-(3:5)]
  Error: subscript out of bounds

If you start reading NEWS.2, you see a *lot* of new features
(and bug fixes) in the 2.6.0 news, but from my browsing, none of
them mentions the new behavior as a feature.

Let's -- for a moment -- declare it a bug in the code, i.e., not
in the documentation:

- As 2.6.0 happened quite a while ago (Oct. 2007),
  we could wonder how much R code will break if we fix the bug.

- Is the R package authors' community willing to do the necessary
  cleanup in their packages?

---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 


Now, after reading the source code for a while, and looking at
the changes, I've found the log entry

------------------------------------------------------------------------
r42123 | ihaka | 2007-07-05 02:00:05 +0200 (Thu, 05 Jul 2007) | 4 lines

Changed the behaviour of out-of-bounds negative
subscripts to match that of S.  Such values are
now ignored rather than tripping an error.

------------------------------------------------------------------------

So, it was changed on purpose, by one of the true "R"s, very
much on purpose.

Making it a *warning* instead of the original error
may have been both more cautious and more helpful for
detecting programming errors.
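
The warning variant mentioned above can be emulated in user code. The following is only a sketch: subset_warn_oob() is a hypothetical helper, not part of base R and not what any R version actually did, and it assumes an integer index vector i:

```r
# Hypothetical helper (not in base R): behaves like x[i], but warns when a
# negative index points past the end of x.
subset_warn_oob <- function(x, i) {
  oob <- !is.na(i) & i < 0 & -i > length(x)
  if (any(oob)) {
    warning("out-of-bounds negative subscript(s) ignored: ",
            paste(i[oob], collapse = ", "))
  }
  x[i]
}

x <- 1:4

# Report the warning as a message and carry on with the subsetting result.
res <- withCallingHandlers(
  subset_warn_oob(x, -(3:9)),
  warning = function(w) {
    message("caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```

The result is the same as plain x[-(3:9)]; only the diagnostic differs.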

OTOH, John Chambers, the father of S and hence grandfather of R,
may have had good reasons why it seemed more logical to silently
ignore such out-of-bounds negative indices:
One could argue that

   x[-5]  means  "leave out the 5-th element of x"

and if there is no 5-th element of x, leaving it out should be a no-op.

After all this musing and history detection, my gut decision
would be to only change the documentation, which Ross forgot to change.

But of course, it may be interesting to hear other programmeRs' feedback on
this.

Martin

John Chambers

2015-May-05 15:45 UTC

When someone suggests that we "might have had a reason" for some
peculiarity in the original S, my usual reaction is "Or else we never
thought of the problem".

In this case, however, there is a relevant statement in the 1988 "blue
book".  In the discussion of subscripting (p 358) the definition for
negative i says: "the indices consist of the elements of seq(along=x) that
do not match any elements in -i".

This suggests that no bounds checking on -i takes place.
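
The blue-book wording can be transcribed almost literally into R. This is a sketch of the quoted definition only, not of R's internal implementation; blue_book_subset() is a hypothetical name, and only strictly negative i is covered:

```r
# Keep the elements of seq_along(x) that do not match any element of -i.
blue_book_subset <- function(x, i) {
  stopifnot(all(i < 0))                 # strictly negative indices only
  keep <- seq_along(x)[!(seq_along(x) %in% -i)]
  x[keep]
}

x <- 1:4
y <- as.list(1:4)

# Compare with built-in subsetting on the examples from the original post.
same_single  <- identical(blue_book_subset(x, -9L),    x[-9L])
same_all     <- identical(blue_book_subset(x, -(1:9)), x[-(1:9)])
same_partial <- identical(blue_book_subset(x, -(3:9)), x[-(3:9)])
same_list    <- identical(blue_book_subset(y, -(1:9)), y[-(1:9)])
```

Since membership in -i is the only test, an index beyond length(x) simply never matches, and no bounds check is needed.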

John



Martin Maechler

2015-May-06 08:33 UTC

Indeed!  
Thanks a lot John, for the perspective and clarification!

I'm committing a patch to the documentation now.
Martin



Henrik Bengtsson

2015-May-06 16:04 UTC

Thank you both, and credit also to Dongcan Jiang for pointing out to
me that errors were indeed not generated in this case.

I agree with the decision. It's interesting to notice that now the
only way an error is generated is when index-vector subsetting is done
using mixed positive and negative indices, e.g. x[c(-1,1)].
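
The contrast can be seen with tryCatch(). A minimal sketch; the placeholder string "error" is just a marker chosen for illustration, and the actual error message text is deliberately not shown since it may differ between R versions:

```r
x <- 1:4

# Mixed positive and negative indices: an error is signalled.
mixed <- tryCatch(x[c(-1, 1)], error = function(e) "error")

# Out-of-range negative index: silently ignored, all of x is returned.
oob_negative <- tryCatch(x[-9], error = function(e) "error")
```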

/Henrik
