thr3ads.net - R devel - [Rd] Undefined behavior of head() and tail() with n = 0 [Jan 2017]

William Dunlap

2017-Jan-26 15:51 UTC

[Rd] Undefined behavior of head() and tail() with n = 0

In addition, signed zeroes only exist for floating point numbers - the
bit patterns for as.integer(0) and as.integer(-0) are identical.
Bill Dunlap
TIBCO Software
wdunlap tibco.com
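
The point about bit patterns can be checked directly at the R prompt. The following is only a small sketch; the variable names are made up for illustration and are not part of the original message.

```r
# Signed zero exists only for doubles; an integer zero has a single bit pattern.
pos_zero <- 0
neg_zero <- -0

pos_zero == neg_zero                           # TRUE: numerically equal
identical(pos_zero, neg_zero)                  # TRUE: num.eq = TRUE is the default
identical(pos_zero, neg_zero, num.eq = FALSE)  # FALSE: bitwise comparison sees the sign bit
c(1 / pos_zero, 1 / neg_zero)                  # Inf -Inf

# Converting to integer discards the sign of zero.
int_pos <- as.integer(pos_zero)
int_neg <- as.integer(neg_zero)
identical(int_pos, int_neg)                    # TRUE
1 / int_neg                                    # Inf, not -Inf
```

So any function that coerces n to integer before using it cannot see the difference between 0 and -0.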


On Thu, Jan 26, 2017 at 1:53 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
>>>>>> Florent Angly <florent.angly at gmail.com>
>>>>>>     on Wed, 25 Jan 2017 16:31:45 +0100 writes:
>
>     > Hi all,
>     > The documentation for head() and tail() describes the behavior of
>     > these generic functions when n is strictly positive (n > 0) and
>     > strictly negative (n < 0). How these functions work when given a zero
>     > value is not defined.
>
>     > Both GNU command-line utilities head and tail behave differently
>     > with +0 and -0:
>     > http://man7.org/linux/man-pages/man1/head.1.html
>     > http://man7.org/linux/man-pages/man1/tail.1.html
>
>     > Since R supports signed zeros (1/+0 != 1/-0)
>
> whoa, whoa, .. slow down --  The above is misleading!
>
> Rather read in  ?Arithmetic (*the* reference to consult for such issues),
> where the 2nd part of the following section
>
>  || Implementation limits:
>  ||
>  ||      [..............]
>  ||
>  ||      Another potential issue is signed zeroes: on IEC 60559 platforms
>  ||      there are two zeroes with internal representations differing by
>  ||      sign.  Where possible R treats them as the same, but for example
>  ||      direct output from C code often does not do so and may output
>  ||      '-0.0' (and on Windows whether it does so or not depends on the
>  ||      version of Windows).  One place in R where the difference might be
>  ||      seen is in division by zero: '1/x' is 'Inf' or '-Inf' depending on
>  ||      the sign of zero 'x'.  Another place is 'identical(0, -0, num.eq =
>  ||      FALSE)'.
>
> says the *contrary* ( __Where possible R treats them as the same__ ):
> We do _not_ want to distinguish -0 and +0,
> but there are cases where it is unavoidable.
>
> And there are good reasons (mathematics !!) for this.
>
> I'm pretty sure that it would be quite a mistake to start
> differentiating it here...  but of course we can continue
> discussing here if you like.
>
> Martin Maechler
> ETH Zurich and R Core
>
>
>     > and the R head() and tail() functions are modeled after
>     > their GNU counterparts, I would expect the R functions to
>     > distinguish between +0 and -0
>
>     >> tail(1:5, n=0)
>     > integer(0)
>     >> tail(1:5, n=1)
>     > [1] 5
>     >> tail(1:5, n=2)
>     > [1] 4 5
>
>     >> tail(1:5, n=-2)
>     > [1] 3 4 5
>     >> tail(1:5, n=-1)
>     > [1] 2 3 4 5
>     >> tail(1:5, n=-0)
>     > integer(0)  # expected 1:5
>
>     >> head(1:5, n=0)
>     > integer(0)
>     >> head(1:5, n=1)
>     > [1] 1
>     >> head(1:5, n=2)
>     > [1] 1 2
>
>     >> head(1:5, n=-2)
>     > [1] 1 2 3
>     >> head(1:5, n=-1)
>     > [1] 1 2 3 4
>     >> head(1:5, n=-0)
>     > integer(0)  # expected 1:5
>
>     > For both head() and tail(), I expected 1:5 as output but got
>     > integer(0). I obtained similar results using a data.frame and a
>     > function as x argument.
>
>     > An easy fix would be to explicitly state in the documentation what
>     > n = 0 does, and that there is no practical difference between -0 and
>     > +0.
>     > However, in my eyes, the better approach would be to implement
>     > support for -0 and document it. What do you think?
>
>     > Best,
>
>     > Florent
>
>
>     > PS/ My sessionInfo() gives:
>     > R version 3.3.2 (2016-10-31)
>     > Platform: x86_64-w64-mingw32/x64 (64-bit)
>     > Running under: Windows 7 x64 (build 7601) Service Pack 1
>
>     > locale:
>     > [1] LC_COLLATE=German_Switzerland.1252
>     > LC_CTYPE=German_Switzerland.1252
>     > LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
>     > LC_TIME=German_Switzerland.1252
>
>     > attached base packages:
>     > [1] stats     graphics  grDevices utils     datasets  methods   base
>
>     > ______________________________________________
>     > R-devel at r-project.org mailing list
>     > https://stat.ethz.ch/mailman/listinfo/r-devel
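
The behaviour reported in the quoted session can be reproduced with a few lines. This is a minimal sketch using a throwaway vector and data frame (both invented here), and it only shows what head() and tail() return for n = 0 and n = -0 with the default and data.frame methods.

```r
x <- 1:5
df <- data.frame(a = 1:5, b = letters[1:5])

# -0 == 0 is TRUE, so both calls are treated alike by head() and tail().
head_pos <- head(x, n = 0)
head_neg <- head(x, n = -0)
tail_pos <- tail(x, n = 0)
tail_neg <- tail(x, n = -0)

identical(head_pos, head_neg)   # TRUE
identical(tail_pos, tail_neg)   # TRUE
length(head_pos)                # 0

# The data.frame method behaves the same way: zero rows, all columns kept.
dim(head(df, n = 0))            # 0 2
dim(head(df, n = -0))           # 0 2
```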

Florent Angly

2017-Jan-27 09:24 UTC

[Rd] Undefined behavior of head() and tail() with n = 0

Martin, I agree with you that +0 and -0 should generally be treated as
equal, and R does a fine job in this respect. The Wikipedia article on
signed zero (https://en.wikipedia.org/wiki/Signed_zero) echoes this
view but also highlights that +0 and -0 can be treated differently in
particular situations, including their interpretation as mathematical
limits (as in the 1/-0 case). Indeed, the main question here is
whether head() and tail() represent a special case that would benefit
from differentiating between +0 and -0.

We can break down the discussion into two problems:
A/ the discrepancy between the implementation of R head() and tail()
and the documentation of these functions (where the use of zero is not
documented and thus not permissible),
B/ the discrepancy between the implementation of R head() and tail()
and their GNU equivalent (which allow zeros and differentiate between
-0 and +0, i.e. head takes "0" and "-0", tail takes
"0" and "+0").

There are several possible solutions to address these discrepancies:

1/ Leave the code as-is but document its behavior with respect to zero
(zeros allowed, with negative zeros treated like positive zeros).
Advantages: This is the path of least resistance, and discrepancy A is fixed.
Disadvantages: Discrepancy B remains (but is documented).

2/ Leave the documentation as-is but reflect this in code by not
allowing zeros at all.
Advantages: Discrepancy A is fixed.
Disadvantages: Discrepancy B remains in some form (but is documented).
Need to deprecate the usage of +0 (which was not clearly documented
but may have been assumed by users).

3/ Update the code and documentation to differentiate between +0 and -0.
Advantages: In my eyes, this is the ideal solution since discrepancy A
and (most of) B are resolved.
Disadvantages: It is unclear how to implement this solution and the
implications it may have on backward compatibility:
   a/ Allow -0 (as double). But is it supported on all platforms used
by R (see ?Arithmetic)? William has raised the issue that negative
zero cannot be represented as an integer. Should head() and tail()
then strictly check double input (while forbidding integers)?
   b/ The input could always be given as a character string. This would
allow mirroring GNU tail even more closely (where the prefix "+" is used to
invert the meaning of n). This probably involves a fair amount of work
and careful handling of deprecation.
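
To make option 3a/ concrete, here is a rough sketch of how a negative zero could be detected and acted on. Both is_negative_zero() and head_signed() are hypothetical names invented for this illustration; nothing like them exists in base R, and this is not a proposed patch.

```r
# Hypothetical helper: TRUE only for a double zero whose sign bit is set.
is_negative_zero <- function(n) {
  is.double(n) && length(n) == 1L && !is.na(n) && n == 0 && 1 / n < 0
}

# Hypothetical wrapper around head(): -0 means "drop the last 0 elements",
# i.e. keep everything; every other n is passed through unchanged.
head_signed <- function(x, n = 6) {
  if (is_negative_zero(n)) x else head(x, n)
}

head_signed(1:5, 0)     # integer(0)
head_signed(1:5, -0)    # 1 2 3 4 5
head_signed(1:5, -0L)   # integer(0): an integer has no negative zero
```

The last call shows the difficulty raised in the thread: as soon as n arrives as an integer, the sign of zero is already gone.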




Martin Maechler

2017-Jan-27 13:55 UTC

[Rd] Undefined behavior of head() and tail() with n = 0

Dear Florent,

thank you for striving to clearly disentangle and present the
issue below.
That is a nice "role model" way of approaching such topics!
>>>>> Florent Angly <florent.angly at gmail.com>
>>>>>     on Fri, 27 Jan 2017 10:24:39 +0100 writes:
    > Martin, I agree with you that +0 and -0 should generally be treated as
    > equal, and R does a fine job in this respect. The Wikipedia article on
    > signed zero (https://en.wikipedia.org/wiki/Signed_zero) echoes this
    > view but also highlights that +0 and -0 can be treated differently in
    > particular situations, including their interpretation as mathematical
    > limits (as in the 1/-0 case). Indeed, the main question here is
    > whether head() and tail() represent a special case that would benefit
    > from differentiating between +0 and -0.

    > We can break down the discussion into two problems:
    > A/ the discrepancy between the implementation of R head() and tail()
    > and the documentation of these functions (where the use of zero is not
    > documented and thus not permissible),

Ehm, no, in R (and many other software systems),

  "not documented" does *NOT* entail "not permissible"


    > B/ the discrepancy between the implementation of R head() and tail()
    > and their GNU equivalent (which allow zeros and differentiate between
    > -0 and +0, i.e. head takes "0" and "-0", tail takes "0" and "+0").

This discrepancy, as you mention later, comes from the fact that
these arguments are basically strings in the Unix tools (GNU being a
special case of Unix here) and integers in R.
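
As an illustration of why strings make the distinction easy, here is a rough sketch of a tail() variant that takes n as a character string. The function tail_chr() is hypothetical and handles plain vectors only; it loosely follows the GNU tail convention that "+K" means "start at element K", so "+0" keeps everything while "0" keeps nothing.

```r
# Hypothetical tail() variant taking n as a string, loosely following GNU tail:
#   "K" or "-K": last K elements;  "+K": everything from element K onwards.
tail_chr <- function(x, n) {
  stopifnot(is.character(n), length(n) == 1L, grepl("^[+-]?[0-9]+$", n))
  k <- as.integer(sub("^[+-]", "", n))
  if (startsWith(n, "+")) {
    # "+0" and "+1" both mean "start at the first element".
    x[seq_len(length(x)) >= k]
  } else {
    tail(x, k)
  }
}

tail_chr(1:5, "0")    # integer(0)
tail_chr(1:5, "+0")   # 1 2 3 4 5
tail_chr(1:5, "+4")   # 4 5
tail_chr(1:5, "-2")   # 4 5
```

The sign is part of the string, so it survives even when the digits are zero, which is exactly what a numeric argument cannot guarantee.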

Below, I'm giving my personal view of the issue:

    > There are several possible solutions to address these discrepancies:

    > 1/ Leave the code as-is but document its behavior with respect to zero
    > (zeros allowed, with negative zeros treated like positive zeros).
    > Advantages: This is the path of least resistance, and discrepancy A is
    > fixed.
    > Disadvantages: Discrepancy B remains (but is documented).

That would be my "clear" choice.


    > 2/ Leave the documentation as-is but reflect this in code by not
    > allowing zeros at all.
    > Advantages: Discrepancy A is fixed.
    > Disadvantages: Discrepancy B remains in some form (but is documented).
    > Need to deprecate the usage of +0 (which was not clearly documented
    > but may have been assumed by users).

2/ looks "uniformly inferior" to 1/ to me


    > 3/ Update the code and documentation to differentiate between +0 and
    > -0.
    > Advantages: In my eyes, this is the ideal solution since discrepancy A
    > and (most of) B are resolved.
    > Disadvantages: It is unclear how to implement this solution and the
    > implications it may have on backward compatibility:
    > a/ Allow -0 (as double). But is it supported on all platforms used
    > by R (see ?Arithmetic)? William has raised the issue that negative
    > zero cannot be represented as an integer. Should head() and tail()
    > then strictly check double input (while forbidding integers)?
    > b/ The input could always be given as a character string. This would
    > allow mirroring GNU tail even more closely (where the prefix "+" is
    > used to invert the meaning of n). This probably involves a fair amount
    > of work and careful handling of deprecation.

3/ involves quite a few complications, and in my view, the
   advantages you list do not come close to outweighing the drawbacks.


    > On 26 January 2017 at 16:51, William Dunlap <wdunlap at tibco.com> wrote:
    >> In addition, signed zeroes only exist for floating point numbers - the
    >> bit patterns for as.integer(0) and as.integer(-0) are identical.

indeed!

