thr3ads.net - R devel - [Rd] NA_real_ <op> NaN -> NA or NaN, should we care? [Apr 2009]

William Dunlap

2009-Apr-30 17:51 UTC

[Rd] NA_real_ <op> NaN -> NA or NaN, should we care?

On Linux, when I compile R 2.10.0 (devel) (src/main/arithmetic.c in
particular) with gcc 3.4.5 using the flags -g -O2, I get noncommutative
behavior when adding NA and NaN:
   > NA_real_ + NaN
   [1] NaN
   > NaN + NA_real_
   [1] NA
If I compile src/main/arithmetic.c without optimization (just -g)
then both of those return NA.

On Windows, using a precompiled R 2.8.1 from CRAN I get
NA for both answers.

On Linux, after compiling src/main/arithmetic.c with -g -O2, the bit
patterns for NA_real_ and as.numeric(NA) are different:
   > my_numeric_NA <- as.numeric(NA)
   > writeBin(my_numeric_NA, ptmp<-pipe("od -x", open="wb"));close(ptmp)
   0000000 07a2 0000 0000 7ff8
   0000010
   > writeBin(NA_real_, ptmp<-pipe("od -x", open="wb"));close(ptmp)
   0000000 07a2 0000 0000 7ff0
   0000010
On Linux, after compiling with -g, the bit patterns for NA_real_
and as.numeric(NA) are identical:
   > my_numeric_NA <- as.numeric(NA)
   > writeBin(my_numeric_NA, ptmp<-pipe("od -x", open="wb"));close(ptmp)
   0000000 07a2 0000 0000 7ff8
   0000010
   > writeBin(NA_real_, ptmp<-pipe("od -x", open="wb"));close(ptmp)
   0000000 07a2 0000 0000 7ff8
   0000010

On Windows, using precompiled R 2.8.1 and cygwin/bin/od, both of those
gave the 7ff8 version.

Is this confounding of NA and NaN of concern or does R not promise to
keep NA and NaN distinct? 

I haven't followed all the macros, but it looks like arithmetic.c just
does
    result[i]=x[i]+y[i]
and lets the compiler/floating point unit decide what to do when x[i]
and y[i] are different NaN values (NA is a NaN value).  I haven't looked
at the C code for the initialization of NA_real_.  Adding explicit tests
for NA-ness in the binary operators (as S+ does) adds a fairly
significant cost.
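
To make the trade-off concrete, here is a minimal sketch of what an explicit NA test in a binary operator could look like. It assumes that NA is recognized by the value 1954 (0x7a2) in the low 32 bits of a NaN, which is what the bit patterns above suggest; the names bits_from_double, double_from_bits, is_na and add_keep_na are hypothetical and are not taken from arithmetic.c.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Copy the bytes of a double into a 64-bit integer (no pointer punning). */
static uint64_t bits_from_double(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);   /* assumes 64-bit IEEE doubles */
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Build a double from a raw 64-bit pattern. */
static double double_from_bits(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Assumption: NA is any NaN whose low 32 bits equal 1954 (0x7a2).
   Only the low word is compared, so both the 7ff0... and the 7ff8...
   variants shown above count as NA. */
static int is_na(double x)
{
    return isnan(x) && (uint32_t)bits_from_double(x) == UINT32_C(1954);
}

/* Hypothetical NA-preserving addition: if either operand is NA the
   result is NA, whatever the hardware would pick for NaN + NaN. */
static double add_keep_na(double x, double y)
{
    if (is_na(x)) return x;
    if (is_na(y)) return y;
    return x + y;
}
```

With this, add_keep_na(NA, NaN) and add_keep_na(NaN, NA) both satisfy is_na(), but every element pays for the extra tests, which is the cost mentioned above.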

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

Martin Maechler

2009-May-01 12:14 UTC

[Rd] NA_real_ <op> NaN -> NA or NaN, should we care?

>>>>> William Dunlap <wdunlap at tibco.com>
>>>>>     on Thu, 30 Apr 2009 10:51:43 -0700 writes:
    > On Linux when I compile R 2.10.0(devel) (src/main/arithmetic.c in
    > particular) with gcc 3.4.5 using the flags -g -O2 I get
    > noncommutative behavior when

Is this really gcc 3.4.5 (which is quite old)?

Without being an expert, I'd tend to call this a compiler
(optimization) bug ... but most probably the ANSI / ISO C (and
libc?) standards do not define the exact behavior of arithmetic
with NaNs.

    > adding NA and NaN:
    >> NA_real_ + NaN
    > [1] NaN
    >> NaN + NA_real_
    > [1] NA
    > If I compile src/main/arithmetic.c without optimization (just -g)
    > then both of those return NA.

    > On Windows, using a precompiled R 2.8.1 from CRAN I get
    > NA for both answers.

    > On Linux, after compiling src/main/arithmetic.c with -g -O2 the bit
    > patterns for NA_real_ and as.numeric(NA) are different:
    >> my_numeric_NA <- as.numeric(NA)
    >> writeBin(my_numeric_NA, ptmp<-pipe("od -x", open="wb"));close(ptmp)
    > 0000000 07a2 0000 0000 7ff8
    > 0000010
    >> writeBin(NA_real_, ptmp<-pipe("od -x", open="wb"));close(ptmp)
    > 0000000 07a2 0000 0000 7ff0
    > 0000010
    > On Linux, after compiling with -g the bit patterns for NA_real_
    > and as.numeric(NA) are identical.
    >> my_numeric_NA <- as.numeric(NA)
    >> writeBin(my_numeric_NA, ptmp<-pipe("od -x", open="wb"));close(ptmp)
    > 0000000 07a2 0000 0000 7ff8
    > 0000010
    >> writeBin(NA_real_, ptmp<-pipe("od -x", open="wb"));close(ptmp)
    > 0000000 07a2 0000 0000 7ff8
    > 0000010

    > On Windows, using precompiled R 2.8.1 and cygwin/bin/od, both of those
    > gave the 7ff8 version.

    > Is this confounding of NA and NaN of concern or does R not promise to
    > keep NA and NaN distinct? 

Hmm, I'd say it *is* of some concern that "+" is not commutative
in the narrow sense, even if I don't know what exactly "R
promises".

    > I haven't followed all the macros, but it looks like arithmetic.c
    > just does
    > result[i]=x[i]+y[i]
    > and lets the compiler/floating point unit decide what to do when x[i]
    > and y[i] are different NaN values (NA is a NaN value).  I haven't
    > looked at the C code for the initialization of NA_real_.  Adding
    > explicit tests for NA-ness in the binary operators (as S+ does) adds
    > a fairly significant cost.

Yes, I would be quite reluctant to add such
tests, because such costs are to be expected.

Maybe we ("R" :-) should explicitly state that operations mixing
NA & NaN give a result which is NA in the sense of fulfilling is.na(.) 
but *not* promise anything further.
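
A minimal C sketch of what such a weaker promise would amount to: the sum is some NaN in either operand order, and nothing is said about which payload survives. The helper names from_bits and sum_is_missing_both_orders are hypothetical, and isnan() only stands in for the "is.na(.) is TRUE" condition; this is not how R implements is.na().

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Build a double from a raw 64-bit pattern (no pointer punning). */
static double from_bits(uint64_t bits)
{
    double d;
    assert(sizeof d == sizeof bits);   /* assumes 64-bit IEEE doubles */
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Hypothetical check of the weaker contract: x + y and y + x must both
   be NaN, but their bit patterns are allowed to differ.  The volatile
   copies keep the compiler from folding the additions at compile time,
   so the sums are computed by the floating point unit at run time. */
static int sum_is_missing_both_orders(double x, double y)
{
    volatile double a = x, b = y;
    double xy = a + b;
    double yx = b + a;
    return isnan(xy) && isnan(yx);
}
```

Under this contract the NA + NaN and NaN + NA results reported in this thread would both be acceptable, since each is a NaN; only a payload-sensitive test could tell them apart.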

Martin Maechler, ETH Zurich

    > Bill Dunlap
    > TIBCO Software Inc - Spotfire Division
    > wdunlap tibco.com

William Dunlap

2009-May-01 14:50 UTC

[Rd] NA_real_ <op> NaN -> NA or NaN, should we care?

> From: Martin Maechler [mailto:maechler at stat.math.ethz.ch] 
> Sent: Friday, May 01, 2009 5:15 AM
> To: William Dunlap
> Cc: r-devel at r-project.org
> Subject: Re: [Rd] NA_real_ <op> NaN -> NA or NaN, should we care?
> 
> >>>>> William Dunlap <wdunlap at tibco.com>
> >>>>>     on Thu, 30 Apr 2009 10:51:43 -0700 writes:
> 
>     > On Linux when I compile R 2.10.0(devel) (src/main/arithmetic.c
>     > in particular) with gcc 3.4.5 using the flags -g -O2 I get
>     > noncommutative behavior when
> 
> is this really gcc 3.4.5  (which is quite old) ?

Yes, it was 3.4.5, but here is a self-contained example of the same
issue using gcc 4.1.3 on an Ubuntu Linux machine:

% gcc -O2 t.c -o a.out ; ./a.out
NA : 7ff00000000007a2
NaN: fff8000000000000
NA+NaN: 7ff80000000007a2
NaN+NA: fff8000000000000
% gcc  t.c -o a.out ; ./a.out
NA : 7ff00000000007a2
NaN: fff8000000000000
NA+NaN: 7ff80000000007a2
NaN+NA: 7ff80000000007a2
% gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v
--enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr
--enable-shared --with-system-zlib --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --enable-nls
--with-gxx-include-dir=/usr/include/c++/4.1.3 --program-suffix=-4.1
--enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug
--enable-mpfr --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)
% cat t.c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(int argc, char *argv[])
{
    int64_t NA_int64 = 0x7ff00000000007a2LL ;
    int64_t NaN_int64 = 0xfff8000000000000LL ;
    int64_t sum_int64 ;
    double NA_double, NaN_double, sum_double ;

    memcpy((void*)&NA_double, (void*)&NA_int64, 8) ;
    memcpy((void*)&NaN_double, (void*)&NaN_int64, 8) ;

    NaN_double = 1/0.0 - 1/0.0 ;

    printf("NA : %llx\n", (unsigned long long)*(int64_t*)&NA_double);
    printf("NaN: %llx\n", (unsigned long long)*(int64_t*)&NaN_double);
    sum_double = NA_double + NaN_double ;
    memcpy((void*)&sum_int64, (void*)&sum_double, 8) ;
    printf("NA+NaN: %llx\n", (unsigned long long)sum_int64) ;
    sum_double = NaN_double + NA_double ;
    memcpy((void*)&sum_int64, (void*)&sum_double, 8) ;
    printf("NaN+NA: %llx\n", (unsigned long long)sum_int64);
    return 0 ;
}

When I add -Wall to the -O2, gcc gives me some warnings about the
*(int64_t*)&doubleVal casts in the printf statements for the inputs, but
I used memcpy() to avoid the warnings when printing the outputs.

% gcc -Wall -O2 t.c -o a.out ; ./a.out
t.c: In function 'main':
t.c:17: warning: dereferencing type-punned pointer will break strict-aliasing rules
t.c:18: warning: dereferencing type-punned pointer will break strict-aliasing rules
NA : 7ff00000000007a2
NaN: fff8000000000000
NA+NaN: 7ff80000000007a2
NaN+NA: fff8000000000000
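
The two warnings come from reading a double through an int64_t pointer. A minimal sketch of a warning-free way to get the same hex dumps, assuming only standard C99 headers; the helper names double_bits and format_bits are hypothetical and not part of t.c:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy the object representation of a double into a uint64_t.
   memcpy is the aliasing-safe alternative to *(int64_t*)&d. */
static uint64_t double_bits(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);   /* assumes 64-bit IEEE doubles */
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Hypothetical helper: write "<label>: <16 hex digits>" into buf.
   PRIx64 from <inttypes.h> is the portable conversion for uint64_t.
   Returns what snprintf returns (the length it wanted to write). */
static int format_bits(char *buf, size_t size, const char *label, double d)
{
    return snprintf(buf, size, "%s: %016" PRIx64, label, double_bits(d));
}
```

Calling format_bits(buf, sizeof buf, "NA+NaN", sum_double) and printing buf would replace each memcpy-plus-printf pair in the program, and the same helper works for the inputs, so no pointer cast is needed anywhere.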

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
