thr3ads.net - R devel - [Rd] mean(x) != mean(rev(x)) different with x <- c(NA, NaN) for some builds [Apr 2017]

Henrik Bengtsson

2017-Apr-01 03:50 UTC

[Rd] mean(x) != mean(rev(x)) different with x <- c(NA, NaN) for some builds

In R 3.3.3, I observe the following on Ubuntu 16.04 (when building
from source as well as for the sudo apt r-base build):

> x <- c(NA, NaN)
> mean(x)
[1] NA
> mean(rev(x))
[1] NaN

> rowMeans(matrix(x, nrow = 1, ncol = 2))
[1] NA
> rowMeans(matrix(rev(x), nrow = 1, ncol = 2))
[1] NaN

> .rowMeans(x, m = 1, n = 2)
[1] NA
> .rowMeans(rev(x), m = 1, n = 2)
[1] NaN

> .rowSums(x, m = 1, n = 2)
[1] NA
> .rowSums(rev(x), m = 1, n = 2)
[1] NaN

> rowSums(matrix(x, nrow = 1, ncol = 2))
[1] NA
> rowSums(matrix(rev(x), nrow = 1, ncol = 2))
[1] NaN

I'd expect NA to trump NaN in all cases (with na.rm = FALSE).  sum()
does not have this problem and returns NA in both cases (*).

For the same R version built from source on a RHEL 6.6 system
(completely different architecture), I get the expected result (= NA)
for all of the above cases, e.g.

> x <- c(NA, NaN)
> mean(x)
[1] NA
> mean(rev(x))
[1] NA
[...]

Before going insane trying to troubleshoot this, I have a vague memory
that this, or something related to this, has been discussed
previously, but I cannot locate it.

Is the above a bug in R, a FAQ, a build error, overzealous compiler
optimization, and / or ...?

Thanks,

Henrik

Prof Brian Ripley

2017-Apr-01 05:14 UTC

From ?NA

      Numerical computations using ‘NA’ will normally result in ‘NA’: a
      possible exception is where ‘NaN’ is also involved, in which case
      either might result.

and ?NaN

      Computations involving ‘NaN’ will return ‘NaN’ or perhaps ‘NA’:
      which of those two is not guaranteed and may depend on the R
      platform (since compilers may re-order computations).

fortunes::fortune(14) applies (yet again).
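
Since the help pages leave the NA-versus-NaN outcome open, checks that must behave the same on every build can avoid depending on which of the two comes back. A minimal sketch using only base R follows; the helper name `classify_missing` is hypothetical and not part of R.

```r
# is.na() is TRUE for both NA and NaN, so it is a safe check when the
# platform may return either one.
x <- c(NA, NaN)

is.na(mean(x))        # TRUE whether the result is NA or NaN
is.na(mean(rev(x)))   # TRUE whether the result is NA or NaN

# is.nan() is TRUE only for NaN, so combining the two tells the cases apart.
# classify_missing() is a hypothetical helper for illustration.
classify_missing <- function(v) {
  ifelse(is.nan(v), "NaN", ifelse(is.na(v), "NA", "value"))
}

classify_missing(c(NA, NaN, 1))   # "NA" "NaN" "value"
```

In a test suite this suggests asserting `is.na(result)` rather than `identical(result, NA_real_)` whenever the input mixes NA and NaN.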


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford

Henrik Bengtsson

2017-Apr-01 05:51 UTC

On Fri, Mar 31, 2017 at 10:14 PM, Prof Brian Ripley
<ripley at stats.ox.ac.uk> wrote:
> From ?NA
>
>      Numerical computations using ‘NA’ will normally result in ‘NA’: a
>      possible exception is where ‘NaN’ is also involved, in which case
>      either might result.
>
> and ?NaN
>
>      Computations involving ‘NaN’ will return ‘NaN’ or perhaps ‘NA’:
>      which of those two is not guaranteed and may depend on the R
>      platform (since compilers may re-order computations).
>
> fortunes::fortune(14) applies (yet again).

Thanks; I'm often happy to have contributed to some of the fortune
counters, but not so sure about this one.  What's even worse is that
my own matrixStats NEWS has an entry from a few years back which
mentions "... incorrectly assumed that the value of prod(c(NaN, NA))
is uniquely defined.  However, as documented in help("is.nan"), it may
be NA or NaN depending on R system/platform."  I guess the joke is on
me - it's April 1st after all.

Technically, one could test ISNA(x) for each element before adding it
to the intermediate sum, but since that is quite an expensive test it
is not done, and sum += x is performed "as is" on NA and NaN
(and -Inf and +Inf).  Is that correct?
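
The per-element check described above can be sketched at the R level. This only illustrates the idea and is not how R's C code is written; the function name `sum_na_first` is hypothetical, and an interpreted loop like this is far slower than the built-in sum().

```r
# Sum in which a true NA takes precedence over NaN, whatever the order.
# sum_na_first() is a hypothetical name used for illustration only.
sum_na_first <- function(x) {
  total <- 0
  for (xi in x) {
    # A true NA is missing according to is.na() but is not NaN.
    if (is.na(xi) && !is.nan(xi)) return(NA_real_)
    total <- total + xi   # NaN, -Inf and +Inf are added "as is"
  }
  total
}

x <- c(NA, NaN)
sum_na_first(x)        # NA
sum_na_first(rev(x))   # NA
```

The extra branch on every element is the cost being weighed in the question: the result no longer depends on element order, but each addition is preceded by a test.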

/Henrik

Hervé Pagès

2017-Apr-01 23:59 UTC

On 03/31/2017 10:14 PM, Prof Brian Ripley wrote:
> From ?NA
>
>      Numerical computations using ‘NA’ will normally result in ‘NA’: a
>      possible exception is where ‘NaN’ is also involved, in which case
>      either might result.
>
> and ?NaN
>
>      Computations involving ‘NaN’ will return ‘NaN’ or perhaps ‘NA’:
>      which of those two is not guaranteed and may depend on the R
>      platform (since compilers may re-order computations).
>
> fortunes::fortune(14) applies (yet again).

The problem is that TFM often contradicts itself, e.g. in ?prod:

      If ‘na.rm’ is ‘FALSE’ an ‘NA’ value in any of the arguments will
      cause a value of ‘NA’ to be returned, otherwise ‘NA’ values are
      ignored.

which is clearly not the case (at least for me):

   > x <- c(NaN, NA)
   > prod(x)
   [1] NaN
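
Code that needs the behaviour described in ?prod on every platform can enforce it explicitly. The following is a sketch under that assumption; `na_first` and `prod_na_first` are hypothetical names, not functions provided by R or any package.

```r
# Wrap a summary function so that a true NA wins over NaN when
# na.rm = FALSE. na_first() is a hypothetical helper for illustration.
na_first <- function(f) {
  function(x, na.rm = FALSE, ...) {
    # is.na() is TRUE for NA and NaN; excluding is.nan() leaves true NAs.
    if (!na.rm && any(is.na(x) & !is.nan(x))) return(NA_real_)
    f(x, na.rm = na.rm, ...)
  }
}

prod_na_first <- na_first(prod)
prod_na_first(c(NaN, NA))   # NA
prod_na_first(c(NA, NaN))   # NA
prod_na_first(c(2, 3))      # 6
```

The same wrapper can be applied to mean or sum; it trades one extra vectorised pass over the input for an order-independent result.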

H.
-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
