And for a starker example of this (documented) inconsistency,
arithmetic addition is not commutative:

> NA + NaN
[1] NA
> NaN + NA
[1] NaN

On Mon, Jul 2, 2018 at 5:32 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> On 02/07/2018 11:25 AM, Jan Gorecki wrote:
>> Hi,
>> base::mean is not consistent in terms of handling NA/NaN.
>> Mean should not depend on order of its arguments while currently it is.
>
> The result of mean() can depend on the order even with regular numbers.
> For example,
>
> > x <- rep(c(1, 10^(-15)), 1000000)
> > mean(sort(x)) - 0.5
> [1] 5.551115e-16
> > mean(rev(sort(x))) - 0.5
> [1] 0
>
>>
>> mean(c(NA, NaN))
>> #[1] NA
>> mean(c(NaN, NA))
>> #[1] NaN
>>
>> I created issue so in case of no replies here status of it can be looked up
>> at:
>> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17441
>
> The help page for ?NaN says,
>
> "Computations involving NaN will return NaN or perhaps NA: which of
> those two is not guaranteed and may depend on the R platform (since
> compilers may re-order computations)."
>
> And ?NA says,
>
> "Numerical computations using NA will normally result in NA: a possible
> exception is where NaN is also involved, in which case either might
> result (which may depend on the R platform)."
>
> So I doubt if this inconsistency will be fixed.
>
> Duncan Murdoch
>
>>
>> Best,
>> Jan
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
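For readers who want to check how R tells the two values apart, here is a minimal sketch. `classify` is a hypothetical helper, not a base R function. The labels returned for the two sums are deliberately left unstated, since the help pages quoted above say the outcome may differ between platforms.

```r
# Both NA and NaN count as missing for is.na(); only NaN satisfies is.nan().
x <- c(NA_real_, NaN, 1)
is.na(x)    # TRUE  TRUE FALSE
is.nan(x)   # FALSE TRUE FALSE

# Hypothetical helper: label a length-one double as "NA", "NaN" or "number".
classify <- function(v) {
  if (is.nan(v)) "NaN" else if (is.na(v)) "NA" else "number"
}

# Which label comes back for each ordering is platform dependent ...
classify(NA_real_ + NaN)
classify(NaN + NA_real_)

# ... but either ordering always yields a value that is.na() reports as missing.
stopifnot(is.na(NA_real_ + NaN), is.na(NaN + NA_real_))
```

Code that only needs to know whether a result is missing can therefore test with `is.na()` and stay independent of the operand order.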
Thank you for the interesting examples.
I would find it useful to document this behavior in `?mean` as well. While the `+`
operator is also affected, the `sum` function is not.
For mean, NA / NaN could be handled in the loop in summary.c. I assume that
the performance penalty of a fix is the reason why this inconsistency still
exists.
Jan

On Mon, Jul 2, 2018 at 8:28 PM, Barry Rowlingson <b.rowlingson at lancaster.ac.uk> wrote:
> And for a starker example of this (documented) inconsistency,
> arithmetic addition is not commutative:
>
> > NA + NaN
> [1] NA
> > NaN + NA
> [1] NaN
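The kind of explicit NA / NaN handling suggested for mean can be sketched at the R level. This is only an illustration under stated assumptions: `mean_na_first` is a hypothetical wrapper, not part of base R and not a description of what summary.c does, and the rule "NA wins over NaN" is one possible convention.

```r
# Hypothetical wrapper (not base R): a mean whose result does not depend on
# where NA and NaN sit in the input. Assumed convention: NA wins over NaN.
mean_na_first <- function(x) {
  x <- as.double(x)
  # is.na() is TRUE for both NA and NaN; is.nan() is TRUE only for NaN
  if (any(is.na(x) & !is.nan(x))) return(NA_real_)
  if (any(is.nan(x))) return(NaN)
  mean(x)
}

a <- mean_na_first(c(NA, NaN))
b <- mean_na_first(c(NaN, NA))

# Both orderings give NA (missing, but not NaN)
stopifnot(is.na(a), !is.nan(a), is.na(b), !is.nan(b))
```

The wrapper scans the input before averaging, so it does extra work on every call, which is the kind of overhead the message above has in mind.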
Yes, the performance overhead of fixing this at the R level would be too large
and it would complicate the code significantly. The result of binary
operations involving NA and NaN is hardware dependent (the propagation
of NaN payload) - on some hardware, it actually works the way we would
like - NA is returned - but on some hardware you get NaN or sometimes NA
and sometimes NaN. Also there are C compiler optimizations re-ordering
code, as mentioned in ?NaN. Then there are also external numerical
libraries that do not distinguish NA from NaN (NA is an R concept). So I
am afraid this is unfixable.

The disclaimer mentioned by Duncan is in ?NaN/?NA, which I think is ok -
there are so many numerical functions through which one might run into
these problems that it would be infeasible to document them all. Some
functions in fact will preserve NA, and we would not let NA turn into
NaN unnecessarily, but the disclaimer says it is something not to depend on.

Tomas

On 07/03/2018 11:12 AM, Jan Gorecki wrote:
> Thank you for the interesting examples.
> I would find it useful to document this behavior in `?mean` as well. While the `+`
> operator is also affected, the `sum` function is not.
> For mean, NA / NaN could be handled in the loop in summary.c. I assume that
> the performance penalty of a fix is the reason why this inconsistency still
> exists.
> Jan
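The remark that NA is an R concept can be made concrete by looking at bit patterns. The following is a minimal sketch, assuming the usual representation in which NA_real_ is an IEEE 754 NaN whose low 32-bit word is 1954; `low_word` is a hypothetical helper, not a base R function.

```r
# Hypothetical helper: the low 32 bits of a double's bit pattern, as a number.
low_word <- function(v) {
  # Serialise the double as 8 bytes, most significant byte first
  bytes <- writeBin(v, raw(), size = 8, endian = "big")
  # Bytes 5..8 hold the low word; combine them into one number
  sum(as.integer(bytes[5:8]) * 256^(3:0))
}

low_word(NA_real_)  # 1954 under the representation assumed above
low_word(NaN)       # some other payload; the exact value is not guaranteed
low_word(1)         # 0, since 1.0 has an all-zero low word
```

To the hardware both values are just NaNs with different payloads, so whether the 1954 marker survives an arithmetic operation depends on which operand's payload the processor propagates.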