thr3ads.net - R devel - [Rd] Should 0L * NA_integer

Michael Chirico

2020-May-23 10:08 UTC

[Rd] Should 0L * NA_integer_ be 0L?

I don't see this specific case documented anywhere (I also tried to search
the r-devel archives, as well as I could); the only close reference
mentions NA & FALSE = FALSE, NA | TRUE = TRUE. And there's also this
snippet from R-lang:

> In cases where the result of the operation would be the same for all
> possible values the NA could take, the operation may return this value.
This begs the question -- shouldn't 0L * NA_integer_ be 0L?

Because this is an integer operation, and according to this definition of NA:

> Missing values in the statistical sense, that is, variables whose value
> is not known, have the value @code{NA}
NA_integer_ should be an unknown integer value between -2^31+1 and 2^31-1.
Multiplying any of these values by 0 results in 0 -- that is, the result of
the operation would be 0 for all possible values the NA could take.
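The range argument can be checked in R itself. In this sketch every representable integer multiplied by 0L gives 0L, yet current R still propagates the NA. The smallest valid integer is -.Machine$integer.max because the bit pattern for -2^31 is reserved for NA_integer_; `some_ints` is just an illustrative name.

```r
# Valid integers run from -(2^31 - 1) to 2^31 - 1;
# the bit pattern for -2^31 is reserved for NA_integer_
.Machine$integer.max                  # 2147483647
suppressWarnings(as.integer(-2^31))   # outside the valid range, so NA

# Every representable integer times 0L is 0L ...
some_ints <- c(-.Machine$integer.max, -1L, 1L, .Machine$integer.max)
0L * some_ints                        # 0 0 0 0
# ... yet current R still propagates the unknown value
0L * NA_integer_                      # NA
```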

This came up from what seems like an inconsistency to me:

all(NA, FALSE)
# [1] FALSE
NA * FALSE
# [1] NA

I agree with all(NA, FALSE) being FALSE because we know for sure that all
cannot be true. The same can be said of the multiplication -- whether NA
represents TRUE or FALSE, the resulting value is 0 (FALSE).
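A side-by-side check of the logical operators mentioned above against arithmetic on logicals, as current R behaves: `&`, `|` and `all()` return the value that holds for every possible value of the NA, while `*` coerces the logicals to integer and propagates the NA.

```r
# Logical operators return the answer that holds whatever the NA is
NA & FALSE           # FALSE
NA | TRUE            # TRUE
all(NA, FALSE)       # FALSE

# Arithmetic on logicals coerces them to integer and propagates the NA
NA * FALSE           # NA
typeof(NA * FALSE)   # "integer"
```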

I also agree with the numeric case, FWIW: NA_real_ * 0 has to be NA_real_,
because NA_real_ could be Inf or NaN, for both of which multiplication by 0
gives NaN, hence 0 * NA_real_ is either 0 or NaN, hence it must be NA_real_.
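The double case can be checked the same way: Inf, -Inf and NaN are all possible values of an unknown double, and multiplying any of them by 0 gives NaN rather than 0. The last line is checked only with `is.na()`, which is TRUE for both NA and NaN.

```r
# Possible values of an unknown double whose product with 0 is not 0
0 * Inf       # NaN
0 * -Inf      # NaN
0 * NaN       # NaN
# Hence 0 * NA_real_ cannot be simplified to 0; R keeps it missing
is.na(0 * NA_real_)   # TRUE
```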

	[[alternative HTML version deleted]]

Martin Maechler

2020-May-23 10:49 UTC

[Rd] Should 0L * NA_integer_ be 0L?

>>>>> Michael Chirico 
>>>>>     on Sat, 23 May 2020 18:08:22 +0800 writes:
    > I don't see this specific case documented anywhere (I also tried to search
    > the r-devel archives, as well as I could); the only close reference
    > mentions NA & FALSE = FALSE, NA | TRUE = TRUE. And there's also this
    > snippet from R-lang:

    >> In cases where the result of the operation would be the same for all
    >> possible values the NA could take, the operation may return this value.

    > This begs the question -- shouldn't 0L * NA_integer_ be 0L?

    > Because this is an integer operation, and according to this definition of
    > NA:

    >> Missing values in the statistical sense, that is, variables whose value
    >> is not known, have the value @code{NA}

    > NA_integer_ should be an unknown integer value between -2^31+1 and 2^31-1.
    > Multiplying any of these values by 0 results in 0 -- that is, the result of
    > the operation would be 0 for all possible values the NA could take.

    > This came up from what seems like an inconsistency to me:

    > all(NA, FALSE)
    > # [1] FALSE
    > NA * FALSE
    > # [1] NA

    > I agree with all(NA, FALSE) being FALSE because we know for sure that all
    > cannot be true. The same can be said of the multiplication -- whether NA
    > represents TRUE or FALSE, the resulting value is 0 (FALSE).

    > I also agree with the numeric case, FWIW: NA_real_ * 0 has to be NA_real_,
    > because NA_real_ could be Inf or NaN, for both of which multiplication by 0
    > gives NaN, hence 0 * NA_real_ is either 0 or NaN, hence it must be NA_real_.

I agree with almost everything you say above ...
except possibly the main conclusion.

The problem with your proposed change would be that integer
arithmetic gives a different result than the corresponding
"numeric" computation.
(I don't remember other such cases in R, at least as long as the
 integer arithmetic does not overflow.)

One principle for deciding such problems in S and R has been that
the user should typically *not* have to know whether their data is
stored as float/double or as integer, and the results should be the same
(possibly apart from staying integer for some operations).
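The divergence described above can be made concrete with a sketch. `times_proposed()` is a hypothetical helper, not part of R, that applies the proposed rule to integer vectors by returning 0L wherever either operand is exactly 0L; it assumes operands of equal length (or length one). Under that rule the same data stored as integer and as double would give different products:

```r
# Hypothetical helper: integer multiplication under the proposed rule,
# where 0L times NA_integer_ is 0L. Current R does NOT behave this way.
times_proposed <- function(x, y) {
  stopifnot(is.integer(x), is.integer(y))
  res <- x * y                          # ordinary R result, NA propagates
  zero <- (x %in% 0L) | (y %in% 0L)     # %in% gives FALSE (not NA) for NA
  res[zero] <- 0L                       # a known zero factor forces 0L
  res
}

x <- c(0L, 2L)
y <- c(NA_integer_, 3L)
times_proposed(x, y)                    # 0 6   (integer, proposed rule)
as.double(x) * as.double(y)             # NA 6  (double, current R)
```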


{{Note that there are also situations where it's really
  undesirable that    0 * NA   does *not* give 0 (but NA);
  notably in sparse matrix operations, where you very often
  know that the NA was not Inf (or NaN) and you would really like to
  preserve sparseness ...}}
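A small dense illustration of the sparseness point, in base R: multiplying a mostly-zero vector by NA turns every zero into NA. The `ifelse()` line assumes, hypothetically, that the NA is known to be finite, in which case only the nonzero entry would need to become NA.

```r
v <- c(0, 0, 0, 3)        # mostly zeros, as in a sparse matrix
v * NA                    # NA NA NA NA: all the zeros are lost

# Hypothetical rule if the NA were known to be finite: zeros stay zero
finite_na <- ifelse(v == 0, 0, v * NA)
finite_na                 # 0 0 0 NA
```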


    > [[alternative HTML version deleted]]

    (as you did not use plain text ..)

Michael Chirico

2020-May-23 12:00 UTC

[Rd] Should 0L * NA_integer_ be 0L?

OK, so maybe one way to paraphrase:

For R, the boundedness of integer vectors is an implementation detail,
rather than a deeper mathematical fact that can be exploited for this
case.

One might also expect, then, that integer overflow wouldn't result in NA
but would instead be automatically cast up to numeric. Is it for
efficiency reasons that this doesn't happen?
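The current overflow behavior, for comparison: integer arithmetic that leaves the valid range yields NA with a warning, while the same sum done in double arithmetic is exact.

```r
big <- .Machine$integer.max       # 2147483647L, the largest integer
suppressWarnings(big + 1L)        # NA: integer overflow, no promotion
big + 1                           # 2^31, computed in double
```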

Would it make any sense to have a different carveout for the logical
case? For logical, storage as integer might be seen as a similar type
of implementation detail (though if we're being this strict, the
question arises of what multiplication of logical values even means).

i.e., FALSE * NA would then give 0L.
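A sketch of such a logical carve-out, for illustration only: `logical_times()` is a hypothetical helper, not an R function, that multiplies logical vectors the way R does but returns 0L wherever either operand is FALSE.

```r
# Hypothetical helper: logical multiplication with the suggested carve-out.
# Current R returns NA_integer_ for FALSE * NA.
logical_times <- function(x, y) {
  stopifnot(is.logical(x), is.logical(y))
  res <- x * y                                 # integer result, NA propagates
  res[(x %in% FALSE) | (y %in% FALSE)] <- 0L   # a known FALSE forces 0L
  res
}

logical_times(FALSE, NA)   # 0
logical_times(TRUE, NA)    # NA
FALSE * NA                 # NA (current R)
```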


On Sat, May 23, 2020 at 6:49 PM Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
> The problem with your proposed change would be that integer
> arithmetic gives a different result than the corresponding
> "numeric" computation.
> (I don't remember other such cases in R, at least as long as the
>  integer arithmetic does not overflow.)
>
> One principle for deciding such problems in S and R has been that
> the user should typically *not* have to know whether their data is
> stored as float/double or as integer, and the results should be the same
> (possibly apart from staying integer for some operations).
