>>>>> Herv? Pag?s
>>>>> on Sat, 23 Sep 2023 16:52:21 -0700 writes:
> Hi Martin,
> On 9/23/23 06:43, Martin Maechler wrote:
>>>>>>> Herv? Pag?s
>>>>>>> on Fri, 22 Sep 2023 16:55:05 -0700 writes:
>> > The problem is that you have things that are
>> > **semantically** different but look exactly the same:
>>
>> > They look the same:
>>
>> >> x
>> > [1] NA
>> >> y
>> > [1] NA
>> >> z
>> > [1] NA
>>
>> >> is.na(x)
>> > [1] TRUE
>> >> is.na(y)
>> > [1] TRUE
>> >> is.na(z)
>> > [1] TRUE
>>
>> >> str(x)
>> > ?cplx NA
>> >> str(y)
>> > ?num NA
>> >> str(z)
>> > ?cplx NA
>>
>> > but they are semantically different e.g.
>>
>> >> Re(x)
>> > [1] NA
>> >> Re(y)
>> > [1] -0.5? # surprise!
>>
>> >> Im(x)? # surprise!
>> > [1] 2
>> >> Im(z)
>> > [1] NA
>>
>> > so any expression involving Re() or Im() will produce
>> > different results on input that look the same on the
>> > surface.
>>
>> > You can address this either by normalizing the internal
>> > representation of complex NA to always be complex(r=NaN,
>> > i=NA_real_), like for NA_complex_, or by allowing the
>> > infinite variations that are currently allowed and at the
>> > same time making sure that both Re() and Im()? always
>> > return NA_real_ on a complex NA.
>>
>> > My point is that the behavior of complex NA should be
>> > predictable. Right now it's not. Once it's predictable
>> > (with Re() and Im() both returning NA_real_ regardless of
>> > internal representation), then it no longer matters what
>> > kind of complex NA is returned by as.complex(NA_real_),
>> > because they are no onger distinguishable.
>>
>> > H.
>>
>> > On 9/22/23 13:43, Duncan Murdoch wrote:
>> >> Since the result of is.na(x) is the same on each of
>> >> those, I don't see a problem.? As long as that is
>> >> consistent, I don't see a problem. You shouldn't
be using
>> >> any other test for NA-ness.? You should never be
>> >> expecting identical() to treat different types as the
>> >> same (e.g. identical(NA, NA_real_) is FALSE, as it
>> >> should be).? If you are using a different test, that's
>> >> user error.
>> >>
>> >> Duncan Murdoch
>> >>
>> >> On 22/09/2023 2:41 p.m., Herv? Pag?s wrote:
>> >>> We could also question the value of having an infinite
>> >>> number of NA representations in the complex space. For
>> >>> example all these complex values are displayed the
same
>> >>> way (as NA), are considered NAs by is.na(), but are
not
>> >>> identical or semantically equivalent (from an Re() or
>> >>> Im() point of view):
>> >>>
>> >>> ? ??? NA_real_ + 0i
>> >>>
>> >>> ? ??? complex(r=NA_real_, i=Inf)
>> >>>
>> >>> ? ??? complex(r=2, i=NA_real_)
>> >>>
>> >>> ? ??? complex(r=NaN, i=NA_real_)
>> >>>
>> >>> In other words, using a single representation for
>> >>> complex NA (i.e. complex(r=NA_real_, i=NA_real_))
would
>> >>> avoid a lot of unnecessary complications and
surprises.
>> >>>
>> >>> Once you do that, whether as.complex(NA_real_) should
>> >>> return complex(r=NA_real_, i=0) or complex(r=NA_real_,
>> >>> i=NA_real_) becomes a moot point.
>> >>>
>> >>> Best,
>> >>>
>> >>> H.
>>
>> Thank you, Herv?.
>> Your proposition is yet another one,
>> to declare that all complex NA's should be treated as identical
>> (almost/fully?) everywhere.
>>
>> This would be a possibility, but I think a drastic one.
>>
>> I think there are too many cases, where I want to keep the
>> information of the real part independent of the values of the
>> imaginary part (e.g. think of the Riemann hypothesis), and
>> typically vice versa.
> Use NaN for that, not NA.
Aa..h, *that* is your point.
Well, I was on exactly this line till a few years ago.
However, very *sadly* to me, note how example(complex)
nowadays ends :
##----------------------------------------------------------------------------
showC <- function(z) noquote(sprintf("(R = %g, I = %g)", Re(z),
Im(z)))
## The exact result of this *depends* on the platform, compiler, math-library:
(NpNA <- NaN + NA_complex_) ; str(NpNA) # *behaves* as 'cplx NA' ..
stopifnot(is.na(NpNA), is.na(NA_complex_), is.na(Re(NA_complex_)),
is.na(Im(NA_complex_)))
showC(NpNA)# but does not always show '(R = NaN, I = NA)'
## and this is not TRUE everywhere:
identical(NpNA, NA_complex_)
showC(NA_complex_) # always == (R = NA, I = NA)
##----------------------------------------------------------------------------
Unfortunately --- notably by the appearance of the new (M1, M1 pro, M2, ...)
processors, but not only ---
I (and others, but the real experts) have wrongly assumed that
NA {which on the C-level is *one* of the many possible internal NaN's}
would be preserved in computations, as they are on the R level
-- well, typically, and as long as we've used intel-compatible
chips and gcc-compilers.
But modern speed optimizations (also seen in accelerated
Blas/Lapack ..) have noticed that no official C standard
requires such preservations (i.e., in our case of NA, *the* special NaN),
and -- for speed reasons -- now on these accelerated platforms,
R-level NA's "suddenly" turn into R-level NaN's (all are NaN
on
the C level but "with different payload") from quite
"trivial" computations.
Consequently, the strict distinction between NA and NaN
even when they are so important for us statisticians / careful data analysts,
nowadays will tend to have to be dismissed eventually.
... and as I have mentioned also mentioned earlier in this thread,
I believe we should also print the complex values of z
fulfilling is.na(z) by their Re & Im, i.e., e.g.
NA+iNA (or NaN+iNA or NA+iNaN or NaN+iNaN
NA+0i, NaN+1i, 3+iNaN, 4+iNA etc
but note that the exact printing itself should *not* become the topic of this
thread unless by mentioning that I strongly believe the print()ing
of complex vectors in R should change anway *and* for that reason,
the printing / "looks the same as" / ... should not be strong
reasons in my view for deciding how *coercion*,
notably as.complex(.) should work.
Martin
>> With your proposal, for a (potentially large) vector of complex
numbers,
>> after
>> Re(z) <- 1/2
>>
>> I could no longer rely on Re(z) == 1/2,
>> because it would be wrong for those z where (the imaginary part/
the number)
>> was NA/NaN.
> My proposal is to do this only if the Re and/or Im parts are NAs, not
if
> they are NaNs.
> BTW the difference between how NAs and NaNs are treated in complex
> vectors is another issue that adds to the confusion:
> ? > complex(r=NA, i=2)
> ? [1] NA
> ? > complex(r=NaN, i=2)
> ? [1] NaN+2i
> Not displaying the real + imaginary parts in the NA case kind of
> suggests that somehow they are gone i.e. that Re(z) and Im(z) are both
NA.
> Note that my proposal is not to change the display but to change Re()
> and Im() to make them consistent with the display.
> In your Re(z) <- 1/2 example (which seems to be theoretical only
because
> I don't see `Re<-` in base R), any NA in 'z' would be
replaced with
> complex(r=NaN, i=1/2), so you could rely on Re(z) == 1/2.
>> Also, in a similar case, a
>>
>> Im(z) <- NA
>>
>> would have to "destroy" all real parts Re(z);
>> not really typically in memory, but effectively for the user,
Re(z)
>> would be all NA/NaN.
> Yes, setting a value to NA destroys it beyond repair in the sense that
> there's no way you can retrieve any original parts of it. I'm
fine with
> that. I'm not fine with an NA being used to store hidden
information.
>>
>> And I think there are quite a few other situations
>> where looking at Re() and Im() separately makes a lot of sense.
> Still doable if the Re or Im parts contain NaNs.
>>
>> Spencer also made a remark in this direction.
>>
>> All in all I'd be very reluctant to move in this direction;
>> but yes, I'm just one person ... let's continue musing and
>> considering !
> I understand the reluctance since this would not be a light move, but
> thanks for considering.
> Best,
> H.
>>
>> Martin
>>
>> >>> On 9/22/23 03:38, Martin Maechler wrote:
>> >>>>>>>>> Mikael Jagan ????? on Thu, 21
Sep 2023 00:47:39
>> >>>>>>>>> -0400 writes:
>> >>>> ????? > Revisiting this thread from April:
>> >>>>
>> >>>>
>https://stat.ethz.ch/pipermail/r-devel/2023-April/082545.html
>> >>>>
>> >>>> ????? > where the decision (not yet backported)
was
>> >>>> made for ????? > as.complex(NA_real_) to give
>> >>>> NA_complex_ instead of ????? >
complex(r=NA_real_,
>> >>>> i=0), to be consistent with ????? >
help("as.complex")
>> >>>> and as.complex(NA) and as.complex(NA_integer_).
>> >>>>
>> >>>> ????? > Was any consideration given to the
alternative?
>> >>>> ????? > That is, to changing as.complex(NA) and
>> >>>> as.complex(NA_integer_) to ????? > give
>> >>>> complex(r=NA_real_, i=0), consistent with ?????
>
>> >>>> as.complex(NA_real_), then amending
help("as.complex")
>> >>>> ????? > accordingly?
>> >>>>
>> >>>> Hmm, as, from R-core, mostly I was involved, I
admit to
>> >>>> say "no", to my knowledge the (above)
alternative
>> >>>> wasn't considered.
>> >>>>
>> >>>> ??? > The principle that ??? >
>> >>>>
Im(as.complex(<real=(double|integer|logical)>)) should
>> >>>> be zero ??? > is quite fundamental, in my view,
hence
>> >>>> the "new" behaviour ??? > seems to
really violate the
>> >>>> principle of least surprise ...
>> >>>>
>> >>>> of course "least surprise"? is somewhat
subjective.
>> >>>> Still, I clearly agree that the above would be one
>> >>>> desirable property.
>> >>>>
>> >>>> I think that any solution will lead to *some*
surprise
>> >>>> for some cases, I think primarily because there
are
>> >>>> *many* different values z? for which? is.na(z)? is
>> >>>> true,? and in any case NA_complex_? is only of the
>> >>>> many.
>> >>>>
>> >>>> I also agree with Mikael that we should reconsider
the
>> >>>> issue that was raised by Davis Vaughan here
("on
>> >>>> R-devel") last April.
>> >>>>
>> >>>> ????? > Another (but maybe weaker) argument is
that
>> >>>> ????? > double->complex coercions happen
more often
>> >>>> than ????? > logical->complex and
integer->complex
>> >>>> ones. Changing the ????? > behaviour of the
more
>> >>>> frequently performed coercion is ????? > more
likely to
>> >>>> affect code "out there".
>> >>>>
>> >>>> ????? > Yet another argument is that one
expects
>> >>>>
>> >>>> ????? >????? identical(as.complex(NA_real_),
NA_real_ +
>> >>>> (0+0i))
>> >>>>
>> >>>> ????? > to be TRUE, i.e., that coercing from
double to
>> >>>> complex is ????? > equivalent to adding a
complex
>> >>>> zero.? The new behaviour ????? > makes the
above FALSE,
>> >>>> since NA_real_ + (0+0i) gives ????? >
>> >>>> complex(r=NA_real_, i=0).
>> >>>>
>> >>>> No!? --- To my own surprise (!) --- in current
R-devel
>> >>>> the above is TRUE, and ??????? NA_real_ + (0+0i)?
, the
>> >>>> same as ??????? NA_real_ + 0i????? , really gives
>> >>>> complex(r=NA, i=NA) :
>> >>>>
>> >>>> Using showC() from ?complex
>> >>>>
>> >>>> ??? showC <- function(z)
noquote(sprintf("(R = %g, I >> >>>> %g)", Re(z),
Im(z)))
>> >>>>
>> >>>> we see (in R-devel) quite consistently
>> >>>>
>> >>>>> showC(NA_real_ + 0i)
>> >>>> [1] (R = NA, I = NA)
>> >>>>> showC(NA?????? + 0i)? # NA is
'logical'
>> >>>> [1] (R = NA, I = NA) where as in R 4.3.1 and
>> >>>> "R-patched" -- *in*consistently
>> >>>>
>> >>>>> showC(NA_real_ + 0i)
>> >>>> [1] (R = NA, I = 0)
>> >>>>> showC(NA + 0i)
>> >>>> [1] (R = NA, I = NA) .... and honestly, I do not
see
>> >>>> *where* (and when) we changed the underlying code
(in
>> >>>> arithmetic.c !?)? in R-devel to *also* produce
>> >>>> NA_complex_? in such complex *arithmetic*
>> >>>>
>> >>>>
>> >>>> ????? > Having said that, one might also (but
more
>> >>>> naively) expect
>> >>>>
>> >>>> ????? >
>> >>>> identical(as.complex(as.double(NA_complex_)),
>> >>>> NA_complex_)
>> >>>>
>> >>>> ????? > to be TRUE.
>> >>>>
>> >>>> as in current R-devel
>> >>>>
>> >>>> ????? > Under my proposal it continues to be
FALSE.
>> >>>>
>> >>>> as in "R-release"
>> >>>>
>> >>>> ????? > Well, I'd prefer if it gave FALSE
with a
>> >>>> warning ????? > "imaginary parts discarded
in
>> >>>> coercion", but it seems that ????? >
>> >>>> as.double(complex(r=a, i=b)) never warns when
either of
>> >>>> ????? > 'a' and 'b' is NA_real_
or NaN, even where
>> >>>> "information" ????? > {nonzero
'b'} is clearly lost ...
>> >>>>
>> >>>> The question of *warning* here is related indeed,
but I
>> >>>> think we should try to look at it only *secondary*
to
>> >>>> your first proposal.
>> >>>>
>> >>>> ????? > Whatever decision is made about
>> >>>> as.complex(NA_real_), ????? > maybe these
points should
>> >>>> be weighed before it becomes part of ????? >
R-release
>> >>>> ...
>> >>>>
>> >>>> ????? > Mikael
>> >>>>
>> >>>> Indeed.
>> >>>>
>> >>>> Can we please get other opinions / ideas here?
>> >>>>
>> >>>> Thank you in advance for your thoughts! Martin
>> >>>>
>> >>>> ---
>> >>>>
>> >>>> PS:
>> >>>>
>> >>>> ?? Our *print()*ing? of complex NA's
("NA" here meaning
>> >>>> NA or NaN) ?? is also unsatisfactory, e.g. in the
case
>> >>>> where all entries of a ?? vector are NA in the
sense of
>> >>>> is.na(.), but their ?? Re() and Im() are not all
NA:
>> >>>> ??? showC <- function(z)
noquote(sprintf("(R = %g, I >> >>>> %g)", Re(z),
Im(z))) ??? z <- complex(, c(11, NA, NA),
>> >>>> c(NA, 99, NA)) ??? z ??? showC(z)
>> >>>>
>> >>>> gives
>> >>>>
>> >>>> ??? > z ??? [1] NA NA NA ??? > showC(z) ???
[1] (R >> >>>> 11, I = NA) (R = NA, I = 99) (R = NA, I =
NA)
>> >>>>
>> >>>> but that (printing of complex) *is* another issue,
in
>> >>>> which we have the re-opened bugzilla PR#16752
>> >>>>
==>https://bugs.r-project.org/show_bug.cgi?id=16752
>> >>>>
>> >>>> on which we also worked during the R Sprint in
Warwick
>> >>>> three weeks ago, and where I want to commit
changes in
>> >>>> any case {but think we should change even a bit
more
>> >>>> than we got to during the Sprint}.
>> >>>>
>> >>>> ______________________________________________
>> >>>>R-devel at r-project.org ? mailing list
>> >>>>https://stat.ethz.ch/mailman/listinfo/r-devel
>> >>>
>> >>
>> > --
>> > Herv? Pag?s
>>
>> > Bioconductor Core Teamhpages.on.github at gmail.com
>>
>>
>>
> --
> Herv? Pag?s
> Bioconductor Core Team
> hpages.on.github at gmail.com
> [[alternative HTML version deleted]]