Dear Tomas,
I understand that perfectly, but that is fine.
The payload is not going to be used in any computations anyway; it is
strictly an information carrier that differentiates between different types
of (tagged) NA values.
Having only one NA value in R is extremely limiting for the social
sciences, where multiple missing values may exist, because respondents:
- did not know what to respond, or
- did not want to respond, or perhaps
- the question did not apply in a given situation etc.
All of these need to be captured, stored, and most importantly treated as
if they were regular missing values. Whether the payload might be lost
in computations makes no difference: they were supposed to be "missing
values" anyway.
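
To make the idea concrete, here is a minimal sketch in C of a storage-only
tag. It assumes an NA-style NaN whose low 32 bits hold 1954 (as in the
R_ValueOfNA code quoted below) and places a one-byte tag in mantissa bits
32..39. The helper names make_tagged_na and tag_of are hypothetical and are
not part of the R API; nothing here guarantees that the tag survives
arithmetic.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a NaN with exponent bits all ones, the low
   32 bits set to 1954, and a one-byte tag in mantissa bits 32..39. */
static double make_tagged_na(unsigned char tag)
{
    uint64_t bits = 0x7FF0000000000000ULL | ((uint64_t)tag << 32) | 1954u;
    double d;
    memcpy(&d, &bits, sizeof d);   /* copy the bits, no arithmetic involved */
    return d;
}

/* Hypothetical helper: return the tag (0..255) if d is a NaN whose low
   32 bits are 1954, or -1 for any other value. */
static int tag_of(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    if (!isnan(d) || (uint32_t)bits != 1954u)
        return -1;
    return (int)((bits >> 32) & 0xFFu);
}
```

For example, make_tagged_na('a') could stand for "did not know" and
make_tagged_na('b') for "did not want to respond"; both are NaNs, and tag_of
only tells them apart as long as the values have been stored and copied, not
computed on.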
The original question is how the payload is currently stored: as an
unsigned int of 32 bits, or as an unsigned short of 16 bits. If the R
internals would not be affected (and I see no reason why they would be),
using 16 bits would open up an entire universe for the social sciences that
is not currently available in R and which all other major statistical
packages do offer.
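
As an illustration of the storage question, the sketch below builds the same
64-bit pattern once from two 32-bit words and once from four 16-bit words. It
composes the words with shifts rather than a union, so it does not depend on
byte order. The function names are hypothetical. Note that the two views only
agree when the two middle 16-bit words are explicitly set to zero.

```c
#include <stdint.h>
#include <string.h>

/* NA pattern from two 32-bit words: high = 0x7ff00000, low = 1954. */
static uint64_t na_bits_from_32(void)
{
    uint32_t hw = 0x7ff00000u;
    uint32_t lw = 1954u;
    return ((uint64_t)hw << 32) | lw;
}

/* Same pattern from four 16-bit words, most significant word first.
   The two middle words must be zero for the patterns to match. */
static uint64_t na_bits_from_16(void)
{
    uint16_t w[4] = { 0x7ff0u, 0u, 0u, 1954u };
    return ((uint64_t)w[0] << 48) | ((uint64_t)w[1] << 32)
         | ((uint64_t)w[2] << 16) |  (uint64_t)w[3];
}

/* Reinterpret a 64-bit pattern as a double without any arithmetic. */
static double double_from_bits(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Since 1954 fits in 16 bits, the remaining mantissa bits above it are zero in
both constructions; whether those bits can safely carry a tag is the point
under discussion.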
Thank you very much, your attention is greatly appreciated,
Adrian
On Sun, May 23, 2021 at 7:59 PM Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
> TLDR: tagging R NAs is not possible.
>
> External software should not depend on how R currently implements NA,
> this may change at any time. Tagging of NA is not supported in R (if it
> were, it would have been documented). It would not be possible to
> implement such tagging reliably with the current implementation of NA in R.
>
> NaN payload propagation is not standardized. Compilers are free to, and
> do, optimize code without preserving any specific propagation.
> CPUs/FPUs differ in how they propagate payloads in binary operations; some
> zero the payload on any operation. Virtualized environments, binary
> translation, etc., may not preserve it in any way, either. ?NA has
> disclaimers about this: an NA may become NaN (payload lost) even in
> unary operations and also in binary operations not involving other NaN/NAs.
>
> Writing any new software that depends on anything specific happening
> to the NaN payloads would not be a good idea. One can only
> reliably use the NaN payload bits for storage, that is, if one avoids any
> computation at all, avoids passing the values to any external code
> unaware of such tagging (including R), etc. If such software wants any
> NaN to be understood as NA by R, it would have to use the documented R
> API for this (so essentially translating) - but given the problems
> mentioned above, there is really no point in doing that, because such
> NAs may become NaNs at any time.
>
> Best
> Tomas
>
> On 5/23/21 9:56 AM, Adrian Dușa wrote:
> > Dear R devs,
> >
> > I am probably missing something obvious, but still trying to understand why
> > the 1954 from the definition of an NA has to fill 32 bits when it normally
> > doesn't need more than 16.
> >
> > Wouldn't the code below achieve exactly the same thing?
> >
> > typedef union
> > {
> >     double value;
> >     unsigned short word[4];
> > } ieee_double;
> >
> >
> > #ifdef WORDS_BIGENDIAN
> > static CONST int hw = 0;
> > static CONST int lw = 3;
> > #else  /* !WORDS_BIGENDIAN */
> > static CONST int hw = 3;
> > static CONST int lw = 0;
> > #endif /* WORDS_BIGENDIAN */
> >
> >
> > static double R_ValueOfNA(void)
> > {
> >     volatile ieee_double x;
> >     x.word[hw] = 0x7ff0;
> >     x.word[lw] = 1954;
> >     return x.value;
> > }
> >
> > This question has to do with the tagged NA values from package haven, on
> > which I want to improve. Every available bit counts, especially if
> > multi-byte characters are going to be involved.
> >
> > Best wishes,
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>