Dear R devs,

I am probably missing something obvious, but still trying to understand why the 1954 from the definition of an NA has to fill 32 bits when it normally doesn't need more than 16.

Wouldn't the code below achieve exactly the same thing?

    typedef union
    {
        double value;
        unsigned short word[4];
    } ieee_double;

    #ifdef WORDS_BIGENDIAN
    static CONST int hw = 0;
    static CONST int lw = 3;
    #else  /* !WORDS_BIGENDIAN */
    static CONST int hw = 3;
    static CONST int lw = 0;
    #endif /* WORDS_BIGENDIAN */

    static double R_ValueOfNA(void)
    {
        volatile ieee_double x;
        x.word[hw] = 0x7ff0;
        x.word[lw] = 1954;
        return x.value;
    }

This question has to do with the tagged NA values from package haven, on which I want to improve. Every available bit counts, especially if multi-byte characters are going to be involved.

Best wishes,
--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu
This is because the NA in question is NA_real_, which is encoded in double precision IEEE-754, which uses 64 bits. The "1954" is just part of the NA. The NA must also conform to the NaN encoding for double precision numbers, which requires that the "beginning" portion of the number be "0x7ff0" (well, I think it should be "0x7ff8" but that's a different story), as you can see here:

    x.word[hw] = 0x7ff0;
    x.word[lw] = 1954;

Both those components are part of the same double precision value. They are just accessed this way to make it easy to set the high bits (63-32) and the low bits (31-0).

So NA is not just 1954, it's the high word 0x7ff0 0000 combined with the low word 1954 (note I'm mixing hex and decimals here).

In IEEE 754 double precision encoding, any value whose exponent bits are all ones (the top 12 bits read 0x7ff, ignoring the sign bit) and whose remaining bits are not all zero is a NaN. Apart from the first of those remaining bits, which designates "quiet" vs "signaling" NaNs, the rest can be anything. R has taken advantage of that to designate the R NA by setting the lower bits to be 1954.

Note I'm being pretty loose about endianness, etc. here, but hopefully this conveys the problem.

In terms of your proposal, I'm not entirely sure what you gain. You're still attempting to generate a 64 bit representation in the end. If all you need is to encode the fact that there was an NA, and restore it later as a 64 bit NA, then you can do whatever you want so long as the end result conforms to the expected encoding.

In terms of using 'short' here (which again, I don't see the need for, as you're using it to generate the final 64 bit encoding), I see two possible problems. You're adding the dependency that short will be 16 bits. We already have the (implicit) assumption in R that double is 64 bits, and the explicit one that int is 32 bits. But I think you'd be going out on a bit of a limb assuming that short is 16 bits (not sure). More important, if short is indeed 16 bits, I think in:

    x.word[hw] = 0x7ff0;

You overflow short.

Best,

B.
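To make the bit layout described above concrete, here is a minimal sketch in C. The helper names are hypothetical and are not part of R's API. It assumes a 64-bit IEEE-754 double, and it builds the pattern in a 64-bit integer and copies it with memcpy, so no endianness switch is needed. It also builds the same pattern from four 16-bit pieces, which matches only because the two middle pieces are set to zero explicitly; in a local union where only word[hw] and word[lw] are assigned, the other two words would be left indeterminate.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical illustrations, not R API. */

/* Reinterpret a 64-bit pattern as a double (assumes a 64-bit IEEE-754 double). */
static double bits_to_double(uint64_t bits)
{
    double d;
    assert(sizeof d == sizeof bits);
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* High 32-bit word 0x7ff00000 (bits 63-32), low 32-bit word 1954 (bits 31-0). */
static uint64_t na_bits_from_two_words(void)
{
    uint32_t hw = 0x7ff00000u;
    uint32_t lw = 1954u;
    return ((uint64_t)hw << 32) | lw;
}

/* The same pattern from four 16-bit pieces, most significant first.
 * The two middle pieces must be zeroed explicitly. */
static uint64_t na_bits_from_four_words(void)
{
    uint16_t w[4] = { 0x7ff0u, 0u, 0u, 1954u };
    uint64_t bits = 0;
    for (int i = 0; i < 4; i++)
        bits = (bits << 16) | w[i];
    return bits;
}
```

Both builders yield the same 64 bits, so splitting the value into 16-bit pieces changes how the pattern is written, not how many bits the resulting double occupies.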
On Sunday, May 23, 2021, 8:56:18 AM EDT, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:

> Dear R devs,
> [...]

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
TLDR: tagging R NAs is not possible.

External software should not depend on how R currently implements NA; this may change at any time. Tagging of NA is not supported in R (if it were, it would have been documented). It would not be possible to implement such tagging reliably with the current implementation of NA in R.

NaN payload propagation is not standardized. Compilers are free to, and do, optimize code without preserving or achieving any specific propagation. CPUs/FPUs differ in how they propagate payloads in binary operations; some zero the payload on any operation. Virtualized environments, binary translation, etc. may not preserve it in any way, either. ?NA has disclaimers about this: an NA may become NaN (payload lost) even in unary operations, and also in binary operations not involving other NaNs/NAs.

Writing any new software that depends on anything specific happening to the NaN payloads would not be a good idea. One can only reliably use the NaN payload bits for storage, that is, if one avoids any computation at all, avoids passing the values to any external code unaware of such tagging (including R), etc. If such software wants any NaN to be understood as NA by R, it would have to use the documented R API for this (so essentially translating), but given the problems mentioned above, there is really no point in doing that, because such NAs may become NaNs at any time.

Best
Tomas

On 5/23/21 9:56 AM, Adrian Dușa wrote:
> Dear R devs,
> [...]
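The "storage only" use of NaN payload bits mentioned in the reply above can be sketched as follows. This is a minimal illustration under stated assumptions, not how haven or R does it: the function names and the choice of bits 32-39 for a one-byte tag are hypothetical, and a 64-bit IEEE-754 double is assumed. The value is written to and read from memory with memcpy only, and is never used in arithmetic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for illustration only: exponent bits all ones,
 * a one-byte tag in bits 32-39, and 1954 in the low 32 bits. */
#define NAN_EXPONENT_BITS 0x7ff0000000000000ULL
#define TAG_SHIFT         32
#define TAG_MASK          (0xffULL << TAG_SHIFT)

/* Write a tagged NaN bit pattern into *slot without any floating-point operation. */
static void store_tagged_nan(double *slot, unsigned char tag)
{
    uint64_t bits = NAN_EXPONENT_BITS | ((uint64_t)tag << TAG_SHIFT) | 1954u;
    assert(sizeof *slot == sizeof bits);  /* assumes a 64-bit double */
    memcpy(slot, &bits, sizeof bits);
}

/* Read the tag back from memory, again without any floating-point operation. */
static unsigned char load_tag(const double *slot)
{
    uint64_t bits;
    memcpy(&bits, slot, sizeof bits);
    return (unsigned char)((bits & TAG_MASK) >> TAG_SHIFT);
}
```

The tag survives here only because the bytes are copied untouched. Once the double takes part in a computation, or is handed to code that is unaware of the tagging, nothing guarantees that load_tag would still return the stored value.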