Hervé Pagès
2021-Sep-10 19:13 UTC
[Rd] Spurious warnings in coercion from double/complex/character to raw
On 10/09/2021 09:12, Duncan Murdoch wrote:> On 10/09/2021 11:29 a.m., Herv? Pag?s wrote: >> Hi, >> >> The first warning below is unexpected and confusing: >> >> ??? > as.raw(c(3e9, 5.1)) >> ??? [1] 00 05 >> ??? Warning messages: >> ??? 1: NAs introduced by coercion to integer range >> ??? 2: out-of-range values treated as 0 in coercion to raw >> >> The reason we get it is that coercion from numeric to raw is currently >> implemented on top of coercion from numeric to int (file >> src/main/coerce.c, lines 700-710): >> >> ????? case REALSXP: >> ????????? for (i = 0; i < n; i++) { >> //????????? if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt(); >> ????????????? tmp = IntegerFromReal(REAL_ELT(v, i), &warn); >> ????????????? if(tmp == NA_INTEGER || tmp < 0 || tmp > 255) { >> ????????????????? tmp = 0; >> ????????????????? warn |= WARN_RAW; >> ????????????? } >> ????????????? pa[i] = (Rbyte) tmp; >> ????????? } >> ????????? break; >> >> The first warning comes from the call to IntegerFromReal(). >> >> The following code avoids the spurious warning and is also simpler and >> slightly faster: >> >> ????? case REALSXP: >> ????????? for (i = 0; i < n; i++) { >> //????????? if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt(); >> ????????????? double vi = REAL_ELT(v, i); >> ????????????? if(ISNAN(vi) || (tmp = (int) vi) < 0 || tmp > 255) { >> ????????????????? tmp = 0; >> ????????????????? warn |= WARN_RAW; >> ????????????? } >> ????????????? pa[i] = (Rbyte) tmp; >> ????????? } >> ????????? break; > > Doesn't that give different results in case vi is so large that "(int) > vi" overflows?? (I don't know what the C standard says, but some online > references say that behaviour is implementation dependent.) > > For example, if > > ? vi = 1.0 +? INT_MAX; > > wouldn't "(int) vi" be equal to a small integer?Good catch, thanks! Replacing if(ISNAN(vi) || (tmp = (int) vi) < 0 || tmp > 255) { tmp = 0; warn |= WARN_RAW; } pa[i] = (Rbyte) tmp; with if(ISNAN(vi) || vi <= -1.0 || vi >= 256.0) { tmp = 0; warn |= WARN_RAW; } else { tmp = (int) vi; } pa[i] = (Rbyte) tmp; should address that. FWIW IntegerFromReal() has a similar risk of int overflow (src/main/coerce.c, lines 128-138): int attribute_hidden IntegerFromReal(double x, int *warn) { if (ISNAN(x)) return NA_INTEGER; else if (x >= INT_MAX+1. || x <= INT_MIN ) { *warn |= WARN_INT_NA; return NA_INTEGER; } return (int) x; } The cast to int will also be an int overflow situation if x is > INT_MAX and < INT_MAX+1 so the risk is small! There are other instances of this situation in IntegerFromComplex() and IntegerFromString(). More below...> > Duncan Murdoch > > >> >> Coercion from complex to raw has the same problem: >> >> ??? > as.raw(c(3e9+0i, 5.1)) >> ??? [1] 00 05 >> ??? Warning messages: >> ??? 1: NAs introduced by coercion to integer range >> ??? 2: out-of-range values treated as 0 in coercion to raw >> >> Current implementation (file src/main/coerce.c, lines 711-721): >> >> ????? case CPLXSXP: >> ????????? for (i = 0; i < n; i++) { >> //????????? if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt(); >> ????????????? tmp = IntegerFromComplex(COMPLEX_ELT(v, i), &warn); >> ????????????? if(tmp == NA_INTEGER || tmp < 0 || tmp > 255) { >> ????????????????? tmp = 0; >> ????????????????? warn |= WARN_RAW; >> ????????????? } >> ????????????? pa[i] = (Rbyte) tmp; >> ????????? } >> ????????? break; >> >> This implementation has the following additional problem when the >> supplied complex has a nonzero imaginary part: >> >> ??? > as.raw(300+4i) >> ??? [1] 00 >> ??? Warning messages: >> ??? 1: imaginary parts discarded in coercion >> ??? 2: out-of-range values treated as 0 in coercion to raw >> >> ??? > as.raw(3e9+4i) >> ??? [1] 00 >> ??? Warning messages: >> ??? 1: NAs introduced by coercion to integer range >> ??? 2: out-of-range values treated as 0 in coercion to raw >> >> In one case we get a warning about the discarding of the imaginary part >> but not the other case, which is unexpected. We should see the exact >> same warning (or warnings) in both cases. >> >> With the following fix we only get the warning about the discarding of >> the imaginary part if we are not in a "out-of-range values treated as 0 >> in coercion to raw" situation: >> >> ????? case CPLXSXP: >> ????????? for (i = 0; i < n; i++) { >> //????????? if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt(); >> ????????????? Rcomplex vi = COMPLEX_ELT(v, i); >> ????????????? if(ISNAN(vi.r) || ISNAN(vi.i) || (tmp = (int) vi.r) < 0 || >> tmp > 255) { >> ????????????????? tmp = 0; >> ????????????????? warn |= WARN_RAW; >> ????????????? } else { >> ????????????????? if(vi.i != 0.0) >> ????????????????????? warn |= WARN_IMAG; >> ????????????? } >> ????????????? pa[i] = (Rbyte) tmp; >> ????????? } >> ????????? break;Corrected version: if(ISNAN(vi.r) || ISNAN(vi.i) || vi.r <= -1.00 || vi.r >= 256.00) { tmp = 0; warn |= WARN_RAW; } else { tmp = (int) vi.r; if(vi.i != 0.0) warn |= WARN_IMAG; } pa[i] = (Rbyte) tmp; Hopefully this time I got it right. Best, H.>> >> Finally, coercion from character to raw has the same problem and its >> code can be fixed in a similar manner: >> >> ??? > as.raw(c("3e9", 5.1)) >> ??? [1] 00 05 >> ??? Warning messages: >> ??? 1: NAs introduced by coercion to integer range >> ??? 2: out-of-range values treated as 0 in coercion to raw >> >> Cheers, >> H. >> >> >-- Herv? Pag?s Bioconductor Core Team hpages.on.github at gmail.com
brodie gaslam
2021-Sep-10 19:53 UTC
[Rd] Spurious warnings in coercion from double/complex/character to raw
> On Friday, September 10, 2021, 03:13:54 PM EDT, Herv? Pag?s <hpages.on.github at gmail.com> wrote: > > Good catch, thanks! > > Replacing > >???? if(ISNAN(vi) || (tmp = (int) vi) < 0 || tmp > 255) { >???????? tmp = 0; > >???????? warn |= WARN_RAW; > >???? } >???? pa[i] = (Rbyte) tmp; > > with > >???? if(ISNAN(vi) || vi <= -1.0 || vi >= 256.0) >?? { >???????? tmp = 0; > >???????? warn |= WARN_RAW; > >???? } else { >???????? tmp = (int) vi; >???? } >???? pa[i] = (Rbyte) tmp; > > should address that. > > FWIW IntegerFromReal() has a similar risk of int overflow > (src/main/coerce.c, lines 128-138): > > >?? int attribute_hidden > >?? IntegerFromReal(double x, int *warn) > >?? { > >?????? if (ISNAN(x)) > >?????????? return NA_INTEGER; > >?????? else if (x >= INT_MAX+1. || x <= INT_MIN ) { > >?????????? *warn |= WARN_INT_NA; > >?????????? return NA_INTEGER; > >?????? } > >?????? return (int) x; > >?? } > > > > The cast to int will also be an int overflow situation if x is > INT_MAX > and < INT_MAX+1 so the risk is small!I might be being dense, but it feels this isn't a problem?? Quoting C99 6.3.1.4 again (emph added):> When a finite value of real floating type is converted to an integer > type other than _Bool, **the fractional part is discarded** (i.e., the > value is truncated toward zero). If the value of the integral part > cannot be represented by the integer type, the behavior is undefined.50)Does the "fractional part is discarded" not save us here? Best, B.