Michael Chirico
2019-Jan-11 06:36 UTC
[Rd] strtoi output of empty string inconsistent across platforms
Identified as root cause of a bug in data.table:
https://github.com/Rdatatable/data.table/issues/3267

On my machine, strtoi("", base = 2L) produces NA_integer_ (which seems
consistent with ?strtoi: "Values which cannot be interpreted as integers
or would overflow are returned as NA_integer_").

But on all the other machines I've seen, 0L is returned. This seems to be
consistent with the output of a simple C program using the underlying
strtol function (see data.table link for this program, and for full
sessionInfo() of some environments with differing output).

So, what is the correct output of strtoi("", base = 2L)?

Is the cross-platform inconsistency to be expected/documentable?

Michael Chirico
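For readers who want to probe the C-level behaviour described above, here is a minimal sketch. It is not the program from the data.table issue; the names strtol_report, probe_strtol and print_report are made up for illustration. It records the three things strtol() reports: the return value, errno, and how many characters were consumed.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper type (not from the thread): everything strtol()
 * reports for one input, so results can be compared across platforms. */
struct strtol_report {
    long value;    /* return value of strtol() */
    int  err;      /* errno after the call (0 if strtol() left it alone) */
    int  consumed; /* number of characters consumed, i.e. endp - s */
};

struct strtol_report probe_strtol(const char *s, int base)
{
    struct strtol_report r;
    char *endp;

    errno = 0;                          /* so a stale errno is not misread */
    r.value = strtol(s, &endp, base);
    r.err = errno;
    r.consumed = (int)(endp - s);
    return r;
}

/* Prints one line per probe; errno is the field that may differ
 * between C libraries for the empty string. */
void print_report(const char *s, int base)
{
    struct strtol_report r = probe_strtol(s, base);
    printf("strtol(\"%s\", base %d): value=%ld errno=%d consumed=%d\n",
           s, base, r.value, r.err, r.consumed);
}
```

Calling print_report("", 2) from a main() on two machines should show value=0 and consumed=0 on both, since the C standard specifies that result when no conversion is performed. POSIX additionally allows, but does not require, errno to be set to EINVAL in that case, which is one plausible source of the difference reported here.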
Martin Maechler
2019-Jan-11 08:44 UTC
[Rd] strtoi output of empty string inconsistent across platforms
>>>>> Michael Chirico
>>>>>     on Fri, 11 Jan 2019 14:36:17 +0800 writes:

    > Identified as root cause of a bug in data.table:
    > https://github.com/Rdatatable/data.table/issues/3267

    > On my machine, strtoi("", base = 2L) produces NA_integer_
    > (which seems consistent with ?strtoi: "Values which cannot
    > be interpreted as integers or would overflow are returned
    > as NA_integer_").

indeed consistent with R's documentation on strtoi().
What machine would that be?

    > But on all the other machines I've seen, 0L is
    > returned. This seems to be consistent with the output of a
    > simple C program using the underlying strtol function (see
    > data.table link for this program, and for full
    > sessionInfo() of some environments with differing output).

    > So, what is the correct output of strtoi("", base = 2L)?

    > Is the cross-platform inconsistency to be
    > expected/documentable?

The inconsistency is certainly undesirable. The relevant utility
function in R's source (<R>/src/main/character.c) is

    static int strtoi(SEXP s, int base)
    {
        long int res;
        char *endp;

        /* strtol might return extreme values on error */
        errno = 0;

        if(s == NA_STRING) return(NA_INTEGER);
        res = strtol(CHAR(s), &endp, base); /* ASCII */
        if(errno || *endp != '\0') res = NA_INTEGER;
        if(res > INT_MAX || res < INT_MIN) res = NA_INTEGER;
        return (int) res;
    }

and so it clearly is a platform-inconsistency in the underlying C
library's strtol().

I think we should make this cross-platform consistent ...
and indeed it make much sense to ensure the result of

    strtoi("", base=2L)

to become NA_integer_

but changes are that would break code that has relied on the
current behavior {on "all but your computer" ;-)} ?

    > Michael Chirico

Thank you for the report,
Martin Maechler
ETH Zurich and R Core Team
Martin Maechler
2019-Jan-11 18:00 UTC
[Rd] strtoi output of empty string inconsistent across platforms
>>>>> Martin Maechler
>>>>>     on Fri, 11 Jan 2019 09:44:14 +0100 writes:

>>>>> Michael Chirico
>>>>>     on Fri, 11 Jan 2019 14:36:17 +0800 writes:

    >> Identified as root cause of a bug in data.table:
    >> https://github.com/Rdatatable/data.table/issues/3267

    >> On my machine, strtoi("", base = 2L) produces NA_integer_
    >> (which seems consistent with ?strtoi: "Values which
    >> cannot be interpreted as integers or would overflow are
    >> returned as NA_integer_").

    > indeed consistent with R's documentation on strtoi().
    > What machine would that be?

    >> But on all the other machines I've seen, 0L is
    >> returned. This seems to be consistent with the output of
    >> a simple C program using the underlying strtol function
    >> (see data.table link for this program, and for full
    >> sessionInfo() of some environments with differing
    >> output).

    >> So, what is the correct output of strtoi("", base = 2L)?

    >> Is the cross-platform inconsistency to be
    >> expected/documentable?

    > The inconsistency is certainly undesirable. The relevant
    > utility function in R's source (<R>/src/main/character.c)
    > is

    > static int strtoi(SEXP s, int base)
    > {
    >     long int res;
    >     char *endp;

    >     /* strtol might return extreme values on error */
    >     errno = 0;

    >     if(s == NA_STRING) return(NA_INTEGER);
    >     res = strtol(CHAR(s), &endp, base); /* ASCII */
    >     if(errno || *endp != '\0') res = NA_INTEGER;
    >     if(res > INT_MAX || res < INT_MIN) res = NA_INTEGER;
    >     return (int) res;
    > }

    > and so it clearly is a platform-inconsistency in the
    > underlying C library's strtol().

(corrected typos here: )

    > I think we should make this cross-platform consistent ...
    > and indeed it makes much sense to ensure the result of

    >     strtoi("", base=2L)

    > to become NA_integer_

    > but chances are that would break code that has relied on
    > the current behavior {on "all but your computer" ;-)} ?

I still think that such a change should be done.

'make check all' on the R source (+ Recommended packages) seems not to
signal any error or warning with such a change, so I plan to commit that
change to "the trunk" / "R-devel" soon, unless concerns are raised
highly (and quickly enough).

Martin
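The thread does not show the change itself. The following is only a sketch of one way to make the helper quoted above return NA for the empty string on every platform. It is not the code committed to R. It works on a plain C string instead of a SEXP, and the names strtoi_sketch and SKETCH_NA_INTEGER are hypothetical. The only addition to the quoted logic is the endp == s test, which detects "no characters converted" without relying on errno.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Stand-in for R's NA_INTEGER (INT_MIN in R's C API), defined here only
 * so that the sketch compiles without R's headers. */
#define SKETCH_NA_INTEGER INT_MIN

/* Hypothetical variant of the quoted strtoi() helper, taking a plain
 * C string instead of a SEXP. */
int strtoi_sketch(const char *s, int base)
{
    long res;
    char *endp;

    if (s == NULL) return SKETCH_NA_INTEGER;

    /* strtol might return extreme values on error */
    errno = 0;
    res = strtol(s, &endp, base);

    /* endp == s means nothing was converted (e.g. the empty string),
     * whether or not this platform's strtol() also set errno. */
    if (errno || endp == s || *endp != '\0') return SKETCH_NA_INTEGER;
    if (res > INT_MAX || res < INT_MIN) return SKETCH_NA_INTEGER;
    return (int) res;
}
```

With this check, strtoi_sketch("", 2) yields the NA stand-in regardless of the C library, while inputs such as "101" in base 2 still convert as before. Whether R should test the pointer, as here, or test for an empty string before calling strtol() is a design choice this sketch does not settle.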