Martin Maechler
2019-Jan-11 18:00 UTC
[Rd] strtoi output of empty string inconsistent across platforms
>>>>> Martin Maechler >>>>> on Fri, 11 Jan 2019 09:44:14 +0100 writes:>>>>> Michael Chirico >>>>> on Fri, 11 Jan 2019 14:36:17 +0800 writes:>> Identified as root cause of a bug in data.table: >> https://github.com/Rdatatable/data.table/issues/3267 >> On my machine, strtoi("", base = 2L) produces NA_integer_ >> (which seems consistent with ?strtoi: "Values which >> cannot be interpreted as integers or would overflow are >> returned as NA_integer_"). > indeed consistent with R's documentation on strtoi(). > What machine would that be? >> But on all the other machines I've seen, 0L is >> returned. This seems to be consistent with the output of >> a simple C program using the underlying strtol function >> (see data.table link for this program, and for full >> sessionInfo() of some environments with differing >> output). >> So, what is the correct output of strtoi("", base = 2L)? >> Is the cross-platform inconsistency to be >> expected/documentable? > The inconsistency is certainly undesirable. The relevant > utility function in R's source (<R>/src/main/character.c) > is > static int strtoi(SEXP s, int base) { long int res; char > *endp; > /* strtol might return extreme values on error */ > errno = 0; > if(s == NA_STRING) return(NA_INTEGER); res > strtol(CHAR(s), &endp, base); /* ASCII */ if(errno || > *endp != '\0') res = NA_INTEGER; if(res > INT_MAX || res < > INT_MIN) res = NA_INTEGER; return (int) res; } > and so it clearly is a platform-inconsistency in the > underlying C library's strtol(). (corrected typos here: ) > I think we should make this cross-platform consistent ... > and indeed it makes much sense to ensure the result of > strtoi("", base=2L) to become NA_integer_ > but chances are that would break code that has relied on > the current behavior {on "all but your computer" ;-)} ? I still think that such a change should be done. 'make check all' on the R source (+ Recommended packages) seems not to signal any error or warning with such a change, so I plan to commit that change to "the trunk" / "R-devel" soon, unless concerns are raised highly (and quickly enough). Martin
Michael Chirico
2019-Jan-12 09:34 UTC
[Rd] strtoi output of empty string inconsistent across platforms
Thanks Martin. For what it's worth, this extremely representative, highly scientific Twitter poll suggests the Mac/Linux split is pretty stark (NA on Mac, 0 on Linux) https://twitter.com/michael_chirico/status/1083649190117306369?s=17 On Sat, Jan 12, 2019, 2:00 AM Martin Maechler <maechler at stat.math.ethz.ch wrote:> >>>>> Martin Maechler > >>>>> on Fri, 11 Jan 2019 09:44:14 +0100 writes: > > >>>>> Michael Chirico > >>>>> on Fri, 11 Jan 2019 14:36:17 +0800 writes: > > >> Identified as root cause of a bug in data.table: > >> https://github.com/Rdatatable/data.table/issues/3267 > > >> On my machine, strtoi("", base = 2L) produces NA_integer_ > >> (which seems consistent with ?strtoi: "Values which > >> cannot be interpreted as integers or would overflow are > >> returned as NA_integer_"). > > > indeed consistent with R's documentation on strtoi(). > > What machine would that be? > > >> But on all the other machines I've seen, 0L is > >> returned. This seems to be consistent with the output of > >> a simple C program using the underlying strtol function > >> (see data.table link for this program, and for full > >> sessionInfo() of some environments with differing > >> output). > > >> So, what is the correct output of strtoi("", base = 2L)? > > >> Is the cross-platform inconsistency to be > >> expected/documentable? > > > The inconsistency is certainly undesirable. The relevant > > utility function in R's source (<R>/src/main/character.c) > > is > > > static int strtoi(SEXP s, int base) { long int res; char > > *endp; > > > /* strtol might return extreme values on error */ > > errno = 0; > > > if(s == NA_STRING) return(NA_INTEGER); res > > strtol(CHAR(s), &endp, base); /* ASCII */ if(errno || > > *endp != '\0') res = NA_INTEGER; if(res > INT_MAX || res < > > INT_MIN) res = NA_INTEGER; return (int) res; } > > > and so it clearly is a platform-inconsistency in the > > underlying C library's strtol(). > > (corrected typos here: ) > > > I think we should make this cross-platform consistent ... > > and indeed it makes much sense to ensure the result of > > > strtoi("", base=2L) to become NA_integer_ > > > but chances are that would break code that has relied on > > the current behavior {on "all but your computer" ;-)} ? > > I still think that such a change should be done. > > 'make check all' on the R source (+ Recommended packages) seems > not to signal any error or warning with such a change, so I plan > to commit that change to "the trunk" / "R-devel" soon, unless > concerns are raised highly (and quickly enough). > > Martin >[[alternative HTML version deleted]]
Martin Maechler
2019-Jan-12 14:49 UTC
[Rd] strtoi output of empty string inconsistent across platforms
>>>>> Michael Chirico >>>>> on Sat, 12 Jan 2019 17:34:03 +0800 writes:> Thanks Martin. For what it's worth, this extremely > representative, highly scientific Twitter poll suggests > the Mac/Linux split is pretty stark (NA on Mac, 0 on > Linux) > https://twitter.com/michael_chirico/status/1083649190117306369?s=17 > On Sat, Jan 12, 2019, 2:00 AM Martin Maechler > <maechler at stat.math.ethz.ch wrote: >> >>>>> Martin Maechler >>>>> on Fri, 11 Jan 2019 09:44:14 >> +0100 writes: >> >> >>>>> Michael Chirico >>>>> on Fri, 11 Jan 2019 14:36:17 >> +0800 writes: >> >> >> Identified as root cause of a bug in data.table: >> >> https://github.com/Rdatatable/data.table/issues/3267 >> >> >> On my machine, strtoi("", base = 2L) produces >> NA_integer_ >> (which seems consistent with ?strtoi: >> "Values which >> cannot be interpreted as integers or >> would overflow are >> returned as NA_integer_"). >> >> > indeed consistent with R's documentation on strtoi(). >> > What machine would that be? >> >> >> But on all the other machines I've seen, 0L is >> >> returned. This seems to be consistent with the output of >> >> a simple C program using the underlying strtol >> function >> (see data.table link for this program, and >> for full >> sessionInfo() of some environments with >> differing >> output). >> >> >> So, what is the correct output of strtoi("", base >> 2L)? >> >> >> Is the cross-platform inconsistency to be >> >> expected/documentable? >> >> > The inconsistency is certainly undesirable. The >> relevant > utility function in R's source >> (<R>/src/main/character.c) > is >> >> > static int strtoi(SEXP s, int base) { long int res; >> char > *endp; >> >> > /* strtol might return extreme values on error */ > >> errno = 0; >> >> > if(s == NA_STRING) return(NA_INTEGER); res = > >> strtol(CHAR(s), &endp, base); /* ASCII */ if(errno || > >> *endp != '\0') res = NA_INTEGER; if(res > INT_MAX || res >> < > INT_MIN) res = NA_INTEGER; return (int) res; } >> >> > and so it clearly is a platform-inconsistency in the > >> underlying C library's strtol(). >> >> (corrected typos here: ) >> >> > I think we should make this cross-platform consistent >> ... > and indeed it makes much sense to ensure the >> result of >> >> > strtoi("", base=2L) to become NA_integer_ >> >> > but chances are that would break code that has relied >> on > the current behavior {on "all but your computer" >> ;-)} ? >> >> I still think that such a change should be done. >> >> 'make check all' on the R source (+ Recommended packages) >> seems not to signal any error or warning with such a >> change, so I plan to commit that change to "the trunk" / >> "R-devel" soon, unless concerns are raised highly (and >> quickly enough). I've now committed svn rev 75977 to R-devel which does change strtoi("", <base>) to return NA_integer_ (for all <base>) platform independently, not even calling lib C's strtol() in that case. Martin