Dear r-devel, For some numbers, the printed value is not equivalent to the input : options(scipen = 999) ## GOOD 1e24 #> [1] 999999999999999983222784 1e24 == 999999999999999983222784 #> [1] TRUE ## BAD 1e25 #> [1] 10000000000000000905969664 1e25 == 10000000000000000905969664 #> [1] FALSE ## STILL BAD 10000000000000000905969664 #> [1] 10000000000000003053453312 ## GOOD AGAIN 10000000000000003053453312 #> [1] 10000000000000003053453312 # Additionally 10000000000000000000000000 == 1e25 #> [1] FALSE Are these bugs ? [[alternative HTML version deleted]]
On 10 July 2022 at 16:00, Antoine Fabri wrote: | Dear r-devel, | | For some numbers, the printed value is not equivalent to the input : | | options(scipen = 999) | ## GOOD | 1e24 | #> [1] 999999999999999983222784 | 1e24 == 999999999999999983222784 | #> [1] TRUE | | ## BAD | 1e25 | #> [1] 10000000000000000905969664 | 1e25 == 10000000000000000905969664 | #> [1] FALSE | | ## STILL BAD | 10000000000000000905969664 | #> [1] 10000000000000003053453312 | | ## GOOD AGAIN | 10000000000000003053453312 | #> [1] 10000000000000003053453312 | | # Additionally | 10000000000000000000000000 == 1e25 | #> [1] FALSE | | Are these bugs ? No, that is how computers work (with floating point numbers). Please R FAQ 7.31 "Why doesn?t R think these numbers are equal?" at https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f and the references therein for more. Dirk -- dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
The following function, 'bitC' from ?numToBits, displays the bits in a double precision number, separated into the sign bit, the 11 exponent bits, and the 52 bits in the mantissa. I've shown the results with your numbers from R-2.4.0 on my Windows 11 Lenovo laptop: what do you get?> bitC <- function(x) noquote(vapply(as.double(x), function(x) { # splitone double + b <- substr(as.character(rev(numToBits(x))), 2L, 2L) + paste0(c(b[1L], " ", b[2:12], " | ", b[13:64]), collapse = "") + }, ""))> bitC(10^25)# [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001> bitC(10000000000000000905969664)# [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001> bitC(10000000000000000905969664 - 10^25)# [1] 0 00000000000 | 0000000000000000000000000000000000000000000000000000> bitC(1e25)# [1] 0 10001010010 | 0000100010110010101000101100001010000000001010010001 -Bill On Sun, Jul 10, 2022 at 7:00 AM Antoine Fabri <antoine.fabri at gmail.com> wrote:> Dear r-devel, > > For some numbers, the printed value is not equivalent to the input : > > options(scipen = 999) > ## GOOD > 1e24 > #> [1] 999999999999999983222784 > 1e24 == 999999999999999983222784 > #> [1] TRUE > > ## BAD > 1e25 > #> [1] 10000000000000000905969664 > 1e25 == 10000000000000000905969664 > #> [1] FALSE > > ## STILL BAD > 10000000000000000905969664 > #> [1] 10000000000000003053453312 > > ## GOOD AGAIN > 10000000000000003053453312 > #> [1] 10000000000000003053453312 > > # Additionally > 10000000000000000000000000 == 1e25 > #> [1] FALSE > > Are these bugs ? > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >[[alternative HTML version deleted]]
Here's a similar question that may give you some ideas for dealing with this: https://stackoverflow.com/q/72899973/2554330 Duncan Murdoch On 10/07/2022 10:00 a.m., Antoine Fabri wrote:> Dear r-devel, > > For some numbers, the printed value is not equivalent to the input : > > options(scipen = 999) > ## GOOD > 1e24 > #> [1] 999999999999999983222784 > 1e24 == 999999999999999983222784 > #> [1] TRUE > > ## BAD > 1e25 > #> [1] 10000000000000000905969664 > 1e25 == 10000000000000000905969664 > #> [1] FALSE > > ## STILL BAD > 10000000000000000905969664 > #> [1] 10000000000000003053453312 > > ## GOOD AGAIN > 10000000000000003053453312 > #> [1] 10000000000000003053453312 > > # Additionally > 10000000000000000000000000 == 1e25 > #> [1] FALSE > > Are these bugs ? > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel
I don?t think there is any guarantee that unrepresentable numbers are parsed into defined patterns, because printing is done by the OS while parsing is done by R. The way R parses decimal numbers[1] is simply by using the obvious res = res * 10 + digit and it can be easily checked that for doubles the representation such obtained from 10000000000000000905969664 is 0x1.08b2a2c280292p+83 (see below if you want to see it yourself) which is not the same as 10^25 which is 0x1.08b2a2c280291p+83. This is true on all platforms, it is not specific to M1. The only difference is if your were to use a different type you can obtain a different result - and that is not well-defined (e.g. long doubles have no guarantees at all as of the precision). Note that the decimal string above would require 83-bits of precision which is not representable. (BTW: to make it even more fun, if you were to use double res = 1; repeat(25) res = res * 10; in C, so the naive computation of the original 10^25 you?d get 9999999999999998758486016 and 0x1.08b2a2c28029p+83) Given that printing is done by the OS and parsing by R, I don?t think R guarantees anything. If you want representable number you?d use the binary representation (sprintf(?%a?) or hex-mode deparse as noted). One could argue that it could make sense to change it one way or another - either having R do it all or having the OS do it all. In the latter case one may obtain more consistent results (e.g. system stdtod() yields the original value even on M1), but it would be OS-specific. In the former R could impose its own guarantees - but currently it does not. Cheers, Simon [1] - https://github.com/r-devel/r-svn/blob/97c0a73f1758d09088c200f924d27b362d55ccdc/src/main/util.c#L2094 #include <stdio.h> #include <math.h> #include <stdlib.h> int main() { const char *str = "10000000000000000905969664", *c = str; double ans = 0; while (*c) { ans = 10 * ans + (*(c++) - '0'); printf("%a\n", ans); } printf("atof: %a\n", atof(str)); double pow1025 = pow(10.0, 25); printf("--\n10^25:\n%25.f\n%a\n", pow1025, pow1025); return 0; } 0x1p+0 0x1.4p+3 0x1.9p+6 0x1.f4p+9 0x1.388p+13 0x1.86ap+16 0x1.e848p+19 0x1.312dp+23 0x1.7d784p+26 0x1.dcd65p+29 0x1.2a05f2p+33 0x1.74876e8p+36 0x1.d1a94a2p+39 0x1.2309ce54p+43 0x1.6bcc41e9p+46 0x1.c6bf52634p+49 0x1.1c37937e08p+53 0x1.6345785d8a001p+56 0x1.bc16d674ec801p+59 0x1.158e460913d01p+63 0x1.5af1d78b58c41p+66 0x1.b1ae4d6e2ef51p+69 0x1.0f0cf064dd593p+73 0x1.52d02c7e14af8p+76 0x1.a784379d99db6p+79 0x1.08b2a2c280292p+83 atof: 0x1.08b2a2c280291p+83 -- 10^25: 10000000000000000905969664 0x1.08b2a2c280291p+83> On 11/07/2022, at 02:00, Antoine Fabri <antoine.fabri at gmail.com> wrote: > > Dear r-devel, > > For some numbers, the printed value is not equivalent to the input : > > options(scipen = 999) > ## GOOD > 1e24 > #> [1] 999999999999999983222784 > 1e24 == 999999999999999983222784 > #> [1] TRUE > > ## BAD > 1e25 > #> [1] 10000000000000000905969664 > 1e25 == 10000000000000000905969664 > #> [1] FALSE > > ## STILL BAD > 10000000000000000905969664 > #> [1] 10000000000000003053453312 > > ## GOOD AGAIN > 10000000000000003053453312 > #> [1] 10000000000000003053453312 > > # Additionally > 10000000000000000000000000 == 1e25 > #> [1] FALSE > > Are these bugs ? > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >