>>>>> Ben Bolker <bbolker at gmail.com>
>>>>> on Sat, 5 Feb 2011 15:58:09 -0500 writes:
> A bug was recently posted to the R bug database (which
> probably would better have been posted as a query here) as
> to why this happens:
>> print(7.921,digits=2)
> [1] 8
>> print(7.92,digits=2)
> [1] 7.9
> Two things I *haven't* done to help make sense of this
> for myself are (1) writing out the binary representations
> to see if something obvious pops out about why this would
> be a breakpoint and (2) poking in the source code (I did a
> little bit of this but gave up).
> I know that confusion over rounding etc. is very common,
> and my first reaction to this sort of question is always
> that there must be some sort of confusion (either (1) in a
> FAQ 7.31-ish sort of way that floating point values have
> finite precision in the first place, or (2) a confusion
> over the difference between the value and the
> representation of the number, or (3) more subtly, about
> the differences among various choices of rounding
> conventions).
> However, in this case I am a bit stumped: I don't see
> that any of the standard confusions apply. I grepped the
> R manuals for "rounding" and didn't find anything useful
> ... I have a very strong prior that such a core part of R
> must be correct, and that therefore I (and the original
> bug reporter) must be misunderstanding something.
> Thoughts/references?
I had started to delve into this after you've posted the bug
report. It is clearly a bug(let),
caused by code that has been in R from its very
beginning, at least in the first source code I've seen in 1994.
The problem is not related to digits=2,
but using such a low number of digits shows it more
dramatically, e.g., also
> print(5.9994, digits=4)
[1] 5.999
> print(5.9994001, digits=4)
[1] 6
Interestingly, the problem seems *not* to be present for
digits = 1 (only).
I haven't found time to mathematically "analyze" it for
determining a correct solution though.
Note that fixing this bug(let) will probably (very slightly)
change a huge number of R outputs .. so there is a caveat,
but nonetheless, we must approach it.
The responsible code is the C function scientific()
in src/main/format.c
a somewhat direct R interface to its functionality is given by
R's format.info() :
> format.info(7.92, digits=2)
[1] 3 1 0
> format.info(7.921, digits=2)
[1] 1 0 0
which means that 7.92 will use 3 characters, 1 digit after the decimal
7.921 will use 1 character, 0 digits after decimal.
The crucial "buggy" part of the code is {line 180 ff} :
/* compute number of digits */
*nsig = R_print.digits;
for (j = 1; j <= *nsig; j++) {
if (fabs(alpha - floor(alpha+0.5)) < eps * alpha) {
*nsig = j;
break;
}
alpha *= 10.0;
}
where notably
if (fabs(alpha - floor(alpha+0.5)) < eps * alpha) {
is not quite the correct check.
Note that eps = 10^-2 in our case and alpha = 7.92 or 7.921
and that 8 - 7.921 = 0.079 < 7.921/10^2 is just at the border.
Looking for "all the border cases" is solving the above
analytically with '=' and setting k = ceiling(alpha) :
> p0 <- function(...) paste(..., sep="")
> k <- 6:9; names(k) <- p0("k=",k)
> d <- 1:5; names(d) <- p0("d=",d)
> xc <- outer(k, 1+10^-d, "/")
> print(xc, digits= 11)
d=1 d=2 d=3 d=4 d=5
k=6 5.4545454545 5.9405940594 5.994005994 5.99940006 5.9999400006
k=7 6.3636363636 6.9306930693 6.993006993 6.99930007 6.9999300007
k=8 7.2727272727 7.9207920792 7.992007992 7.99920008 7.9999200008
k=9 8.1818181818 8.9108910891 8.991008991 8.99910009 8.9999100009
So we see that for our case, 7.920792... is more exactly the
border case.
One change that is *not* correct is making it an absolute
instead of relative comparison, i.e. replacing the RHS
eps * alpha by eps
Namely, one important purpose of the test is to ensure that e.g.
print(3.597, digits = 3)
is printed as 3.6 and not 3.60
Now I have had -- since 1997 at least -- an R version of
scientific() for more easy experimentation.
Here's an updated version of that:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: scientific.R
URL:
<https://stat.ethz.ch/pipermail/r-devel/attachments/20110207/d05c4b37/attachment.pl>
-------------- next part --------------
and here some of the experimentation code:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: scientific-ex.R
URL:
<https://stat.ethz.ch/pipermail/r-devel/attachments/20110207/d05c4b37/attachment-0001.pl>
-------------- next part --------------
-------
Now, I'm waiting for comments/suggestions ..
Martin