I came to report this same bug and found it already in the trash, but I slightly disagree with that assessment. If it's not a bug, then perhaps it's a feature request. Comments at the end.

On Mon, May 14, 2007, Duncan Murdoch wrote:

>> On 13/05/2007 8:46 PM, scott.wilkinson at csiro.au wrote:
>>
>> In the example below round() does not report to the specified number of
>> digits when the last digit to be reported is zero: Compare behaviour for
>> 0.897575 and 0.946251. Ditto for signif(). The number of sigfigs is
>> ambiguous unless the reader knows this behaviour. Is this a bug or
>> intended behaviour? Is there a work-around?
>
> It's not a bug. It has nothing to do with round(), it is the way R
> prints numbers by default. If you ask to print 0.90, you'll get
>
> [1] 0.9
>
> because 0.9 and 0.90 are the same number. If you want trailing zeros to
> print, you need to specify a format to do that, e.g.
>
> > noquote(format(0.9, nsmall=2))
> [1] 0.90
>
> The noquote stops the "" from printing. You could also use sprintf() or
> formatC() for more C-like format specifications.

All of those options require you to specify the number of digits after the decimal, don't they? Unless sprintf would default to the number of decimal places passed to it, but it doesn't:

> sprintf("%f", signif(0.90, digits=2))
[1] "0.900000"

It defaults to 6. Although %g behaves differently,

> sprintf("%g", signif(0.90, digits=2))
[1] "0.9"

this clearly still isn't the desired behavior.

To continue in that vein, the same issue with rounding versus printing occurs with vectors:

> sapply(c(1:6), function(a){signif(c(18.423,0.90), digits=a)})
     [,1] [,2] [,3]  [,4]   [,5]   [,6]
[1,] 20.0 18.0 18.4 18.42 18.423 18.423
[2,]  0.9  0.9  0.9  0.90  0.900  0.900

Trying to get that and more complicated tables to print the correct number of significant digits gets pretty hairy with sprintf(). I could be wrong, but I would view the primary purpose of rounding to significant digits as the printed output of the number.
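The "%f defaults to 6" behaviour isn't specific to R's sprintf(); C-style formatting behaves the same way in other languages. A quick Python sketch of the same observation (round() here plays the role of signif(); it too returns just the number, with no memory of the requested digits):

```python
x = round(0.90, 2)       # like signif(0.90, digits=2): the result is just 0.9

print("%f" % x)          # %f defaults to 6 decimal places: 0.900000
print("%g" % x)          # %g trims trailing zeros: 0.9
print("%.2f" % x)        # only an explicit precision yields 0.90
```

This is exactly the complaint in the thread: none of the default formats recovers "2 significant digits" from the value alone.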
That there doesn't seem to be a way to do this without later having to specify the number of decimal places would seem to render signif() as it's written not particularly useful.

There are two solutions I can think of off the top of my head. The first is to create a new data type of a fixed-length real number, but the easier way would be to have a function that returns a string, something like this:

signif.string <- function(signum, sigdigs){
  left <- nchar(trunc(signum))
  right <- nchar(signum - trunc(signum)) - 2
  if (abs(signum) < 1 | signum < 0) {left <- left - 1}
  if (right < 0) {right <- 0}
  if (sigdigs < left) {return(as.character(signif(signum, digits=sigdigs)))}
  else if (sigdigs == left) {return(paste(round(signum), ".", sep=""))}
  else if (sigdigs <= left + right) {return(format(signum, digits=sigdigs))}
  else {return(sprintf(paste("%.", sigdigs - left, "f", sep=""), signum))}
}

This is just a skeleton that I think suits my needs for the moment and might also cover the original poster's. One for production would need to handle scientific notation and would probably want to fix the .5 rounding behavior on Windows.

Pat Carr

version.string R version 2.7.0 (2008-04-22)
To reply to my own message, that function wasn't quite right. I think this one works better:

signif.string <- function(signum, sigdigs){
  test <- abs(signum)
  left <- nchar(trunc(test))
  right <- nchar(test) - left - 1
  if (test < 1) {left <- left - 1}
  if (right < 0) {right <- 0}
  if (sigdigs < left) {out <- as.character(signif(signum, digits=sigdigs))}
  else if (sigdigs == left & trunc(signum) %% 10 == 0) {out <- paste(round(signum), ".", sep="")}
  else if (sigdigs <= left + right) {out <- format(signum, digits=sigdigs)}
  else {out <- sprintf(paste("%.", sigdigs - left, "f", sep=""), signum)}
  return(noquote(out))
}

But it should still have error checking and vector capability, yadda yadda. Also, I forgot what year it was, so sorry, Scott, for spamming you with something you're hopefully not still stuck on.

Pat Carr
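For readers outside R, the same idea can be sketched in Python (format_signif is a made-up name, not a standard function): round to the requested number of significant digits, work out how many decimal places that implies from the number's magnitude, and let fixed-point formatting supply the trailing zeros.

```python
import math

def format_signif(x, sigdigs):
    """Format x to sigdigs significant digits, keeping trailing zeros."""
    if x == 0:
        return f"{0:.{sigdigs - 1}f}"
    # exponent of the leading digit: 1 for 18.423, -1 for 0.9
    exponent = math.floor(math.log10(abs(x)))
    decimals = sigdigs - 1 - exponent
    if decimals <= 0:
        # more integer digits than significant digits: round, print no decimals
        return f"{round(x, decimals):.0f}"
    return f"{x:.{decimals}f}"

print(format_signif(0.90, 2))    # 0.90, not 0.9
print(format_signif(18.423, 4))  # 18.42
print(format_signif(12345, 4))   # 12340 (round-half-to-even, like signif)
```

Like the R skeleton above, this ignores scientific notation and vectors; it just shows that the decimal-place count can be derived from the significant-digit count rather than specified twice.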
pmc1 at cornell.edu wrote:

> I came to report this same bug and found it already in the trash, but
> I slightly disagree with that assessment. If it's not a bug, then
> perhaps it's a feature request. Comments at the end.
>
> On Mon, May 14, 2007, Duncan Murdoch wrote:
>
>>> On 13/05/2007 8:46 PM, scott.wilkinson at csiro.au wrote:
>>>
>>> In the example below round() does not report to the specified number of
>>> digits when the last digit to be reported is zero: Compare behaviour for
>>> 0.897575 and 0.946251. Ditto for signif(). The number of sigfigs is
>>> ambiguous unless the reader knows this behaviour. Is this a bug or
>>> intended behaviour? Is there a work-around?
>>
>> It's not a bug. It has nothing to do with round(), it is the way R
>> prints numbers by default. If you ask to print 0.90, you'll get
>>
>> [1] 0.9
>>
>> because 0.9 and 0.90 are the same number. If you want trailing zeros to
>> print, you need to specify a format to do that, e.g.
>>
>> > noquote(format(0.9, nsmall=2))
>> [1] 0.90
>>
>> The noquote stops the "" from printing. You could also use sprintf() or
>> formatC() for more C-like format specifications.
>
> All of those options require you to specify the number of digits after
> the decimal, don't they? Unless sprintf would default to the number of
> decimal places passed to it, but it doesn't:

That specification doesn't make sense. There is no "number of decimal places passed to it". What sprintf() sees below is identical to what it would see if you called sprintf("%f", 0.9), because signif(0.90, digits=2) == 0.9. Those two objects are identical.

> > sprintf("%f",signif(0.90, digits=2))
> [1] "0.900000"
>
> it defaults to 6. Although %g behaves differently,
>
> > sprintf("%g",signif(0.90, digits=2))
> [1] "0.9"
>
> this clearly still isn't the desired behavior.
Maybe not what you desired, but certainly reasonable behaviour.

> To continue in that vein, the same issue with rounding versus printing
> occurs with vectors:
>
> > sapply(c(1:6),function(a){signif(c(18.423,0.90),digits=a)})
>      [,1] [,2] [,3]  [,4]   [,5]   [,6]
> [1,] 20.0 18.0 18.4 18.42 18.423 18.423
> [2,]  0.9  0.9  0.9  0.90  0.900  0.900
>
> Trying to get that and more complicated tables to print the correct
> number of significant digits gets pretty hairy with sprintf(). I could
> be wrong, but I would view the primary purpose of rounding to
> significant digits as the printed output of the number. That there
> doesn't seem to be a way to do this without later having to specify
> the number of decimal places would seem to render signif() as it's
> written not particularly useful.
>
> There are two solutions I can think of off the top of my head. The
> first is to create a new data type of a fixed-length real number, but
> the easier way would be to have a function that returns a string,
> something like this:
>
> signif.string <- function(signum,sigdigs){
>   left <- nchar(trunc(signum))
>   right <- nchar(signum-trunc(signum))-2
>   if (abs(signum)<1 | signum<0) {left<-left-1}
>   if (right<0) {right<-0}
>   if (sigdigs<left) {return(as.character(signif(signum,digits=sigdigs)))}
>   else if (sigdigs==left) {return(paste(round(signum),".",sep=""))}
>   else if (sigdigs<=left+right) {return(format(signum,digits=sigdigs))}
>   else {return(sprintf(paste("%.",sigdigs-left,"f",sep=""),signum))}
> }
>
> This is just a skeleton that I think suits my needs for the moment and
> might also cover the original poster's. One for production would need
> to handle scientific notation and would probably want to fix the .5
> rounding behavior on Windows.

As far as I know, rounding is fine in Windows:

> round(1:10 + 0.5)
 [1]  2  2  4  4  6  6  8  8 10 10

looks okay to me. If you want biased rounding instead (getting 2:11 as output), simply use trunc(x + 0.5) instead of round(x).
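Duncan's round-half-to-even point can be checked outside R as well; Python's built-in round() follows the same IEEE convention, and trunc(x + 0.5) gives the biased always-up variant he mentions:

```python
import math

xs = [i + 0.5 for i in range(1, 11)]   # 1.5, 2.5, ..., 10.5 (all exact doubles)

unbiased = [round(x) for x in xs]           # round-half-to-even: ties go to even
biased = [math.trunc(x + 0.5) for x in xs]  # always rounds exact halves up

print(unbiased)  # [2, 2, 4, 4, 6, 6, 8, 8, 10, 10]
print(biased)    # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

The unbiased rule alternates up/down on exact halves so that rounding errors cancel on average, which is why R (and IEEE 754) use it by default.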
Duncan Murdoch

> Pat Carr
>
> version.string R version 2.7.0 (2008-04-22)
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
On Jun 3, 2008, at 2:48 PM, Duncan Murdoch wrote:

> On 6/3/2008 11:43 AM, Patrick Carr wrote:
>> On 6/3/08, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>>>
>>> because signif(0.90, digits=2) == 0.9. Those two objects are
>>> identical.
>>
>> My text above that is poorly worded. They're identical internally,
>> yes. But in terms of the number of significant digits, 0.9 and 0.90
>> are different. And that matters when the number is printed, say as an
>> annotation on a graph. Passing it through sprintf() or format() later
>> requires you to specify the number of digits after the decimal, which
>> is different than the number of significant digits, and requires case
>> testing for numbers of different orders of magnitude.
>>
>> The original complainant (and I) expected this behavior from signif(),
>> not merely rounding. As I said before, I wrote my own workaround so
>> this is somewhat academic, but I don't think we're alone.
>>
>>> As far as I know, rounding is fine in Windows:
>>>
>>> > round(1:10 + 0.5)
>>> [1] 2 2 4 4 6 6 8 8 10 10
>>
>> It might not be the rounding, then. (windows xp sp3)
>>
>> > signif(12345,digits=4)
>> [1] 12340
>> > signif(0.12345,digits=4)
>> [1] 0.1235
>
> It's easy to make mistakes in this, but a little outside-of-R
> experimentation suggests those are the right answers. The number
> 12345 is exactly representable, so it is exactly half-way between
> 12340 and 12350, so 12340 is the right answer by the unbiased
> round-to-even rule. The number 0.12345 is not exactly representable,
> but (I think) it is represented by something slightly closer to 0.1235
> than to 0.1234. So it looks as though Windows gets it right.
>
>> OS X (10.5.2/intel) does not have that problem.
>
> Which would seem to imply OS X gets it wrong.

This has nothing to do with OS X; you get the same answer on pretty much all other platforms (Intel/Linux, MIPS/IRIX, Sparc/Sun, ...).
Windows is the only one delivering the incorrect result here.

> Both are supposed to be using the 64 bit floating point standard,
> so they should both give the same answer:

Should, yes, but Windows doesn't. In fact 10000.0 is exactly representable and so is 1234.5, which is the correct result that all except Windows get. I don't have a Windows box handy, so I can't tell why - but if you go through fprec this is what you get on the platforms I tested (log10 may vary slightly but that's irrelevant here):

x = 0.123450000000000004174439
l10 = -0.908508905732048899217546, e10 = 4
pow10 = 10000.000000000000000000000000
x*pow10 = 1234.500000000000000000000000

Cheers,
Simon

> but the actual arithmetic is being done by run-time libraries that
> are outside our control to a large extent, and it looks as though
> the one on the Mac is less accurate than the one on Windows.
>
>> But (on both windows and OS X):
>>
>> > signif(12345.12345,digits=10)
>> [1] 12345.12
>
> This is a different problem. The number is correctly computed as
> 12345.12345 (or at least a representable number quite close to
> that), and then the default display rounds it some more. Set
> options(digits=19) to see it in its full glory.
>
> Duncan Murdoch
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
On 6/3/2008 4:36 PM, Simon Urbanek wrote:

> On Jun 3, 2008, at 2:48 PM, Duncan Murdoch wrote:
>
>> On 6/3/2008 11:43 AM, Patrick Carr wrote:
>>> On 6/3/08, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>>>>
>>>> because signif(0.90, digits=2) == 0.9. Those two objects are
>>>> identical.
>>>
>>> My text above that is poorly worded. They're identical internally,
>>> yes. But in terms of the number of significant digits, 0.9 and 0.90
>>> are different. And that matters when the number is printed, say as an
>>> annotation on a graph. Passing it through sprintf() or format() later
>>> requires you to specify the number of digits after the decimal, which
>>> is different than the number of significant digits, and requires case
>>> testing for numbers of different orders of magnitude.
>>>
>>> The original complainant (and I) expected this behavior from signif(),
>>> not merely rounding. As I said before, I wrote my own workaround so
>>> this is somewhat academic, but I don't think we're alone.
>>>
>>>> As far as I know, rounding is fine in Windows:
>>>>
>>>> > round(1:10 + 0.5)
>>>> [1] 2 2 4 4 6 6 8 8 10 10
>>>
>>> It might not be the rounding, then. (windows xp sp3)
>>>
>>> > signif(12345,digits=4)
>>> [1] 12340
>>> > signif(0.12345,digits=4)
>>> [1] 0.1235
>>
>> It's easy to make mistakes in this, but a little outside-of-R
>> experimentation suggests those are the right answers. The number
>> 12345 is exactly representable, so it is exactly half-way between
>> 12340 and 12350, so 12340 is the right answer by the unbiased
>> round-to-even rule. The number 0.12345 is not exactly representable,
>> but (I think) it is represented by something slightly closer to 0.1235
>> than to 0.1234. So it looks as though Windows gets it right.
>>
>>> OS X (10.5.2/intel) does not have that problem.
>>
>> Which would seem to imply OS X gets it wrong.
>
> This has nothing to do with OS X, you get that same answer on pretty
> much all other platforms (Intel/Linux, MIPS/IRIX, Sparc/Sun, ...).
> Windows is the only one delivering the incorrect result here.
>
>> Both are supposed to be using the 64 bit floating point standard,
>> so they should both give the same answer:
>
> Should, yes, but Windows doesn't. In fact 10000.0 is exactly
> representable and so is 1234.5 which is the correct result that all
> except Windows get.

I think you skipped a step. The correct answer is either 0.1234 or 0.1235, not something 10000 times bigger. The first important question is whether 0.12345 is exactly representable, and the answer is no. The second question is whether it is represented by a number bigger or smaller than the real number 0.12345. If it is bigger, the answer should be 0.1235, and if it is smaller, the answer is 0.1234.

My experiments suggest it is bigger. Yours don't look relevant. It certainly isn't exactly equal to 1234.5/10000, because that number is not representable. It's equal to x/2^y, for some x and y, and it's a pain to figure out exactly what they are.

However, I am pretty sure R is representing it (at least on Windows) as the binary expansion

0.000111111001101001101011010100001011000011110010011111

while the true binary expansion (using exact rational arithmetic) starts out

0.00011111100110100110101101010000101100001111001001111011101...

If you line those up, you'll see that the first number is bigger than the second. (Ugly code to derive these is down below.)

Clearly the top representation is the correct one to that number of binary digits, so I think Windows got it right, and all those other systems didn't. This is probably because R on Windows is using extended precision (64-bit mantissas) for intermediate results, and those other systems stick with 53-bit mantissas.
Duncan Murdoch

# Convert number to binary expansion; add the decimal point manually

x <- 0.12345
while (x != 0) {
  cat(trunc(x))
  x <- x - trunc(x)
  x <- x * 2
}

# Do the same thing in exact rational arithmetic

num <- 12345
denom <- 100000
for (i in 1:60) {
  cat(ifelse(num > 100000, "1", "0"))
  num <- num %% 100000
  num <- 2*num
}

# Manually cut and paste the results to get these:

"0.000111111001101001101011010100001011000011110010011111"
"0.00011111100110100110101101010000101100001111001001111011101"
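Duncan's cut-and-paste comparison can also be reproduced without manual steps using exact rational arithmetic; a sketch in Python (binary_fraction is a hypothetical helper), where Fraction(0.12345) recovers the exact value of the stored 53-bit double:

```python
from fractions import Fraction

def binary_fraction(q, nbits):
    """First nbits binary digits of a Fraction in [0, 1)."""
    bits = []
    for _ in range(nbits):
        q *= 2                      # shift one binary place left
        bits.append("1" if q >= 1 else "0")
        if q >= 1:
            q -= 1
    return "0." + "".join(bits)

stored = Fraction(0.12345)          # exact value of the double nearest 0.12345
true = Fraction(12345, 100000)      # the real number 0.12345

print(binary_fraction(stored, 59))
print(binary_fraction(true, 59))
print(stored > true)                # the stored double is slightly too big
```

Comparing the two printed expansions digit by digit shows where the 53-bit rounding pushed the stored value above the true one, which is Duncan's point about which way signif(0.12345, 4) should break the apparent tie.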
On Jun 3, 2008, at 5:12 PM, Duncan Murdoch wrote:

> On 6/3/2008 4:36 PM, Simon Urbanek wrote:
>> On Jun 3, 2008, at 2:48 PM, Duncan Murdoch wrote:
>>> On 6/3/2008 11:43 AM, Patrick Carr wrote:
>>>> On 6/3/08, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>>>>>
>>>>> because signif(0.90, digits=2) == 0.9. Those two objects are
>>>>> identical.
>>>>
>>>> My text above that is poorly worded. They're identical internally,
>>>> yes. But in terms of the number of significant digits, 0.9 and 0.90
>>>> are different. And that matters when the number is printed, say as an
>>>> annotation on a graph. Passing it through sprintf() or format() later
>>>> requires you to specify the number of digits after the decimal, which
>>>> is different than the number of significant digits, and requires case
>>>> testing for numbers of different orders of magnitude.
>>>>
>>>> The original complainant (and I) expected this behavior from signif(),
>>>> not merely rounding. As I said before, I wrote my own workaround so
>>>> this is somewhat academic, but I don't think we're alone.
>>>>
>>>>> As far as I know, rounding is fine in Windows:
>>>>>
>>>>> > round(1:10 + 0.5)
>>>>> [1] 2 2 4 4 6 6 8 8 10 10
>>>>
>>>> It might not be the rounding, then. (windows xp sp3)
>>>>
>>>> > signif(12345,digits=4)
>>>> [1] 12340
>>>> > signif(0.12345,digits=4)
>>>> [1] 0.1235
>>>
>>> It's easy to make mistakes in this, but a little outside-of-R
>>> experimentation suggests those are the right answers. The number
>>> 12345 is exactly representable, so it is exactly half-way between
>>> 12340 and 12350, so 12340 is the right answer by the unbiased
>>> round-to-even rule. The number 0.12345 is not exactly
>>> representable, but (I think) it is represented by something
>>> slightly closer to 0.1235 than to 0.1234. So it looks as though
>>> Windows gets it right.
>>>
>>>> OS X (10.5.2/intel) does not have that problem.
>>>
>>> Which would seem to imply OS X gets it wrong.
>> This has nothing to do with OS X, you get that same answer on
>> pretty much all other platforms (Intel/Linux, MIPS/IRIX, Sparc/
>> Sun, ...). Windows is the only one delivering the incorrect result
>> here.
>>
>>> Both are supposed to be using the 64 bit floating point standard,
>>> so they should both give the same answer:
>>
>> Should, yes, but Windows doesn't. In fact 10000.0 is exactly
>> representable and so is 1234.5 which is the correct result that
>> all except Windows get.
>
> I think you skipped a step.

I didn't - I was just pointing out that what you are trying to show is irrelevant. We are dealing with FP arithmetic here, so although your reasoning is valid algebraically, it's not in the FP world. You missed the fact that FP operations are used to actually get the result (*10000.0, round and divide again) and thus those operations influence it as well.

> The correct answer is either 0.1234 or 0.1235, not something 10000
> times bigger. The first important question is whether 0.12345 is
> exactly representable, and the answer is no. The second question is
> whether it is represented by a number bigger or smaller than the
> real number 0.12345. If it is bigger, the answer should be 0.1235,
> and if it is smaller, the answer is 0.1234.

No. That was what I was trying to point out. You can see clearly from my post that 0.12345 is not exactly representable and that the representation is slightly bigger. This is, however, irrelevant, because the next step is to multiply that number by 10000 (see the fprec source) and this is where your reasoning breaks down - the result is an exact representation of 1234.5, because the imprecision gets lost in the operation on all platforms but Windows. The result is that Windows is inconsistent with the others; whether that is a bug or a feature I don't care.
All I really wanted to say is that this has nothing to do with OS X - if anything, it's a Windows issue.

> My experiments suggest it is bigger.

I was not claiming otherwise.

> Yours don't look relevant.

Vice versa, as it turns out.

Cheers,
Simon

> It certainly isn't exactly equal to 1234.5/10000, because that
> number is not representable. It's equal to x/2^y, for some x and y,
> and it's a pain to figure out exactly what they are.
>
> However, I am pretty sure R is representing it (at least on Windows)
> as the binary expansion
>
> 0.000111111001101001101011010100001011000011110010011111
>
> while the true binary expansion (using exact rational arithmetic)
> starts out
>
> 0.00011111100110100110101101010000101100001111001001111011101...
>
> If you line those up, you'll see that the first number is bigger
> than the second. (Ugly code to derive these is down below.)
>
> Clearly the top representation is the correct one to that number of
> binary digits, so I think Windows got it right, and all those other
> systems didn't. This is probably because R on Windows is using
> extended precision (64 bit mantissas) for intermediate results, and
> those other systems stick with 53 bit mantissas.

However, this means that Windows doesn't conform.

> Duncan Murdoch
>
> # Convert number to binary expansion; add the decimal point manually
>
> x <- 0.12345
> while (x != 0) {
>   cat(trunc(x))
>   x <- x - trunc(x)
>   x <- x * 2
> }
>
> # Do the same thing in exact rational arithmetic
>
> num <- 12345
> denom <- 100000
> for (i in 1:60) {
>   cat(ifelse(num > 100000, "1", "0"))
>   num <- num %% 100000
>   num <- 2*num
> }
>
> # Manually cut and paste the results to get these:
>
> "0.000111111001101001101011010100001011000011110010011111"
> "0.00011111100110100110101101010000101100001111001001111011101"
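Simon's claim - that the representation error in 0.12345 is absorbed when the value is scaled, so the product is exactly 1234.5 and round-half-to-even then goes down - can be checked on any platform doing correctly rounded 53-bit IEEE-754 arithmetic. A Python sketch mimicking the scale/round/unscale steps he attributes to fprec:

```python
x = 0.12345            # stored as a double slightly above the real 0.12345
scaled = x * 10000.0   # the scaled error (~4e-14) is under half an ulp of
                       # 1234.5 (~1.1e-13), so the product rounds to 1234.5

print(scaled == 1234.5)          # True with 53-bit doubles: an exact tie
print(round(scaled) / 10000.0)   # round-half-to-even sends 1234.5 to 1234,
                                 # giving 0.1234, the non-Windows result
```

Extended-precision intermediates (64-bit mantissas, as Duncan suggests for Windows) would keep the product just above 1234.5, which is why Windows rounds up to 0.1235 instead.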
Thanks Patrick, your function is a neat work-around to include trailing zeroes when specifying significant digits. It worked better with this modification to the "format" arguments:

else if (sigdigs<=left+right) {out<-format(signum,nsmall=sigdigs-left)}

Agree it would be nice to include this in signif().

Regards, Scott.

> -----Original Message-----
> From: patcarr at gmail.com [mailto:patcarr at gmail.com] On Behalf
> Of Patrick Carr
> Sent: Tuesday, 3 June 2008 11:28 AM
> To: r-bugs at r-project.org
> Cc: Wilkinson, Scott (CLW, Townsville)
> Subject: Re: significant digits (PR#9682)
>
> To reply to my own message, that function wasn't quite right. I think
> this one works better:
>
> signif.string <- function(signum,sigdigs){
>   test <- abs(signum)
>   left <- nchar(trunc(test))
>   right <- nchar(test)-left-1
>   if (test<1) {left<-left-1}
>   if (right<0) {right<-0}
>   if (sigdigs<left) {out<-as.character(signif(signum,digits=sigdigs))}
>   else if (sigdigs==left & trunc(signum) %% 10 == 0)
>     {out<-paste(round(signum),".",sep="")}
>   else if (sigdigs<=left+right) {out<-format(signum,digits=sigdigs)}
>   else {out<-sprintf(paste("%.",sigdigs-left,"f",sep=""),signum)}
>   return(noquote(out))
> }
>
> But it should still have error checking and vector capability, yadda
> yadda. Also, I forgot what year it was, so sorry, Scott, for spamming
> you with something you're hopefully not still stuck on.
>
> Pat Carr