> a <- 0 ; for(i in (1:200000000)) a <- a + 1/i
> b <- 0 ; for(i in (200000000:1)) b <- b + 1/i
> c <- sum(1/(1:200000000))
> d <- sum(1/(200000000:1))
> order(c(a,b,c,d))
[1] 1 2 4 3
> b<c
[1] TRUE
> c==d
[1] FALSE

I'd expected b to be the largest, since we sum up the smallest numbers first. Instead, c is the largest, which is sum() applied to the vector ordered with the largest numbers first.

Can anyone shed some light on this?

What is the best way in R to compute a sum while avoiding cancellation effects?

By the way, sum() in the above example is much faster than the loops, so it would be nice if we could use it.
On 24/06/2010 4:08 PM, Lasse Kliemann wrote:
> > a <- 0 ; for(i in (1:200000000)) a <- a + 1/i
> > b <- 0 ; for(i in (200000000:1)) b <- b + 1/i
> > c <- sum(1/(1:200000000))
> > d <- sum(1/(200000000:1))
> > order(c(a,b,c,d))
> [1] 1 2 4 3
> > b<c
> [1] TRUE
> > c==d
> [1] FALSE
>
> I'd expected b being the largest, since we sum up the smallest
> numbers first. Instead, c is the largest, which is sum() applied
> to the vector ordered with largest numbers first.
>
> Can anyone shed some light on this?

I don't know why you'd expect b to be larger than the others. When you're using sum(), you should expect the most accurate answer, but it won't necessarily be the biggest: some rounding errors will make the answer bigger than it should be. For example, if we only had whole-number arithmetic, 0.6 + 0.6 + 0.6 might come out to 3, if each value was rounded to 1 before adding.

As to whether c or d comes out better: you'd need to program it yourself if you care about getting the last bit right.

> What is the best way in R to compute a sum while avoiding
> cancellation effects?

Use sum(). If it's not good enough, then do it in C, accumulating in extended precision (which is what sum() does), from smallest to largest if all terms are positive. If there are mixed signs it's a lot harder to optimize, but I believe the optimal order would be something like the one that keeps each partial sum as close as possible to zero. It would rarely be practical to implement that, so summing from smallest absolute value to largest would be my recommendation.

> By the way, sum() in the above example is much faster than the
> loops, so it would be nice if we could utilize it.

Why do you think you can't?

Duncan Murdoch
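As a rough R-level illustration of the "smallest absolute value first" ordering suggested above (this is not the C implementation Duncan recommends; the helper name kahan_sum and the example vector x are made up for this sketch):

x <- 1/(1:1000000)                 # example vector of positive terms
s_sorted <- sum(x[order(abs(x))])  # sum with smallest |x| first

# A simple Kahan (compensated) summation loop written in plain R.
# Slow compared with sum(), but it carries a running error correction.
kahan_sum <- function(x) {
  s <- 0
  comp <- 0                        # running compensation for lost low-order bits
  for (xi in x) {
    y <- xi - comp
    tmp <- s + y
    comp <- (tmp - s) - y          # what was lost when adding y to s
    s <- tmp
  }
  s
}
s_kahan <- kahan_sum(x)

Since sum() already accumulates in extended precision, pre-sorting may make little visible difference in this example; the compensated loop is mainly useful where no such sum() is available.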
On Jun 24, 2010, at 4:08 PM, Lasse Kliemann wrote:
>> a <- 0 ; for(i in (1:200000000)) a <- a + 1/i
>> b <- 0 ; for(i in (200000000:1)) b <- b + 1/i
>> c <- sum(1/(1:200000000))
>> d <- sum(1/(200000000:1))
>> order(c(a,b,c,d))
> [1] 1 2 4 3
>> b<c
> [1] TRUE
>> c==d
> [1] FALSE
>
> I'd expected b being the largest, since we sum up the smallest
> numbers first.

Can you explain? I would have assumed the floating point algorithms would incorporate some sort of sensible rounding strategy that would prevent consistent underflow.

> Instead, c is the largest, which is sum() applied
> to the vector ordered with largest numbers first.
>
> Can anyone shed some light on this?

> sum(1/(200000000:1))
[1] 19.69104
> sum(1/(1:200000000))
[1] 19.69104
> b <- 0 ; for(i in (200000000:1)) b <- b + 1/i
> b
[1] 19.69104
> sum(1/(1:200000000)) - sum(1/(200000000:1))
[1] 7.105427e-15
> sum(1/(200000000:1)) - b
[1] 1.673328e-12
> sum(1/(1:200000000)) - b
[1] 1.680434e-12
> .Machine$double.eps
[1] 2.220446e-16
> .Machine$double.eps ^ 0.5
[1] 1.490116e-08

> What is the best way in R to compute a sum while avoiding
> cancellation effects?
>
> By the way, sum() in the above example is much faster than the
> loops, so it would be nice if we could utilize it.

Why wouldn't you? Generally differences as small as you are observing are either meaningless or unavoidable using double precision arithmetic. If you need exact arithmetic, then pick an appropriate platform.

--
David.
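For differences of this size, a tolerance-based comparison is usually more appropriate than ==. A small sketch using base R's all.equal (shorter vectors than in the thread, purely for illustration):

c_fwd <- sum(1/(1:1000000))            # forward order
c_rev <- sum(1/(1000000:1))            # reverse order
c_fwd == c_rev                         # may be FALSE: results can differ in the last bits
isTRUE(all.equal(c_fwd, c_rev))        # TRUE: equal within the default relative tolerance
abs(c_fwd - c_rev) < .Machine$double.eps^0.5   # explicit tolerance, as printed above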
On Thu, Jun 24, 2010 at 4:08 PM, Lasse Kliemann <lasse-list-r-help-2009 at mail.plastictree.net> wrote:
> What is the best way in R to compute a sum while avoiding
> cancellation effects?

See ?sum.exact in the caTools package.
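The package itself is not reproduced here. As a hedged illustration of the kind of error-free arithmetic that exact summation builds on (not the caTools implementation; two_sum is a made-up helper name), a TwoSum transformation in base R:

# Error-free transformation of a + b: returns the rounded sum s and the
# exact rounding error err, so that mathematically a + b = s + err.
two_sum <- function(a, b) {
  s   <- a + b
  bv  <- s - a
  err <- (a - (s - bv)) + (b - bv)
  c(sum = s, err = err)
}
two_sum(0.1, 0.2)   # err captures the part of 0.1 + 0.2 lost by the double addition

An exact summation routine keeps accumulating these error terms instead of discarding them.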
On Fri, 2010-07-02 at 21:23 -0700, Roger Deangelis wrote:
> Although it does not apply to your series and is impractical, it seems to
> me that the most accurate algorithm might be to add all the rational numbers
> whose sum and components can be represented without error in binary first,
> i.e. 2.5 + .5 or 1/16 + 1/16 + 1/8.
>
> You could also get very clever and investigate a sum that should have an
> exact binary representation when the individual components do not, i.e. .1 +
> .2 + .2 = .5, and correct the sum.
>
> Roger

Roger,

I think you should read "What Every Computer Scientist Should Know About Floating-Point Arithmetic" (http://docs.sun.com/source/806-3568/ncg_goldberg.html). I think your question, and others like it, are answered in that paper.

--
Bernardo Rangel Tura, M.D, MPH, Ph.D
National Institute of Cardiology
Brazil
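A brief base-R illustration of the point about exact versus inexact binary representations (sprintf's "%a" format, also used in the next post, prints the underlying binary value):

sprintf("%a", 1/16 + 1/16 + 1/8)   # exactly representable: prints "0x1p-2", i.e. 0.25
sprintf("%a", 0.25)                # the same binary value
sprintf("%a", 0.1)                 # 1/10 has no exact binary representation
0.1 + 0.2 == 0.3                   # FALSE: both sides are rounded approximations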
Hi Bernardo,

In many financial applications, if you convert the dollars and cents to pennies (i.e. $1.10 to 110) and divide by 100 at the very end, you can maintain higher precision. This applies primarily to sums. It is similar to keeping track of the decimal fractions which have exact representations in floating point. It is good to know where the errors come from.

I suppose you could also improve the sum by understanding the decimal-to-binary rounding algorithms.

I have noticed differences in computations between Sun hardware (FPUs) and Intel (FPUs). When presented with the same set of operations, it appears that the floating point hardware does not do the operations in the same order.

Consider

sprintf("%a", sum(rep(1.1,100)))      # overestimates the sum
"0x1.b800000000001p+6"
sprintf("%a", sum(rep(11,100))/10)    # this gives the correct answer
"0x1.b8p+6"

110 in binary is 1101110 = 64 + 32 + 8 + 4 + 2, which corresponds to "0x1.b8p+6" (tricky due to the 53-bit mantissa and little-endian layout; I think this is right).

Note

cmp <- ifelse(sum(rep(1.1,100)) == sum(rep(11,100))/10, "equal", "unequal")
[1] "unequal"
cmp <- ifelse(sum(rep(1.1,100)) > sum(rep(11,100))/10, "greater than", "less than or equal")
[1] "greater than"
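A small sketch of the pennies idea described above (the amounts are made-up example data):

amounts <- c(1.10, 2.30, 0.07)      # dollar amounts (made-up example data)
cents   <- round(amounts * 100)     # work in whole pennies, which are exact integers
total   <- sum(cents) / 100         # divide by 100 only at the very end
total                               # 3.47, with no accumulated decimal round-off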