> a <- 0 ; for(i in (1:200000000)) a <- a + 1/i
> b <- 0 ; for(i in (200000000:1)) b <- b + 1/i
> c <- sum(1/(1:200000000))
> d <- sum(1/(200000000:1))
> order(c(a,b,c,d))
[1] 1 2 4 3
> b<c
[1] TRUE
> c==d
[1] FALSE

I'd expected b to be the largest, since we sum up the smallest numbers first. Instead, c is the largest, which is sum() applied to the vector ordered with the largest numbers first.

Can anyone shed some light on this?

What is the best way in R to compute a sum while avoiding cancellation effects?

By the way, sum() in the above example is much faster than the loops, so it would be nice if we could use it.
On 24/06/2010 4:08 PM, Lasse Kliemann wrote:
> > a <- 0 ; for(i in (1:200000000)) a <- a + 1/i
> > b <- 0 ; for(i in (200000000:1)) b <- b + 1/i
> > c <- sum(1/(1:200000000))
> > d <- sum(1/(200000000:1))
> > order(c(a,b,c,d))
> [1] 1 2 4 3
> > b<c
> [1] TRUE
> > c==d
> [1] FALSE
>
> I'd expected b being the largest, since we sum up the smallest
> numbers first. Instead, c is the largest, which is sum() applied
> to the vector ordered with largest numbers first.
>
> Can anyone shed some light on this?

I don't know why you'd expect b to be larger than the others. When you're using sum(), you should expect the most accurate answer, but it won't necessarily be the biggest: some rounding errors will make the answer bigger than it should be. For example, if we only had whole-number arithmetic, 0.6 + 0.6 + 0.6 might come out to 3, if each value was rounded to 1 before adding.

As to whether c or d comes out better: you'd need to program it yourself if you care about getting the last bit right.

> What is the best way in R to compute a sum while avoiding
> cancellation effects?

Use sum(). If it's not good enough, then do it in C, accumulating in extended precision (which is what sum() does), from smallest to largest if all terms are positive. If there are mixed signs it's a lot harder to optimize, but I believe the optimal order would be something like the one that keeps each partial sum as close as possible to zero. It would rarely be practical to implement that, so summing from smallest absolute value to largest would be my recommendation.

> By the way, sum() in the above example is much faster than the
> loops, so it would be nice if we could utilize it.

Why do you think you can't?

Duncan Murdoch
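As a rough R-level illustration of the "smallest absolute value first" ordering suggested above (this is not the C implementation Duncan recommends; the helper name kahan_sum and the example vector x are made up for this sketch):

x <- 1/(1:1000000)                 # example vector of positive terms
s_sorted <- sum(x[order(abs(x))])  # sum with smallest |x| first

# A simple Kahan (compensated) summation loop written in plain R.
# Slow compared with sum(), but it carries a running error correction.
kahan_sum <- function(x) {
  s <- 0
  comp <- 0                        # running compensation for lost low-order bits
  for (xi in x) {
    y <- xi - comp
    tmp <- s + y
    comp <- (tmp - s) - y          # what was lost when adding y to s
    s <- tmp
  }
  s
}
s_kahan <- kahan_sum(x)

Since sum() already accumulates in extended precision, pre-sorting may make little visible difference in this example; the compensated loop is mainly useful where no such sum() is available.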
On Jun 24, 2010, at 4:08 PM, Lasse Kliemann wrote:
>> a <- 0 ; for(i in (1:200000000)) a <- a + 1/i
>> b <- 0 ; for(i in (200000000:1)) b <- b + 1/i
>> c <- sum(1/(1:200000000))
>> d <- sum(1/(200000000:1))
>> order(c(a,b,c,d))
> [1] 1 2 4 3
>> b<c
> [1] TRUE
>> c==d
> [1] FALSE
>
> I'd expected b being the largest, since we sum up the smallest
> numbers first.

Can you explain? I would have assumed the floating point algorithms would incorporate some sort of sensible rounding strategy that would prevent consistent underflow.

> Instead, c is the largest, which is sum() applied
> to the vector ordered with largest numbers first.
>
> Can anyone shed some light on this?

> sum(1/(200000000:1))
[1] 19.69104
> sum(1/(1:200000000))
[1] 19.69104
> b <- 0 ; for(i in (200000000:1)) b <- b + 1/i
> b
[1] 19.69104
> sum(1/(1:200000000)) - sum(1/(200000000:1))
[1] 7.105427e-15
> sum(1/(200000000:1)) - b
[1] 1.673328e-12
> sum(1/(1:200000000)) - b
[1] 1.680434e-12
> .Machine$double.eps
[1] 2.220446e-16
> .Machine$double.eps ^ 0.5
[1] 1.490116e-08

> What is the best way in R to compute a sum while avoiding
> cancellation effects?
>
> By the way, sum() in the above example is much faster than the
> loops, so it would be nice if we could utilize it.

Why wouldn't you? Generally differences as small as you are observing are either meaningless or unavoidable using double precision arithmetic. If you need exact arithmetic, then pick an appropriate platform.

--
David.
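For differences of this size, a tolerance-based comparison is usually more appropriate than ==. A small sketch using base R's all.equal (shorter vectors than in the thread, purely for illustration):

c_fwd <- sum(1/(1:1000000))            # forward order
c_rev <- sum(1/(1000000:1))            # reverse order
c_fwd == c_rev                         # may be FALSE: results can differ in the last bits
isTRUE(all.equal(c_fwd, c_rev))        # TRUE: equal within the default relative tolerance
abs(c_fwd - c_rev) < .Machine$double.eps^0.5   # explicit tolerance, as printed above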
On Thu, Jun 24, 2010 at 4:08 PM, Lasse Kliemann <lasse-list-r-help-2009 at mail.plastictree.net> wrote:
> What is the best way in R to compute a sum while avoiding
> cancellation effects?

See ?sum.exact in the caTools package.
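The package itself is not reproduced here. As a hedged illustration of the kind of error-free arithmetic that exact summation builds on (not the caTools implementation; two_sum is a made-up helper name), a TwoSum transformation in base R:

# Error-free transformation of a + b: returns the rounded sum s and the
# exact rounding error err, so that mathematically a + b = s + err.
two_sum <- function(a, b) {
  s   <- a + b
  bv  <- s - a
  err <- (a - (s - bv)) + (b - bv)
  c(sum = s, err = err)
}
two_sum(0.1, 0.2)   # err captures the part of 0.1 + 0.2 lost by the double addition

An exact summation routine keeps accumulating these error terms instead of discarding them.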
On Fri, 2010-07-02 at 21:23 -0700, Roger Deangelis wrote:
> Although it does not apply to your series and is impractical, it seems to
> me that the most accurate algorithm might be to add all the rational numbers
> whose sum and components can be represented without error in binary first,
> i.e. 2.5 + .5 or 1/16 + 1/16 + 1/8.
>
> You could also get very clever and investigate a sum that should have an
> exact binary representation when the individual components do not, i.e. .1 +
> .2 + .2 = .5, and correct the sum.
>
> Roger

Roger,

I think you should read "What Every Computer Scientist Should Know About Floating-Point Arithmetic" (http://docs.sun.com/source/806-3568/ncg_goldberg.html). I think your question, and others like it, are answered in that paper.

--
Bernardo Rangel Tura, M.D, MPH, Ph.D
National Institute of Cardiology
Brazil
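A brief base-R illustration of the point about exact versus inexact binary representations (sprintf's "%a" format, also used in the next post, prints the underlying binary value):

sprintf("%a", 1/16 + 1/16 + 1/8)   # exactly representable: prints "0x1p-2", i.e. 0.25
sprintf("%a", 0.25)                # the same binary value
sprintf("%a", 0.1)                 # 1/10 has no exact binary representation
0.1 + 0.2 == 0.3                   # FALSE: both sides are rounded approximations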
Hi Bernardo,

In many financial applications, if you convert the dollars and cents to pennies (i.e. $1.10 to 110) and divide by 100 at the very end, you can maintain higher precision. This applies primarily to sums. It is similar to keeping track of the decimal fractions which have exact representations in floating point. It is good to know where the errors come from.

I suppose you could also improve the sum by understanding the decimal-to-binary rounding algorithms.

I have noticed differences in computations between Sun hardware (FPUs) and Intel (FPUs). When presented with the same set of operations, it appears that the floating point hardware does not do the operations in the same order.

Consider

sprintf("%a", sum(rep(1.1,100)))      # overestimates the sum
"0x1.b800000000001p+6"
sprintf("%a", sum(rep(11,100))/10)    # this gives the correct answer
"0x1.b8p+6"

110 in binary is 1101110 = 64 + 32 + 8 + 4 + 2, which corresponds to "0x1.b8p+6" (tricky due to the 53-bit mantissa and little-endian layout; I think this is right).

Note

cmp <- ifelse(sum(rep(1.1,100)) == sum(rep(11,100))/10, "equal", "unequal")
[1] "unequal"
cmp <- ifelse(sum(rep(1.1,100)) > sum(rep(11,100))/10, "greater than", "less than or equal")
[1] "greater than"
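A small sketch of the pennies idea described above (the amounts are made-up example data):

amounts <- c(1.10, 2.30, 0.07)      # dollar amounts (made-up example data)
cents   <- round(amounts * 100)     # work in whole pennies, which are exact integers
total   <- sum(cents) / 100         # divide by 100 only at the very end
total                               # 3.47, with no accumulated decimal round-off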