thr3ads.net - R devel - [Rd] code for sum function [Feb 2019]

William Dunlap

2019-Feb-19 19:08 UTC

[Rd] code for sum function

The algorithm does make a difference.  You can use Kahan's summation
algorithm (https://en.wikipedia.org/wiki/Kahan_summation_algorithm) to
reduce the error compared to the naive summation algorithm.  E.g., in R
code:

naiveSum <-
function(x) {
   s <- 0.0
   for(xi in x) s <- s + xi
   s
}
kahanSum <- function (x)
{
   s <- 0.0
   c <- 0.0 # running compensation for lost low-order bits
   for(xi in x) {
      y <- xi - c
      t <- s + y # low-order bits of y may be lost here
      c <- (t - s) - y
      s <- t
   }
   s
}
> rSum <- vapply(c(1:20,10^(2:7)), function(n) sum(rep(1/7,n)), 0)
> rNaiveSum <- vapply(c(1:20,10^(2:7)), function(n) naiveSum(rep(1/7,n)), 0)
> rKahanSum <- vapply(c(1:20,10^(2:7)), function(n) kahanSum(rep(1/7,n)), 0)
>
> table(rSum == rNaiveSum)

FALSE  TRUE
   21     5
> table(rSum == rKahanSum)

FALSE  TRUE
    3    23


Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Tue, Feb 19, 2019 at 10:36 AM Paul Gilbert <pgilbert902 at gmail.com>
wrote:
> (I didn't see anyone else answer this, so ...)
>
> You can probably find the R code in src/main/ but I'm not sure. You are
> talking about a very simple calculation, so it seems unlikely that the
> algorithm is the cause of the difference. I have done much more
> complicated things and usually get machine precision comparisons. There
> are four possibilities I can think of that could cause (small) differences.
>
> 0/ Your code is wrong, but that seems unlikely on such a simple
> calculation.
>
> 1/ You are summing a very large number of numbers, in which case the sum
> can become very large compared to numbers being added, then things can
> get a bit funny.
>
> 2/ You are using single precision in fortran rather than double. Double
> is needed for all floating point numbers you use!
>
> 3/ You have not zeroed the double precision numbers in fortran. (Some
> compilers do not do this automatically and you have to specify it.) Then
> if you accidentally put singles, like a constant 0.0 rather than a
> constant 0.0D+0, into a double you will have small junk in the lower
> precision part.
>
> (I am assuming you are talking about a sum of reals, not integer or
> complex.)
>
> HTH,
> Paul Gilbert
>
> On 2/14/19 2:08 PM, Rampal Etienne wrote:
> > Hello,
> >
> > I am trying to write FORTRAN code to do the same as some R code I have.
> > I get (small) differences when using the sum function in R. I know there
> > are numerical routines to improve precision, but I have not been able to
> > figure out what algorithm R is using. Does anyone know this? Or where
> > can I find the code for the sum function?
> >
> > Regards,
> >
> > Rampal Etienne
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
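
Points 1 to 3 in the quoted message can be illustrated from R itself. The following is only a rough sketch, not a statement about what any particular Fortran compiler does. The helper name asSingle is made up for illustration; it emulates single precision by round-tripping a value through 4-byte storage with writeBin() and readBin().

```r
# Point 1: once the running sum is large enough, a small addend is absorbed.
# Doubles have 53 significant bits, so near 1e16 the spacing between
# representable values is 2 and adding 1 changes nothing.
big <- 1e16
absorbed <- (big + 1) == big

# Points 2 and 3: a single-precision value stored in a double carries
# "junk" in the low-order part. asSingle() is a hypothetical helper that
# rounds a double to single precision and reads it back as a double.
asSingle <- function(x) {
  readBin(writeBin(x, raw(), size = 4L), "numeric", n = length(x), size = 4L)
}
junk <- asSingle(0.1) - 0.1   # difference between single and double 0.1

print(absorbed)
print(junk)
```

The size of junk is roughly what one would expect to see per term if a single-precision constant were assigned to a double-precision variable on the Fortran side.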

Ben Bolker

2019-Feb-19 20:43 UTC

This SO question may be of interest:

https://stackoverflow.com/questions/38589705/difference-between-rs-sum-and-armadillos-accu/

  which points out that sum() isn't doing anything fancy *except* using
extended-precision registers when available.  (Using Kahan's algorithm
does come at a computational cost ...)
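
Whether extended precision is available depends on how R was built. As a small sketch (the variable names are illustrative only), one can ask the running R session whether it has a long double type and see how far sum() is from a plain double-precision loop:

```r
# Does this build of R use a C long double type longer than double?
hasLongDouble <- capabilities("long.double")

# Size in bytes of long double on this platform (0 if there is none).
ldBytes <- .Machine$sizeof.longdouble

# Plain left-to-right accumulation in double precision, as in the
# naiveSum() posted earlier in the thread.
naiveSum <- function(x) {
  s <- 0.0
  for (xi in x) s <- s + xi
  s
}

x <- rep(1/7, 1e5)
diffFromNaive <- sum(x) - naiveSum(x)

print(hasLongDouble)
print(ldBytes)
print(diffFromNaive)
```

On a build without long double, one would expect diffFromNaive to be zero or very close to it; this is an expectation, not something the thread confirms.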


Berend Hasselman

2019-Feb-20 10:11 UTC

If you use the vector  c(1,10^100,1,-10^100) as input then
sum, naiveSum or kahanSum will all give an incorrect answer.
All return 0 instead of 2.
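
A self-contained way to check this, repeating the two functions posted earlier in the thread (the name badCase is only for illustration):

```r
# Plain left-to-right summation.
naiveSum <- function(x) {
  s <- 0.0
  for (xi in x) s <- s + xi
  s
}

# Kahan summation: c carries the low-order bits lost from each addend.
kahanSum <- function(x) {
  s <- 0.0
  c <- 0.0
  for (xi in x) {
    y <- xi - c
    t <- s + y
    c <- (t - s) - y
    s <- t
  }
  s
}

badCase <- c(1, 10^100, 1, -10^100)   # exact sum is 2

results <- c(sum = sum(badCase),
             naive = naiveSum(badCase),
             kahan = kahanSum(badCase))
print(results)
```

Kahan's compensation only recovers bits lost from the addend; here it is the running sum (the first 1) that is swamped when 10^100 is added, so the loss goes unrecorded.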

From the Wikipedia page we can try the pseudocode given for Neumaier's
modification.
My R version (with a small correction to avoid cancellation?) is

neumaierSum <- function (x)
{
  s <- 0.0
  z <- 0.0 # running compensation for lost low-order bits
  for(xi in x) {
     t <- s + xi
     if( abs(s) >= abs(xi) ){
         b <- (s-t)+xi # intermediate step needed in R, otherwise cancellation
         z <- z+b      # if sum is bigger, low-order digits of xi are lost
     } else {
         b <- (xi-t)+s # intermediate step needed in R, otherwise cancellation
         z <- z+b      # else low-order digits of sum are lost
     }
     s <- t
  }
  s+z   # correction only applied once at the very end
}

testx <-  c(1,10^100,1,-10^100)
neumaierSum(testx)

gives 2 as answer.

Berend Hasselman

Rampal Etienne

2019-Feb-20 22:45 UTC

Dear Will,

This is exactly what I find.
My point is thus that the sum function in R is neither a naive sum nor a
Kahan sum (in all cases), so what algorithm is it using?

Cheers, Rampal



William Dunlap

2019-Feb-20 23:37 UTC

Someone said it used a possibly platform-dependent
higher-than-double-precision type.

By the way, in my example involving rep(1/7, n) I neglected to include the
most precise way to calculate the sum: n%/%7 + (n%%7)/7.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
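
For the rep(1/7, n) vectors used in the earlier comparison, the closed form can serve as a reference value for measuring the error of each method instead of only testing equality. This is a sketch under assumptions: the names closedForm, errNaive and errKahan are made up here, and n is capped at 10^5 (rather than 10^7 as in the original example) to keep the interpreted loops quick.

```r
naiveSum <- function(x) {
  s <- 0.0
  for (xi in x) s <- s + xi
  s
}

kahanSum <- function(x) {
  s <- 0.0
  c <- 0.0 # running compensation for lost low-order bits
  for (xi in x) {
    y <- xi - c
    t <- s + y
    c <- (t - s) - y
    s <- t
  }
  s
}

# Reference value for sum(rep(1/7, n)): whole part plus fractional part.
closedForm <- function(n) n %/% 7 + (n %% 7) / 7

ns <- c(1:20, 10^(2:5))
errNaive <- vapply(ns, function(n) abs(naiveSum(rep(1/7, n)) - closedForm(n)), 0)
errKahan <- vapply(ns, function(n) abs(kahanSum(rep(1/7, n)) - closedForm(n)), 0)
errR     <- vapply(ns, function(n) abs(sum(rep(1/7, n)) - closedForm(n)), 0)

# Largest absolute error of each method over all n tried.
print(c(naive = max(errNaive), kahan = max(errKahan), sum = max(errR)))
```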


