The algorithm does make a differece. You can use Kahan's summation
algorithm (https://en.wikipedia.org/wiki/Kahan_summation_algorithm) to
reduce the error compared to the naive summation algorithm. E.g., in R
code:
naiveSum <-
function(x) {
s <- 0.0
for(xi in x) s <- s + xi
s
}
kahanSum <- function (x)
{
s <- 0.0
c <- 0.0 # running compensation for lost low-order bits
for(xi in x) {
y <- xi - c
t <- s + y # low-order bits of y may be lost here
c <- (t - s) - y
s <- t
}
s
}
> rSum <- vapply(c(1:20,10^(2:7)), function(n) sum(rep(1/7,n)), 0)
> rNaiveSum <- vapply(c(1:20,10^(2:7)), function(n) naiveSum(rep(1/7,n)),
0)
> rKahanSum <- vapply(c(1:20,10^(2:7)), function(n) kahanSum(rep(1/7,n)),
0)
>
> table(rSum == rNaiveSum)
FALSE TRUE
21 5> table(rSum == rKahanSum)
FALSE TRUE
3 23
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Tue, Feb 19, 2019 at 10:36 AM Paul Gilbert <pgilbert902 at gmail.com>
wrote:
> (I didn't see anyone else answer this, so ...)
>
> You can probably find the R code in src/main/ but I'm not sure. You are
> talking about a very simple calculation, so it seems unlike that the
> algorithm is the cause of the difference. I have done much more
> complicated things and usually get machine precision comparisons. There
> are four possibilities I can think of that could cause (small) differences.
>
> 0/ Your code is wrong, but that seems unlikely on such a simple
> calculations.
>
> 1/ You are summing a very large number of numbers, in which case the sum
> can become very large compared to numbers being added, then things can
> get a bit funny.
>
> 2/ You are using single precision in fortran rather than double. Double
> is needed for all floating point numbers you use!
>
> 3/ You have not zeroed the double precision numbers in fortran. (Some
> compilers do not do this automatically and you have to specify it.) Then
> if you accidentally put singles, like a constant 0.0 rather than a
> constant 0.0D+0, into a double you will have small junk in the lower
> precision part.
>
> (I am assuming you are talking about a sum of reals, not integer or
> complex.)
>
> HTH,
> Paul Gilbert
>
> On 2/14/19 2:08 PM, Rampal Etienne wrote:
> > Hello,
> >
> > I am trying to write FORTRAN code to do the same as some R code I
have.
> > I get (small) differences when using the sum function in R. I know
there
> > are numerical routines to improve precision, but I have not been able
to
> > figure out what algorithm R is using. Does anyone know this? Or where
> > can I find the code for the sum function?
> >
> > Regards,
> >
> > Rampal Etienne
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
[[alternative HTML version deleted]]
This SO question may be of interest: https://stackoverflow.com/questions/38589705/difference-between-rs-sum-and-armadillos-accu/ which points out that sum() isn't doing anything fancy *except* using extended-precision registers when available. (Using Kahan's algorithm does come at a computational cost ...) On 2019-02-19 2:08 p.m., William Dunlap via R-devel wrote:> The algorithm does make a differece. You can use Kahan's summation > algorithm (https://en.wikipedia.org/wiki/Kahan_summation_algorithm) to > reduce the error compared to the naive summation algorithm. E.g., in R > code: > > naiveSum <- > function(x) { > s <- 0.0 > for(xi in x) s <- s + xi > s > } > kahanSum <- function (x) > { > s <- 0.0 > c <- 0.0 # running compensation for lost low-order bits > for(xi in x) { > y <- xi - c > t <- s + y # low-order bits of y may be lost here > c <- (t - s) - y > s <- t > } > s > } > >> rSum <- vapply(c(1:20,10^(2:7)), function(n) sum(rep(1/7,n)), 0) >> rNaiveSum <- vapply(c(1:20,10^(2:7)), function(n) naiveSum(rep(1/7,n)), 0) >> rKahanSum <- vapply(c(1:20,10^(2:7)), function(n) kahanSum(rep(1/7,n)), 0) >> >> table(rSum == rNaiveSum) > > FALSE TRUE > 21 5 >> table(rSum == rKahanSum) > > FALSE TRUE > 3 23 > > > Bill Dunlap > TIBCO Software > wdunlap tibco.com > > > On Tue, Feb 19, 2019 at 10:36 AM Paul Gilbert <pgilbert902 at gmail.com> wrote: > >> (I didn't see anyone else answer this, so ...) >> >> You can probably find the R code in src/main/ but I'm not sure. You are >> talking about a very simple calculation, so it seems unlike that the >> algorithm is the cause of the difference. I have done much more >> complicated things and usually get machine precision comparisons. There >> are four possibilities I can think of that could cause (small) differences. >> >> 0/ Your code is wrong, but that seems unlikely on such a simple >> calculations. >> >> 1/ You are summing a very large number of numbers, in which case the sum >> can become very large compared to numbers being added, then things can >> get a bit funny. >> >> 2/ You are using single precision in fortran rather than double. Double >> is needed for all floating point numbers you use! >> >> 3/ You have not zeroed the double precision numbers in fortran. (Some >> compilers do not do this automatically and you have to specify it.) Then >> if you accidentally put singles, like a constant 0.0 rather than a >> constant 0.0D+0, into a double you will have small junk in the lower >> precision part. >> >> (I am assuming you are talking about a sum of reals, not integer or >> complex.) >> >> HTH, >> Paul Gilbert >> >> On 2/14/19 2:08 PM, Rampal Etienne wrote: >>> Hello, >>> >>> I am trying to write FORTRAN code to do the same as some R code I have. >>> I get (small) differences when using the sum function in R. I know there >>> are numerical routines to improve precision, but I have not been able to >>> figure out what algorithm R is using. Does anyone know this? Or where >>> can I find the code for the sum function? >>> >>> Regards, >>> >>> Rampal Etienne >>> >>> ______________________________________________ >>> R-devel at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >> >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >
> > On 2019-02-19 2:08 p.m., William Dunlap via R-devel wrote: >> The algorithm does make a differece. You can use Kahan's summation >> algorithm (https://en.wikipedia.org/wiki/Kahan_summation_algorithm) to >> reduce the error compared to the naive summation algorithm. E.g., in R >> code: >> >> naiveSum <- >> function(x) { >> s <- 0.0 >> for(xi in x) s <- s + xi >> s >> } >> kahanSum <- function (x) >> { >> s <- 0.0 >> c <- 0.0 # running compensation for lost low-order bits >> for(xi in x) { >> y <- xi - c >> t <- s + y # low-order bits of y may be lost here >> c <- (t - s) - y >> s <- t >> } >> s >> } >> >>> rSum <- vapply(c(1:20,10^(2:7)), function(n) sum(rep(1/7,n)), 0) >>> rNaiveSum <- vapply(c(1:20,10^(2:7)), function(n) naiveSum(rep(1/7,n)), 0) >>> rKahanSum <- vapply(c(1:20,10^(2:7)), function(n) kahanSum(rep(1/7,n)), 0) >>> >>> table(rSum == rNaiveSum) >> >> FALSE TRUE >> 21 5 >>> table(rSum == rKahanSum) >> >> FALSE TRUE >> 3 23If you use the vector c(1,10^100,1,-10^100) as input then sum, naiveSum or kahanSum will all give an incorrect answer. All return 0 instead of 2. From the wikipedia page we can try the pseudocode given of the modification by Neumaier. My R version (with a small correction to avoid cancellation?) is neumaierSum <- function (x) { s <- 0.0 z <- 0.0 # running compensation for lost low-order bits for(xi in x) { t <- s + xi if( abs(s) >= abs(xi) ){ b <- (s-t)+xi # intermediate step needed in R otherwise cancellation z <- z+b # If sum is bigger, low-order digits of xi are lost. } else { b <- (xi-t)+s # intermediate step needed in R otherwise cancellation z <- z+b # else low-order digits of sum are lost } s <- t } s+z # correction only applied once in the very end } testx <- c(1,10^100,1,-10^100) neumaierSum(testx) gives 2 as answer. Berend Hasselman
Dear Will, This is exactly what I find. My point is thus that the sum function in R is not a naive sum nor a Kahansum (in all cases), but what algorithm is it using then? Cheers, Rampal On Tue, Feb 19, 2019, 19:08 William Dunlap <wdunlap at tibco.com wrote:> The algorithm does make a differece. You can use Kahan's summation > algorithm (https://en.wikipedia.org/wiki/Kahan_summation_algorithm) to > reduce the error compared to the naive summation algorithm. E.g., in R > code: > > naiveSum <- > function(x) { > s <- 0.0 > for(xi in x) s <- s + xi > s > } > kahanSum <- function (x) > { > s <- 0.0 > c <- 0.0 # running compensation for lost low-order bits > for(xi in x) { > y <- xi - c > t <- s + y # low-order bits of y may be lost here > c <- (t - s) - y > s <- t > } > s > } > > > rSum <- vapply(c(1:20,10^(2:7)), function(n) sum(rep(1/7,n)), 0) > > rNaiveSum <- vapply(c(1:20,10^(2:7)), function(n) naiveSum(rep(1/7,n)), > 0) > > rKahanSum <- vapply(c(1:20,10^(2:7)), function(n) kahanSum(rep(1/7,n)), > 0) > > > > table(rSum == rNaiveSum) > > FALSE TRUE > 21 5 > > table(rSum == rKahanSum) > > FALSE TRUE > 3 23 > > > Bill Dunlap > TIBCO Software > wdunlap tibco.com > > > On Tue, Feb 19, 2019 at 10:36 AM Paul Gilbert <pgilbert902 at gmail.com> > wrote: > >> (I didn't see anyone else answer this, so ...) >> >> You can probably find the R code in src/main/ but I'm not sure. You are >> talking about a very simple calculation, so it seems unlike that the >> algorithm is the cause of the difference. I have done much more >> complicated things and usually get machine precision comparisons. There >> are four possibilities I can think of that could cause (small) >> differences. >> >> 0/ Your code is wrong, but that seems unlikely on such a simple >> calculations. >> >> 1/ You are summing a very large number of numbers, in which case the sum >> can become very large compared to numbers being added, then things can >> get a bit funny. >> >> 2/ You are using single precision in fortran rather than double. Double >> is needed for all floating point numbers you use! >> >> 3/ You have not zeroed the double precision numbers in fortran. (Some >> compilers do not do this automatically and you have to specify it.) Then >> if you accidentally put singles, like a constant 0.0 rather than a >> constant 0.0D+0, into a double you will have small junk in the lower >> precision part. >> >> (I am assuming you are talking about a sum of reals, not integer or >> complex.) >> >> HTH, >> Paul Gilbert >> >> On 2/14/19 2:08 PM, Rampal Etienne wrote: >> > Hello, >> > >> > I am trying to write FORTRAN code to do the same as some R code I have. >> > I get (small) differences when using the sum function in R. I know >> there >> > are numerical routines to improve precision, but I have not been able >> to >> > figure out what algorithm R is using. Does anyone know this? Or where >> > can I find the code for the sum function? >> > >> > Regards, >> > >> > Rampal Etienne >> > >> > ______________________________________________ >> > R-devel at r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-devel >> >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> >[[alternative HTML version deleted]]
Someone said it used a possibly platform-dependent higher-than-double-precision type. By the way, in my example involving rep(1/3, n) I neglected to include the most precise way to calculate the sum: n%/%3 + (n%%3)/3. Bill Dunlap TIBCO Software wdunlap tibco.com On Wed, Feb 20, 2019 at 2:45 PM Rampal Etienne <rampaletienne at gmail.com> wrote:> Dear Will, > > This is exactly what I find. > My point is thus that the sum function in R is not a naive sum nor a > Kahansum (in all cases), but what algorithm is it using then? > > Cheers, Rampal > > > On Tue, Feb 19, 2019, 19:08 William Dunlap <wdunlap at tibco.com wrote: > >> The algorithm does make a differece. You can use Kahan's summation >> algorithm (https://en.wikipedia.org/wiki/Kahan_summation_algorithm) to >> reduce the error compared to the naive summation algorithm. E.g., in R >> code: >> >> naiveSum <- >> function(x) { >> s <- 0.0 >> for(xi in x) s <- s + xi >> s >> } >> kahanSum <- function (x) >> { >> s <- 0.0 >> c <- 0.0 # running compensation for lost low-order bits >> for(xi in x) { >> y <- xi - c >> t <- s + y # low-order bits of y may be lost here >> c <- (t - s) - y >> s <- t >> } >> s >> } >> >> > rSum <- vapply(c(1:20,10^(2:7)), function(n) sum(rep(1/7,n)), 0) >> > rNaiveSum <- vapply(c(1:20,10^(2:7)), function(n) naiveSum(rep(1/7,n)), >> 0) >> > rKahanSum <- vapply(c(1:20,10^(2:7)), function(n) kahanSum(rep(1/7,n)), >> 0) >> > >> > table(rSum == rNaiveSum) >> >> FALSE TRUE >> 21 5 >> > table(rSum == rKahanSum) >> >> FALSE TRUE >> 3 23 >> >> >> Bill Dunlap >> TIBCO Software >> wdunlap tibco.com >> >> >> On Tue, Feb 19, 2019 at 10:36 AM Paul Gilbert <pgilbert902 at gmail.com> >> wrote: >> >>> (I didn't see anyone else answer this, so ...) >>> >>> You can probably find the R code in src/main/ but I'm not sure. You are >>> talking about a very simple calculation, so it seems unlike that the >>> algorithm is the cause of the difference. I have done much more >>> complicated things and usually get machine precision comparisons. There >>> are four possibilities I can think of that could cause (small) >>> differences. >>> >>> 0/ Your code is wrong, but that seems unlikely on such a simple >>> calculations. >>> >>> 1/ You are summing a very large number of numbers, in which case the sum >>> can become very large compared to numbers being added, then things can >>> get a bit funny. >>> >>> 2/ You are using single precision in fortran rather than double. Double >>> is needed for all floating point numbers you use! >>> >>> 3/ You have not zeroed the double precision numbers in fortran. (Some >>> compilers do not do this automatically and you have to specify it.) Then >>> if you accidentally put singles, like a constant 0.0 rather than a >>> constant 0.0D+0, into a double you will have small junk in the lower >>> precision part. >>> >>> (I am assuming you are talking about a sum of reals, not integer or >>> complex.) >>> >>> HTH, >>> Paul Gilbert >>> >>> On 2/14/19 2:08 PM, Rampal Etienne wrote: >>> > Hello, >>> > >>> > I am trying to write FORTRAN code to do the same as some R code I >>> have. >>> > I get (small) differences when using the sum function in R. I know >>> there >>> > are numerical routines to improve precision, but I have not been able >>> to >>> > figure out what algorithm R is using. Does anyone know this? Or where >>> > can I find the code for the sum function? >>> > >>> > Regards, >>> > >>> > Rampal Etienne >>> > >>> > ______________________________________________ >>> > R-devel at r-project.org mailing list >>> > https://stat.ethz.ch/mailman/listinfo/r-devel >>> >>> ______________________________________________ >>> R-devel at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >>> >>[[alternative HTML version deleted]]