The algorithm does make a difference. You can use Kahan's summation
algorithm (https://en.wikipedia.org/wiki/Kahan_summation_algorithm) to
reduce the error compared to the naive summation algorithm. E.g., in R
code:

naiveSum <- function(x) {
    s <- 0.0
    for(xi in x) s <- s + xi
    s
}

kahanSum <- function(x) {
    s <- 0.0
    c <- 0.0  # running compensation for lost low-order bits
    for(xi in x) {
        y <- xi - c
        t <- s + y  # low-order bits of y may be lost here
        c <- (t - s) - y
        s <- t
    }
    s
}

> rSum <- vapply(c(1:20,10^(2:7)), function(n) sum(rep(1/7,n)), 0)
> rNaiveSum <- vapply(c(1:20,10^(2:7)), function(n) naiveSum(rep(1/7,n)), 0)
> rKahanSum <- vapply(c(1:20,10^(2:7)), function(n) kahanSum(rep(1/7,n)), 0)
>
> table(rSum == rNaiveSum)

FALSE  TRUE
   21     5

> table(rSum == rKahanSum)

FALSE  TRUE
    3    23

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Feb 19, 2019 at 10:36 AM Paul Gilbert <pgilbert902 at gmail.com> wrote:

> (I didn't see anyone else answer this, so ...)
>
> You can probably find the R code in src/main/ but I'm not sure. You are
> talking about a very simple calculation, so it seems unlikely that the
> algorithm is the cause of the difference. I have done much more
> complicated things and usually get machine-precision comparisons. There
> are four possibilities I can think of that could cause (small) differences.
>
> 0/ Your code is wrong, but that seems unlikely on such a simple
> calculation.
>
> 1/ You are summing a very large number of numbers, in which case the sum
> can become very large compared to the numbers being added, and then
> things can get a bit funny.
>
> 2/ You are using single precision in Fortran rather than double. Double
> is needed for all floating-point numbers you use!
>
> 3/ You have not zeroed the double-precision numbers in Fortran. (Some
> compilers do not do this automatically and you have to specify it.)
> Then if you accidentally put singles, like a constant 0.0 rather than a
> constant 0.0D+0, into a double, you will have small junk in the lower
> precision part.
>
> (I am assuming you are talking about a sum of reals, not integer or
> complex.)
>
> HTH,
> Paul Gilbert
>
> On 2/14/19 2:08 PM, Rampal Etienne wrote:
> > Hello,
> >
> > I am trying to write FORTRAN code to do the same as some R code I have.
> > I get (small) differences when using the sum function in R. I know there
> > are numerical routines to improve precision, but I have not been able to
> > figure out what algorithm R is using. Does anyone know this? Or where
> > can I find the code for the sum function?
> >
> > Regards, Rampal Etienne

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
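[Editor's note: Paul's point 1/ above, a large running sum absorbing small
addends, can be illustrated with a minimal R sketch. The constant 1e16 is
chosen because it lies just above 2^53, where the gap between adjacent
doubles grows to 2, so an addend of 1 falls below half an ulp.]

```r
# Once the running sum exceeds 2^53, adding 1 no longer changes it:
# 1e16 + 1 lies exactly halfway between representable doubles and
# rounds back down (ties-to-even), while adding 2 still registers.
big <- 1e16
big + 1 == big  # TRUE: the addend is absorbed
big + 2 == big  # FALSE: 2 is one full ulp, so it survives
```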
This SO question may be of interest:
https://stackoverflow.com/questions/38589705/difference-between-rs-sum-and-armadillos-accu/
which points out that sum() isn't doing anything fancy *except* using
extended-precision registers when available. (Using Kahan's algorithm does
come at a computational cost ...)

On 2019-02-19 2:08 p.m., William Dunlap via R-devel wrote:
> The algorithm does make a difference. You can use Kahan's summation
> algorithm (https://en.wikipedia.org/wiki/Kahan_summation_algorithm) to
> reduce the error compared to the naive summation algorithm.
> [...]
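[Editor's note: the computational cost mentioned above is easy to observe
directly. This is a rough sketch in base R; the repetition counts are
arbitrary and the timings are machine-dependent, so only the relative
order of magnitude is meaningful.]

```r
# Compensated summation in pure R does several extra floating-point
# operations per element, versus the single compiled loop behind sum().
kahanSum <- function(x) {
    s <- 0.0
    c <- 0.0  # running compensation for lost low-order bits
    for (xi in x) {
        y <- xi - c
        t <- s + y
        c <- (t - s) - y
        s <- t
    }
    s
}
x <- rep(1/7, 10^6)
t_builtin <- system.time(for (i in 1:20) sum(x))["elapsed"]
t_kahan   <- system.time(kahanSum(x))["elapsed"]
cat("sum() x20:", t_builtin, "s; kahanSum() x1:", t_kahan, "s\n")
```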
On 2019-02-19 2:08 p.m., William Dunlap via R-devel wrote:
> The algorithm does make a difference. You can use Kahan's summation
> algorithm (https://en.wikipedia.org/wiki/Kahan_summation_algorithm) to
> reduce the error compared to the naive summation algorithm.
> [...]

If you use the vector c(1,10^100,1,-10^100) as input, then sum, naiveSum,
and kahanSum will all give an incorrect answer: all return 0 instead of 2.
From the Wikipedia page we can try the pseudocode given for the
modification by Neumaier. My R version (with a small correction to avoid
cancellation?) is

neumaierSum <- function(x) {
    s <- 0.0
    z <- 0.0  # running compensation for lost low-order bits
    for(xi in x) {
        t <- s + xi
        if (abs(s) >= abs(xi)) {
            # if sum is bigger, low-order digits of xi are lost
            b <- (s - t) + xi  # intermediate step needed in R, otherwise cancellation
            z <- z + b
        } else {
            # else low-order digits of the sum are lost
            b <- (xi - t) + s  # intermediate step needed in R, otherwise cancellation
            z <- z + b
        }
        s <- t
    }
    s + z  # correction only applied once, at the very end
}

testx <- c(1,10^100,1,-10^100)
neumaierSum(testx)

gives 2 as the answer.

Berend Hasselman
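[Editor's note: for a side-by-side check, here are both compensated sums
from the thread run on Berend's test vector, with the definitions repeated
so the snippet is self-contained.]

```r
kahanSum <- function(x) {
    s <- 0.0
    c <- 0.0  # running compensation for lost low-order bits
    for (xi in x) {
        y <- xi - c
        t <- s + y
        c <- (t - s) - y
        s <- t
    }
    s
}

neumaierSum <- function(x) {
    s <- 0.0
    z <- 0.0  # running compensation, applied once at the end
    for (xi in x) {
        t <- s + xi
        if (abs(s) >= abs(xi)) {
            b <- (s - t) + xi  # low-order digits of xi were lost
        } else {
            b <- (xi - t) + s  # low-order digits of the sum were lost
        }
        z <- z + b
        s <- t
    }
    s + z
}

testx <- c(1, 10^100, 1, -10^100)
kahanSum(testx)     # 0: Kahan's compensation is itself swamped by 10^100
neumaierSum(testx)  # 2: the branch preserves the part the big term displaced
```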
Dear Will,

This is exactly what I find.
My point is thus that the sum function in R is neither a naive sum nor a
Kahan sum (in all cases), so what algorithm is it using?

Cheers, Rampal

On Tue, Feb 19, 2019, 19:08 William Dunlap <wdunlap at tibco.com> wrote:

> The algorithm does make a difference. You can use Kahan's summation
> algorithm (https://en.wikipedia.org/wiki/Kahan_summation_algorithm) to
> reduce the error compared to the naive summation algorithm.
> [...]
Someone said it used a possibly platform-dependent
higher-than-double-precision type.

By the way, in my example involving rep(1/3, n), I neglected to include the
most precise way to calculate the sum: n%/%3 + (n%%3)/3.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Feb 20, 2019 at 2:45 PM Rampal Etienne <rampaletienne at gmail.com> wrote:

> Dear Will,
>
> This is exactly what I find.
> My point is thus that the sum function in R is not a naive sum nor a
> Kahansum (in all cases), but what algorithm is it using then?
>
> Cheers, Rampal
> [...]
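[Editor's note: Bill's closed form can be checked directly. The function
name exactThirds below is just an illustrative label, not from the thread.]

```r
# n %/% 3 is exact integer arithmetic, and only the remainder term
# (n %% 3)/3 involves a single rounded division, so unlike repeated
# summation the rounding error does not grow with n.
exactThirds <- function(n) n %/% 3 + (n %% 3) / 3
exactThirds(9)    # exactly 3
exactThirds(3e6)  # exactly 1e6, however large n gets
```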