Hi all,

I found a discrepancy between sum() in R and a sum done in either C or
Fortran for a vector of just 5 elements. The difference is very small, but
it is a very small part of a much larger numerical problem in which first
and second derivatives are computed numerically. This is for a numerical
methods course I am teaching in which I want to compare the speed of R
versus Fortran (we solve a general equilibrium problem entirely
numerically, if you want to know). Because of this discrepancy, the
Jacobian and Hessian in R versus in Fortran are quite different, which
results in the Newton method producing a different solution (for a given
stopping rule). Since the solution computed in Fortran is almost identical
to the analytical solution, I suspect that the sum in Fortran may be more
accurate (that's just a guess). Most of the time the sum produces identical
results, but for some numbers it is different. The following example shows
what happens:

set.seed(12233)
n <- 5
a <- runif(n, 1, 5)
e <- runif(n, 5*(1:n), 10*(1:n))
s <- runif(1, 1.2, 4)
p <- runif(5, 3, 10)
x <- c(e[-5], (sum(e*p) - sum(e[-5]*p[-5]))/p[5])
u <- a^(1/s)*x^((s-1)/s)
dyn.load("sumF.so")

u[1] <- u[1] + .0001  ### If we do not add .0001, all differences are 0
s1 <- sum(u)
s2 <- .Fortran("sumf", as.double(u), as.integer(n), sf1=double(1),
               sf2=double(1))[3:4]
s3 <- .C("sumc", as.double(u), as.integer(n), sC=double(1))[[3]]

s1 - s2[[1]]  ## R versus compiler sum() (Fortran)
[1] -7.105427e-15

s1 - s2[[2]]  ## R versus manual sum (Fortran)
[1] -7.105427e-15

s1 - s3       ## R versus manual sum in C
[1] -7.105427e-15

s2[[2]] - s2[[1]]  ## manual sum versus compiler sum() (Fortran)
[1] 0

s3 - s2[[2]]  ## Fortran versus C
[1] 0

My sumf and sumc are

      subroutine sumf(x, n, sx1, sx2)
      integer i, n
      double precision x(n), sx1, sx2
      sx1 = sum(x)
      sx2 = 0.0d0
      do i=1,n
         sx2 = sx2 + x(i)
      end do
      end

void sumc(double *x, int *n, double *sum)
{
    int i;
    double sum1 = 0.0;
    for (i = 0; i < *n; i++) {
        sum1 += x[i];
    }
    *sum = sum1;
}

Can that be a bug? Thanks.

--
Pierre Chaussé
Assistant Professor
Department of Economics
University of Waterloo
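(The example above assumes the Fortran and C routines have already been
compiled into a single shared library before dyn.load(); the build command
and the C file name below are my guesses, since only sumF.so and the
routine names appear in the post.)

## Assumed build step, run from a shell in the directory with the sources:
##   R CMD SHLIB -o sumF.so sumF.f sumC.c
dyn.load("sumF.so")
is.loaded("sumf")   # should be TRUE once the Fortran symbol is available
is.loaded("sumc")   # should be TRUE once the C symbol is available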
Pierre,

It's comforting to think that the same simple summation loop implemented in
R, C, and Fortran would give bit-wise identical results, but that isn't the
case in practice. Floating-point results depend on lots of things: the chip
used, the compiler used, the optimization flags, etc. For example, in the
unlikely event that you run this on a 32-bit machine, you have the messy
80-bit internal precision used by the double-precision hardware on that
platform to consider. This is enough to derail the "bit-wise equivalence
train" right away. There are plenty of other such issues that can rise up
and play a role.

Of course, there may be something about R's double-precision evaluation
that is wrong, but IMHO a failure to be bit-wise identical with a
computation done elsewhere isn't enough to say a bug has been turfed up.
It's not unusual for computations to differ by the least significant bit or
two. If that is enough to derail your computations of the Jacobians and
Hessians, that is at least a teachable moment for your numerical methods
class.

BTW, are you using finite-difference techniques for the derivatives or
computational derivatives? The latter are not so sensitive to round-off
errors: I would expect them to be as accurate as symbolic derivatives.

HTH,

-Steve

On Fri, Mar 16, 2018 at 10:58 AM, Pierre Chausse <pchausse at uwaterloo.ca> wrote:
> [...]
> Can that be a bug? Thanks.
--
Steven Dirkse, Ph.D.
GAMS Development Corp.
office: 202.342.0180
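(The point above about finite differences being sensitive to round-off can
be seen directly in R. The sketch below is my own illustration, not from
the thread; the test function exp, the point x0, and the step sizes are
arbitrary choices.)

## Minimal sketch: the forward-difference error for f'(x0) has a truncation
## part proportional to h and a round-off part proportional to eps/h, so a
## change of a few ulps in f(x) can visibly move the estimated derivative.
f  <- function(x) exp(x)          # hypothetical smooth test function
x0 <- 1
h  <- 10^-(1:12)                  # decreasing step sizes
fd <- (f(x0 + h) - f(x0)) / h     # forward-difference estimates of f'(x0)
abs(fd - exp(x0))                 # error shrinks with h, then grows again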
luke-tierney at uiowa.edu
2018-Mar-16 15:37 UTC
[Rd] Discrepancy: R sum() VS C or Fortran sum
Install the gmp package, run your code, and then try this:

bu <- gmp::as.bigq(u)
bs4 <- bu[1] + bu[2] + bu[3] + bu[4] + bu[5]
s4 <- as.double(bs4)
s1 - s4
## [1] 0
s2[[2]] - s4
## [1] 7.105427e-15
s3 - s4
## [1] 7.105427e-15
identical(s1, s4)
## [1] TRUE

`bs4` is the exact sum of the binary rationals in your `u` vector; `s4` is
the double-precision value closest to this exact sum. Looking at the C
source code for sum() will show you that it makes some extra efforts to get
a more accurate sum than your simple version.

Best,

luke

On Fri, 16 Mar 2018, Pierre Chausse wrote:
> [...]
> Can that be a bug? Thanks.

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
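(The "extra efforts" referred to above are that, on most builds, R's sum()
accumulates real vectors in a long-double accumulator, so it loses less to
intermediate rounding than a plain double-precision loop. The vector below
is my own construction, not from the thread, and the contrast assumes an
ordinary 53-bit double for the naive loop and a wider long double inside
sum(), which is the usual case.)

set.seed(1)
small <- runif(1000)        # ~1000 terms around 0.5
v <- c(1e16, small, -1e16)  # sandwich them between huge values that cancel exactly
sum(small)                  # reference: essentially the exact sum of the small terms
sum(v)                      # R's sum(): the wider accumulator keeps most of that sum
Reduce(`+`, v)              # naive left-to-right double loop: the small terms are lost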
My simple functions were meant to compare the result with the gfortran
compiler sum() function. I thought that the Fortran sum could not be less
precise than R. I was wrong. I am impressed. The R sum() does in fact match
the result we get if we use the Kahan algorithm. I am glad to see that R's
sum() is more accurate than the gfortran compiler sum.

P.

On 16/03/18 11:37 AM, luke-tierney at uiowa.edu wrote:
> Install the gmp package, run your code, and then try this:
> [...]
> Looking at the C source code for sum() will show you that it makes
> some extra efforts to get a more accurate sum than your simple
> version.

--
Pierre Chaussé
Assistant Professor
Department of Economics
University of Waterloo
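(For reference, a minimal R sketch of the Kahan, or compensated, summation
algorithm mentioned above; the function name and loop are my own, not taken
from R's sources or from the Fortran code used in the thread.)

kahan_sum <- function(x) {
    s <- 0.0                  # running sum
    comp <- 0.0               # running compensation for lost low-order bits
    for (xi in x) {
        y <- xi - comp        # correct the next term by the carried-over error
        t <- s + y            # add it; low-order bits of y may be rounded away here
        comp <- (t - s) - y   # (t - s) is what was actually added; comp is what was lost
        s <- t
    }
    s
}
## Pierre reports above that the Kahan result matches R's sum(u) for the u
## in his example, while the plain Fortran/C loops differ in the last bits.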