Hi all,

I found a discrepancy between sum() in R and a sum done in either C or
Fortran for a vector of just 5 elements. The difference is very small, but
it is a very small part of a much larger numerical problem in which first
and second derivatives are computed numerically. This is for a numerical
methods course I am teaching in which I want to compare the speed of R
versus Fortran (we solve a general equilibrium problem entirely
numerically, if you want to know). Because of this discrepancy, the
Jacobian and Hessian in R versus in Fortran are quite different, which
results in the Newton method producing a different solution (for a given
stopping rule). Since the solution computed in Fortran is almost identical
to the analytical solution, I suspect that the sum in Fortran may be more
accurate (that's just a guess). Most of the time the sum produces identical
results, but for some numbers it is different. The following example shows
what happens:

set.seed(12233)
n <- 5
a <- runif(n, 1, 5)
e <- runif(n, 5*(1:n), 10*(1:n))
s <- runif(1, 1.2, 4)
p <- runif(5, 3, 10)
x <- c(e[-5], (sum(e*p) - sum(e[-5]*p[-5]))/p[5])
u <- a^(1/s)*x^((s-1)/s)
dyn.load("sumF.so")

u[1] <- u[1] + .0001  ### If we do not add .0001, all differences are 0
s1 <- sum(u)
s2 <- .Fortran("sumf", as.double(u), as.integer(n), sf1=double(1),
               sf2=double(1))[3:4]
s3 <- .C("sumc", as.double(u), as.integer(n), sC=double(1))[[3]]

s1 - s2[[1]]  ## R versus compiler sum() (Fortran)
[1] -7.105427e-15

s1 - s2[[2]]  ## R versus manual sum (Fortran)
[1] -7.105427e-15

s1 - s3       ## R versus manual sum in C
[1] -7.105427e-15

s2[[2]] - s2[[1]]  ## manual sum versus compiler sum() (Fortran)
[1] 0

s3 - s2[[2]]  ## Fortran versus C
[1] 0

My sumf and sumc are

      subroutine sumf(x, n, sx1, sx2)
      integer i, n
      double precision x(n), sx1, sx2
      sx1 = sum(x)
      sx2 = 0.0d0
      do i=1,n
         sx2 = sx2 + x(i)
      end do
      end

void sumc(double *x, int *n, double *sum)
{
    int i;
    double sum1 = 0.0;
    for (i = 0; i < *n; i++) {
        sum1 += x[i];
    }
    *sum = sum1;
}

Can that be a bug? Thanks.

--
Pierre Chaussé
Assistant Professor
Department of Economics
University of Waterloo
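(The example above assumes the Fortran and C routines have already been
compiled into a single shared library before dyn.load(); the build command
and the C file name below are my guesses, since only sumF.so and the
routine names appear in the post.)

## Assumed build step, run from a shell in the directory with the sources:
##   R CMD SHLIB -o sumF.so sumF.f sumC.c
dyn.load("sumF.so")
is.loaded("sumf")   # should be TRUE once the Fortran symbol is available
is.loaded("sumc")   # should be TRUE once the C symbol is available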
Pierre,

It's comforting to think that the same simple summation loop implemented in
R, C, and Fortran would give bit-wise identical results, but that isn't the
case in practice. Floating-point results depend on lots of things: the chip
used, the compiler used, the optimization flags, etc. For example, in the
unlikely event that you run this on a 32-bit machine, you have the messy
80-bit internal precision used by the double-precision hardware on that
platform to consider. This is enough to derail the "bit-wise equivalence
train" right away. There are plenty of other such issues that can rise up
and play a role.

Of course, there may be something about R's double-precision evaluation
that is wrong, but IMHO a failure to be bit-wise identical with a
computation done elsewhere isn't enough to say a bug has been turfed up.
It's not unusual for computations to differ by the least significant bit or
two. If that is enough to derail your computations of the Jacobians and
Hessians, that is at least a teachable moment for your numerical methods
class.

BTW, are you using finite-difference techniques for the derivatives or
computational derivatives? The latter are not so sensitive to round-off
errors: I would expect them to be as accurate as symbolic derivatives.

HTH,

-Steve

On Fri, Mar 16, 2018 at 10:58 AM, Pierre Chausse <pchausse at uwaterloo.ca> wrote:
> [...]
> Can that be a bug? Thanks.
--
Steven Dirkse, Ph.D.
GAMS Development Corp.
office: 202.342.0180
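(The point above about finite differences being sensitive to round-off can
be seen directly in R. The sketch below is my own illustration, not from
the thread; the test function exp, the point x0, and the step sizes are
arbitrary choices.)

## Minimal sketch: the forward-difference error for f'(x0) has a truncation
## part proportional to h and a round-off part proportional to eps/h, so a
## change of a few ulps in f(x) can visibly move the estimated derivative.
f  <- function(x) exp(x)          # hypothetical smooth test function
x0 <- 1
h  <- 10^-(1:12)                  # decreasing step sizes
fd <- (f(x0 + h) - f(x0)) / h     # forward-difference estimates of f'(x0)
abs(fd - exp(x0))                 # error shrinks with h, then grows again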
luke-tierney at uiowa.edu
2018-Mar-16 15:37 UTC
[Rd] Discrepancy: R sum() VS C or Fortran sum
Install the gmp package, run your code, and then try this:

bu <- gmp::as.bigq(u)
bs4 <- bu[1] + bu[2] + bu[3] + bu[4] + bu[5]
s4 <- as.double(bs4)
s1 - s4
## [1] 0
s2[[2]] - s4
## [1] 7.105427e-15
s3 - s4
## [1] 7.105427e-15
identical(s1, s4)
## [1] TRUE

`bs4` is the exact sum of the binary rationals in your `u` vector; `s4` is
the double-precision value closest to this exact sum. Looking at the C
source code for sum() will show you that it makes some extra efforts to get
a more accurate sum than your simple version.

Best,

luke

On Fri, 16 Mar 2018, Pierre Chausse wrote:
> [...]
> Can that be a bug? Thanks.

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
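(The "extra efforts" referred to above are that, on most builds, R's sum()
accumulates real vectors in a long-double accumulator, so it loses less to
intermediate rounding than a plain double-precision loop. The vector below
is my own construction, not from the thread, and the contrast assumes an
ordinary 53-bit double for the naive loop and a wider long double inside
sum(), which is the usual case.)

set.seed(1)
small <- runif(1000)        # ~1000 terms around 0.5
v <- c(1e16, small, -1e16)  # sandwich them between huge values that cancel exactly
sum(small)                  # reference: essentially the exact sum of the small terms
sum(v)                      # R's sum(): the wider accumulator keeps most of that sum
Reduce(`+`, v)              # naive left-to-right double loop: the small terms are lost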
My simple functions were meant to compare the result with the gfortran
compiler sum() function. I thought that the Fortran sum could not be less
precise than R. I was wrong. I am impressed. The R sum() does in fact match
the result we get if we use the Kahan algorithm. I am glad to see that R's
sum() is more accurate than the gfortran compiler sum.

P.

On 16/03/18 11:37 AM, luke-tierney at uiowa.edu wrote:
> Install the gmp package, run your code, and then try this:
> [...]
> Looking at the C source code for sum() will show you that it makes
> some extra efforts to get a more accurate sum than your simple
> version.

--
Pierre Chaussé
Assistant Professor
Department of Economics
University of Waterloo
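(For reference, a minimal R sketch of the Kahan, or compensated, summation
algorithm mentioned above; the function name and loop are my own, not taken
from R's sources or from the Fortran code used in the thread.)

kahan_sum <- function(x) {
    s <- 0.0                  # running sum
    comp <- 0.0               # running compensation for lost low-order bits
    for (xi in x) {
        y <- xi - comp        # correct the next term by the carried-over error
        t <- s + y            # add it; low-order bits of y may be rounded away here
        comp <- (t - s) - y   # (t - s) is what was actually added; comp is what was lost
        s <- t
    }
    s
}
## Pierre reports above that the Kahan result matches R's sum(u) for the u
## in his example, while the plain Fortran/C loops differ in the last bits.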