thr3ads.net - R devel - [Rd] pgamma - inadequate algorithm design and poor coding (PR#8528) [Jan 2006]

If this information is useful, please help other people find it:
Share via:

ripley@stats.ox.ac.uk

2006-Jan-27 13:31 UTC

[Rd] pgamma - inadequate algorithm design and poor coding (PR#8528)

R versions 2.1.0 to present.

Examples shown were computed under Windows R-devel, current SVN, but ix86 
Linux shows similar behaviour (sometimes NaN or -Inf rather than Inf, 
depending on the compiler and optimization level used).


The replacement pgamma algorithm used from R 2.1.0 has an inadequate 
design and no supporting documentation whatsoever.  There is no reference 
given to support the algorithm, and it seems very desirable to use only 
algorithms with a published (and preferably refereed) analysis, or at 
least of impeccable provenance.

The following errors were found by investigating an example in the 
d-p-q-r-tests.R regression tests that gave NaN on a real system.

These errors were not present in R 2.0.0, which used a normal 
approximation in that region.  We could fix this by reverting where needed 
to a normal approximation, but that leaves the problem that we have no 
proof of the validity or accuracy of the rest of the algorithm (if indeed 
it is accurate).

?pgamma says

      As from R 2.1.0 'pgamma()' uses a new algorithm (mainly by Morten
      Welinder) which should be uniformly as accurate as AS 239.

Well, it 'should be' but it is not, and we should not be making
statements
like that.  Those in the email quoted in pgamma.c exhibit hubris.

There are also at least two examples of sloppy coding that lead to numeric 
overflow and complete loss of accuracy.


Consider
> pgamma(seq(0.75, 1.25, by=0.05)*1e100, shape = 1e100, log=T)  [1] -3.768207e+98           NaN           NaN           NaN           NaN
  [6] -6.931472e-01           NaN           NaN           NaN           NaN
[11]  0.000000e+00
Warning message:
NaNs produced in: pgamma(q, shape, scale, lower.tail,
log.p)> pgamma(seq(0.75, 1.25, by=0.05)*1e100, shape = 1e100, log=T, lower=F)  [1]  0.000000e+00           NaN           NaN           NaN           NaN
  [6] -6.931472e-01           Inf           Inf           Inf           Inf
[11] -2.685645e+98
Warning message:
NaNs produced in: pgamma(q, shape, scale, lower.tail, log.p)
> pgamma(c(1-1e-10, 1+1e-10)*1e100, shape = 1e100)[1] NaN NaN

(shape=1e25 is enough to cause a breakdown in the first of these, and 
1e60 in the rest.)

The code has four branches

1) x <= 1

2) x <= alph - 1 && x < 0.8 * (alph + 50)).  This has the comment
/* incl. large alph */, but that is false.

3) if (alph - 1 < x && alph < 0.8 * (x + 50)).  This has the
comment
/* incl. large x */, but again false.

4) The rest, which uses an asymptotic expansion in

pt_ = -x * (log(1 + (lambda-x)/x) - (lambda-x)/x)
= -x * log((lambda-x)/x) - (lambda-x)

and naively assumes that this is small enough to use a power series 
expansion in 1/x with coefficients as powers of pt_.  To make matters 
worse, consider
> pgamma(0.9*1e25, 1e25, log=T)pgamma_raw(x=9e+024, alph=1e+025, low=1, log=1)
  using ppois_asymp()
pt_ = 5.3605156578263e+022
pp*_asymp(): f=-2.0803930924076e-013 np=-5.3605156578263e+022 
nd=-5.3605156578263e+022  f*nd=11151979746.284
[1] -Inf

Hmm, how did that manage to lose *all* the accuracy here?  Hubris again 
appears in the comments.

Here np and nd are on log scale and if they are large they will be almost 
equal (and negative), and f is not large (and if it were we could have 
computed log f).  So we can compute the log of np+f*nd accurately as

log(np*(1+f*nd/np)) = lnp + log(1+f*nd/np) = lnp + log1p(f*exp(lnd-lnp))


Almost all the mass of gamma(shape=1e100) is concentrated at the nearest 
representable value to 1e100:
> qgamma(c(1e-16, 1-1e-16), 1e100)-1e100[1] 0 0

(if it can be believed, but this can be verified independently).  So being 
accurate in the middle of the range is pretty academic, but one can at 
least avoid returning the nonsense of non-monotone cdfs and NaN/infinite 
probabilities.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Possibly Parallel Threads

Search for more seemingly similar threads

R devel - Jan 2006 - pgamma - inadequate algorithm design and poor coding (PR#8528)

[Rd] pgamma - inadequate algorithm design and poor coding (PR#8528)

Possibly Parallel Threads

Wisdom of the Ancients