Full_Name: Morten Welinder
Version: 2
OS: Solaris/space/gcc2.95.2
Submission from: (NULL) (65.213.85.217)

I changed src/nmath/standalone/test.c to read:

---------------------------------------------------------------------------------
#define MATHLIB_STANDALONE 1
#include <Rmath.h>
#include <stdio.h>

int
main ()
{
  double x;
  for (x = 99990; x <= 100009; x++)
    printf ("%.14g --> %.14g\n", x, pgamma (1000000.0, x, 1.0, 0, 1));
  return 0;
}
---------------------------------------------------------------------------------

and have no further local changes.

I get...

src/nmath/standalone> LD_LIBRARY_PATH=. ./test
99990 --> -669773.38974057
99991 --> -669771.08705436
99992 --> -669768.78437815
99993 --> -669766.48171194
99994 --> -669764.17905574
99995 --> -669761.87640953
99996 --> -669759.57377332
99997 --> -669757.27114712
99998 --> -669754.96853091
99999 --> -669752.66592471
100000 --> -669750.36332851
100001 --> -599731.36192347   <--- Look at me jump!
100002 --> -599729.89768519
100003 --> -599728.43344358
100004 --> -599726.96919865
100005 --> -599725.50495039
100006 --> -599724.0406988
100007 --> -599722.57644388
100008 --> -599721.11218564
100009 --> -599719.64792408

i.e., the result changes 70000 orders of magnitude right here.  Ugh.
This is affecting some Erlang calculations of mine pretty badly.

Judging from comments in pgamma, someone else might have gotten hit by this
around R 1.8.1.
terra@gnome.org writes:

>     for (x = 99990; x <= 100009; x++)
>       printf ("%.14g --> %.14g\n", x, pgamma (1000000.0, x, 1.0, 0, 1));
...
> and have no further local changes.
>
> I get...
>
> src/nmath/standalone> LD_LIBRARY_PATH=. ./test
...
> 99999 --> -669752.66592471
> 100000 --> -669750.36332851
> 100001 --> -599731.36192347   <--- Look at me jump!
> 100002 --> -599729.89768519
>
> i.e., the result changes 70000 orders of magnitude right here.  Ugh.
> This is affecting some erlang calculations of mine pretty badly.

Make that 30400 orders of magnitude (natural logs, y'know)... On something
that's about 300000 orders of magnitude below 1, mind you! What the devil
are you calculating? The probability that a random configuration of atoms
would make up the known universe?

> Judging from comments in pgamma someone else might have gotten hit by this
> around R1.8.1.

Beginning to look like a method bug. The recipe is to read the original
paper, check for transcription errors from the Fortran algorithm, then
check for errors going from theory to Fortran, then check the theory...

PS: It happens on the R command line too:

> pgamma(1000000,100000.001,1.0,,0,1) - pgamma(1000000,100000,1.0,,0,1)
[1] 70017.54

--
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)              FAX: (+45) 35327907
>>>>> "Morten" == Morten Welinder <terra@gnome.org>
>>>>>     on Mon, 25 Oct 2004 12:04:08 -0400 (EDT) writes:

    Morten> A little code study, formula study and experimentation reveals
    Morten> that the situation is mostly fixable:

    Morten> 1. Get rid of the explicit alpha limit.  (A form of it is
    Morten>    implicit in (2) and (3) below.)

    Morten> 2. Use the series formula when
    Morten>        (x < alph + 10 && x < 0.99 * (alph + 1000))
    Morten>    This guarantees that the sum converges reasonably fast.
    Morten>    (For extremely large x and alpha that will take about
    Morten>    -53/log2(0.99) iterations for 53 significant bits, i.e.,
    Morten>    about 3700 iterations.)

    Morten> 3. Use the continued fraction formula when
    Morten>        (alph < x && alph - 100 < 0.99 * x)
    Morten>    Aka, you don't want to use the formula either near the
    Morten>    critical point where alpha/x ~ 1 unless the numbers are
    Morten>    small.

    Morten> 4a. Go to a library and figure out how Temme does it for alpha
    Morten>     near x, both large.  In this case the 0.99 from above could
    Morten>     probably be lowered a lot for faster convergence.

    Morten> or

    Morten> 4b. Use the pnorm approximation.  It seems to do a whole lot
    Morten>     better for alpha near x than it did for the 10:1 case I
    Morten>     quoted.

    Morten> Comments, please.

Hi Morten,

thanks a lot for your investigation. I have spent quite a few working days
exploring pgamma() and also some alternatives. The discontinuity is "well
known". I vaguely remember my conclusions were a bit different - at least
over all problems (not only the one mentioned), it needed more fixing. I
think I had included Temme's paper (and others) in my study.

But really, I'm just talking from the top of my head; I will take the time
to look into my old testing scripts and alternative C code, but just not
this week (release of Bioconductor upcoming; other urgencies). I'll be glad
to also communicate with you offline on this topic {and about pbeta() !}.
But "just not now".

Martin Maechler, ETH Zurich
For the record, this is what I am going to use in Gnumeric for now.

Morten

/*
 * Problems fixed.
 *
 * 1. x = -inf, scale = +inf: result now 0, was nan.
 *
 * 2. No discontinuity for very small alph.
 *    Old pgamma(1,1e-8,1,FALSE,TRUE) --> -19.9376 [ok]
 *    Old pgamma(1,0.9999999999e-8,1,FALSE,TRUE) --> -5556027.755 [bad]
 *    That's more than <dr_evil>two million</dr_evil> orders of magnitude
 *    caused by the pnorm approximation being used in an area it was not
 *    intended for.
 *
 * 3. No discontinuity at alph=100000.
 *    Old pgamma(1e6,100000,1,FALSE,TRUE) --> -669750.36332851 [ok]
 *    Old pgamma(1e6,100001,1,FALSE,TRUE) --> -599731.36192347 [bad]
 *
 * 4. Improved precision in calculating log of series sum using log1p.
 *
 * 5. Avoid using series and continued fraction too close to the critical
 *    line.
 *
 * 6. Improved precision of argument in pnorm approximation.
 *
 * 7. Use a power of 2 for rescaling in continued fraction so rescaling does
 *    not introduce any rounding errors.
 */

/*
 * Abramowitz and Stegun 6.5.29
 */
static double
pgamma_series_sum (double x, double alph)
{
    double sum = 0, c = 1, a = alph;

    do {
	a += 1.;
	c *= x / a;
	sum += c;
    } while (c > DBL_EPSILON * sum);

    return sum;
}

/*
 * Abramowitz and Stegun 6.5.31
 */
static double
pgamma_cont_frac (double x, double alph)
{
    double a, b, n, an, pn1, pn2, pn3, pn4, pn5, pn6, sum, osum;
    double xlarge = pow (2, 128);

    a = 1. - alph;
    b = a + x + 1.;
    pn1 = 1.;
    pn2 = x;
    pn3 = x + 1.;
    pn4 = x * b;
    sum = pn3 / pn4;

    for (n = 1; ; n++) {
	a += 1.;	/* =   n+1 -alph   */
	b += 2.;	/* = 2(n+1)-alph+x */
	an = a * n;
	pn5 = b * pn3 - an * pn1;
	pn6 = b * pn4 - an * pn2;
	if (fabs(pn6) > 0.) {
	    osum = sum;
	    sum = pn5 / pn6;
	    if (fabs(osum - sum) <= DBL_EPSILON * fmin2(1., sum))
		break;
	}
	pn1 = pn3;
	pn2 = pn4;
	pn3 = pn5;
	pn4 = pn6;
	if (fabs(pn5) >= xlarge) {
	    /* re-scale the terms in continued fraction if they are large */
#ifdef DEBUG_p
	    REprintf(" [r] ");
#endif
	    pn1 /= xlarge;
	    pn2 /= xlarge;
	    pn3 /= xlarge;
	    pn4 /= xlarge;
	}
    }

    return sum;
}

double pgamma(double x, double alph, double scale, int lower_tail, int log_p)
{
    double arg;

#ifdef IEEE_754
    if (ISNAN(x) || ISNAN(alph) || ISNAN(scale))
	return x + alph + scale;
#endif
    if (alph <= 0. || scale <= 0.)
	ML_ERR_return_NAN;

    if (x <= 0.)
	return R_DT_0;

    x /= scale;
#ifdef IEEE_754
    if (ISNAN(x)) /* eg. original x = scale = +Inf */
	return x;
#endif

    if (x < 0.99 * (alph + 1000) && x < 10 + alph && x < 10 * alph) {
	/*
	 * The first condition makes sure that terms in the sum get at
	 * least 1% smaller, no later than after 1000 terms.
	 *
	 * The second condition makes sure that the terms start getting
	 * smaller (even if just a tiny bit) after no more than 10 terms.
	 *
	 * The third condition makes sure that the terms don't get huge
	 * before they start getting smaller.
	 */
	double C1mls2p = /* 1 - log (sqrt (2 * M_PI)) */
	    0.081061466795327258219670263594382360138602526362216587182846;
	double logsum = log1p (pgamma_series_sum (x, alph));

	/*
	 * The first two terms here would cause cancellation for extremely
	 * large and near-equal x and alph.  The first condition above
	 * prevents that.
	 */
	arg = (alph - x) + alph * log (x / (alph + 1)) + C1mls2p
	    - log1p (alph) / 2 - stirlerr (alph + 1) + logsum;
    } else if (alph < x && alph - 100 < 0.99 * x) {
	/*
	 * The first condition guarantees that we will start making
	 * a tiny bit of progress right now.
	 *
	 * The second condition guarantees that we will make decent
	 * progress after no more than 100 loops.
	 */
	double lfrac = log (pgamma_cont_frac (x, alph));
	arg = lfrac + alph * log(x) - x - lgammafn(alph);
	lower_tail = !lower_tail;
    } else {
	/*
	 * Approximation using pnorm.  We only get here for fairly large
	 * x and alph that are within 1% of each other.
	 */
	double r3m1 = expm1 (log (x / alph) / 3);
	return pnorm (sqrt (alph) * 3 * (r3m1 + 1 / (9 * alph)),
		      0, 1, lower_tail, log_p);
    }

    if (lower_tail)
	return log_p ? arg : exp(arg);
    else
	return log_p ? R_Log1_Exp(arg) : -expm1(arg);
}