Full_Name: Morten Welinder
Version: 2
OS: Solaris/space/gcc2.95.2
Submission from: (NULL) (65.213.85.217)

I changed src/nmath/standalone/test.c to read:

---------------------------------------------------------------------------------
#define MATHLIB_STANDALONE 1
#include <Rmath.h>
#include <stdio.h>

int
main ()
{
  double x;
  for (x = 99990; x <= 100009; x++)
    printf ("%.14g --> %.14g\n", x, pgamma (1000000.0, x, 1.0, 0, 1));
  return 0;
}
---------------------------------------------------------------------------------

and have no further local changes.

I get...

src/nmath/standalone> LD_LIBRARY_PATH=. ./test
99990 --> -669773.38974057
99991 --> -669771.08705436
99992 --> -669768.78437815
99993 --> -669766.48171194
99994 --> -669764.17905574
99995 --> -669761.87640953
99996 --> -669759.57377332
99997 --> -669757.27114712
99998 --> -669754.96853091
99999 --> -669752.66592471
100000 --> -669750.36332851
100001 --> -599731.36192347   <--- Look at me jump!
100002 --> -599729.89768519
100003 --> -599728.43344358
100004 --> -599726.96919865
100005 --> -599725.50495039
100006 --> -599724.0406988
100007 --> -599722.57644388
100008 --> -599721.11218564
100009 --> -599719.64792408

i.e., the result changes 70000 orders of magnitude right here.  Ugh.
This is affecting some Erlang calculations of mine pretty badly.

Judging from comments in pgamma, someone else might have gotten hit by this
around R 1.8.1.
terra@gnome.org writes:

>     for (x = 99990; x <= 100009; x++)
>       printf ("%.14g --> %.14g\n", x, pgamma (1000000.0, x, 1.0, 0, 1));
...
> and have no further local changes.
>
> I get...
>
> src/nmath/standalone> LD_LIBRARY_PATH=. ./test
...
> 99999 --> -669752.66592471
> 100000 --> -669750.36332851
> 100001 --> -599731.36192347   <--- Look at me jump!
> 100002 --> -599729.89768519
>
> i.e., the result changes 70000 orders of magnitude right here.  Ugh.
> This is affecting some erlang calculations of mine pretty badly.

Make that 30400 orders of magnitude (natural logs, y'know)... On something
that's about 300000 orders of magnitude below 1, mind you! What the devil
are you calculating? The probability that a random configuration of atoms
would make up the known universe?

> Judging from comments in pgamma someone else might have gotten hit by this
> around R1.8.1.

Beginning to look like a method bug. The recipe is to read the original
paper, check for transcription errors from the Fortran algorithm, then
check for errors going from theory to Fortran, then check the theory...

PS: It happens on the R command line too:

> pgamma(1000000,100000.001,1.0,,0,1) - pgamma(1000000,100000,1.0,,0,1)
[1] 70017.54

--
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)              FAX: (+45) 35327907
>>>>> "Morten" == Morten Welinder <terra@gnome.org>
>>>>>     on Mon, 25 Oct 2004 12:04:08 -0400 (EDT) writes:

    Morten> A little code study, formula study and experimentation reveals
    Morten> that the situation is mostly fixable:

    Morten> 1. Get rid of the explicit alpha limit.  (A form of it is
    Morten>    implicit in (2) and (3) below.)

    Morten> 2. Use the series formula when
    Morten>        (x < alph + 10 && x < 0.99 * (alph + 1000))
    Morten>    This guarantees that the sum converges reasonably fast.
    Morten>    (For extremely large x and alpha that will take about
    Morten>    -53/log2(0.99) iterations for 53 significant bits, i.e.,
    Morten>    about 3700 iterations.)

    Morten> 3. Use the continued fraction formula when
    Morten>        (alph < x && alph - 100 < 0.99 * x)
    Morten>    Aka, you don't want to use the formula either near the
    Morten>    critical point where alpha/x ~ 1 unless the numbers are
    Morten>    small.

    Morten> 4a. Go to a library and figure out how Temme does it for alpha
    Morten>     near x, both large.  In this case the 0.99 from above could
    Morten>     probably be lowered a lot for faster convergence.

    Morten> or

    Morten> 4b. Use the pnorm approximation.  It seems to do a whole lot
    Morten>     better for alpha near x than it did for the 10:1 case I
    Morten>     quoted.

    Morten> Comments, please.

Hi Morten,

thanks a lot for your investigation. I have spent quite a few working days
exploring pgamma() and also some alternatives. The discontinuity is "well
known". I vaguely remember my conclusions were a bit different - at least
over all problems (not only the one mentioned), it needed more fixing. I
think I had included Temme's paper (and others) in my study.

But really, I'm just talking from the top of my head; I will take the time
to look into my old testing scripts and alternative C code, but just not
this week (release of Bioconductor upcoming; other urgencies). I'll be glad
to also communicate with you offline on this topic {and about pbeta() !}.
But "just not now".

Martin Maechler, ETH Zurich
For the record, this is what I am going to use in Gnumeric for now.

Morten

/*
 * Problems fixed.
 *
 * 1. x = -inf, scale = +inf: result now 0, was nan.
 *
 * 2. No discontinuity for very small alph.
 *    Old pgamma(1,1e-8,1,FALSE,TRUE) --> -19.9376 [ok]
 *    Old pgamma(1,0.9999999999e-8,1,FALSE,TRUE) --> -5556027.755 [bad]
 *    That's more than <dr_evil>two million</dr_evil> orders of magnitude
 *    caused by the pnorm approximation being used in an area it was not
 *    intended for.
 *
 * 3. No discontinuity at alph=100000.
 *    Old pgamma(1e6,100000,1,FALSE,TRUE) --> -669750.36332851 [ok]
 *    Old pgamma(1e6,100001,1,FALSE,TRUE) --> -599731.36192347 [bad]
 *
 * 4. Improved precision in calculating log of series sum using log1p.
 *
 * 5. Avoid using series and continued fraction too close to the critical
 *    line.
 *
 * 6. Improved precision of argument in pnorm approximation.
 *
 * 7. Use a power of 2 for rescaling in continued fraction so rescaling does
 *    not introduce any rounding errors.
 */

/*
 * Abramowitz and Stegun 6.5.29
 */
static double
pgamma_series_sum (double x, double alph)
{
    double sum = 0, c = 1, a = alph;

    do {
	a += 1.;
	c *= x / a;
	sum += c;
    } while (c > DBL_EPSILON * sum);

    return sum;
}

/*
 * Abramowitz and Stegun 6.5.31
 */
static double
pgamma_cont_frac (double x, double alph)
{
    double a, b, n, an, pn1, pn2, pn3, pn4, pn5, pn6, sum, osum;
    double xlarge = pow (2, 128);

    a = 1. - alph;
    b = a + x + 1.;
    pn1 = 1.;
    pn2 = x;
    pn3 = x + 1.;
    pn4 = x * b;
    sum = pn3 / pn4;

    for (n = 1; ; n++) {
	a += 1.;	/* =   n+1 -alph   */
	b += 2.;	/* = 2(n+1)-alph+x */
	an = a * n;
	pn5 = b * pn3 - an * pn1;
	pn6 = b * pn4 - an * pn2;
	if (fabs(pn6) > 0.) {
	    osum = sum;
	    sum = pn5 / pn6;
	    if (fabs(osum - sum) <= DBL_EPSILON * fmin2(1., sum))
		break;
	}
	pn1 = pn3;
	pn2 = pn4;
	pn3 = pn5;
	pn4 = pn6;
	if (fabs(pn5) >= xlarge) {
	    /* re-scale the terms in continued fraction if they are large */
#ifdef DEBUG_p
	    REprintf(" [r] ");
#endif
	    pn1 /= xlarge;
	    pn2 /= xlarge;
	    pn3 /= xlarge;
	    pn4 /= xlarge;
	}
    }

    return sum;
}

double pgamma(double x, double alph, double scale, int lower_tail, int log_p)
{
    double arg;

#ifdef IEEE_754
    if (ISNAN(x) || ISNAN(alph) || ISNAN(scale))
	return x + alph + scale;
#endif
    if (alph <= 0. || scale <= 0.)
	ML_ERR_return_NAN;

    if (x <= 0.)
	return R_DT_0;

    x /= scale;
#ifdef IEEE_754
    if (ISNAN(x)) /* eg. original x = scale = +Inf */
	return x;
#endif

    if (x < 0.99 * (alph + 1000) && x < 10 + alph && x < 10 * alph) {
	/*
	 * The first condition makes sure that terms in the sum get at
	 * least 1% smaller, no later than after 1000 terms.
	 *
	 * The second condition makes sure that the terms start getting
	 * smaller (even if just a tiny bit) after no more than 10 terms.
	 *
	 * The third condition makes sure that the terms don't get huge
	 * before they start getting smaller.
	 */
	double C1mls2p = /* 1 - log (sqrt (2 * M_PI)) */
	    0.081061466795327258219670263594382360138602526362216587182846;
	double logsum = log1p (pgamma_series_sum (x, alph));

	/*
	 * The first two terms here would cause cancellation for extremely
	 * large and near-equal x and alph.  The first condition above
	 * prevents that.
	 */
	arg = (alph - x) + alph * log (x / (alph + 1)) + C1mls2p
	    - log1p (alph) / 2 - stirlerr (alph + 1) + logsum;
    } else if (alph < x && alph - 100 < 0.99 * x) {
	/*
	 * The first condition guarantees that we will start making
	 * a tiny bit of progress right now.
	 *
	 * The second condition guarantees that we will make decent
	 * progress after no more than 100 loops.
	 */
	double lfrac = log (pgamma_cont_frac (x, alph));
	arg = lfrac + alph * log(x) - x - lgammafn(alph);
	lower_tail = !lower_tail;
    } else {
	/*
	 * Approximation using pnorm.  We only get here for fairly large
	 * x and alph that are within 1% of each other.
	 */
	double r3m1 = expm1 (log (x / alph) / 3);
	return pnorm (sqrt (alph) * 3 * (r3m1 + 1 / (9 * alph)),
		      0, 1, lower_tail, log_p);
    }

    if (lower_tail)
	return log_p ? arg : exp(arg);
    else
	return log_p ? R_Log1_Exp(arg) : -expm1(arg);
}