lgamma(x) and lfactorial(x) are defined to return ln|Gamma(x)| {= log(abs(gamma(x)))} or ln|Gamma(x+1)| respectively. Unfortunately, we haven't chosen the analogous definition for lchoose(). So, currently > lchoose(1/2, 1:10) [1] -0.6931472 -2.0794415 NaN -3.2425924 NaN -3.8869494 [7] NaN -4.3357508 NaN -4.6805913 Warning message: In lchoose(n, k) : NaNs produced > which (the NaN's) is not particularly useful. (I have use case where I really have to workaround those NaNs.) I herebey propose to *amend* the definition of lchoose() such that it behaves analogously to lgamma() and lfactorial(), i.e., to return log(abs(choose(.,.)) Your comments are welcome. Martin Maechler, ETH Zurich
Hi Martin I think you're absolutely right about this; One thing I need again and again is a multinomial function, and usually define: > lmultinomial function (x) { lfactorial(sum(x)) - sum(lfactorial(x)) } > multinomial function (x) { exp(lmultinomial(x)) } It would be nice to have this in base R. Is this the place to discuss having complex arguments for gamma()? best wishes rksh Martin Maechler wrote:> lgamma(x) and lfactorial(x) are defined to return > > ln|Gamma(x)| {= log(abs(gamma(x)))} or ln|Gamma(x+1)| respectively. > > Unfortunately, we haven't chosen the analogous definition for > lchoose(). > > So, currently > > > lchoose(1/2, 1:10) > [1] -0.6931472 -2.0794415 NaN -3.2425924 NaN -3.8869494 > [7] NaN -4.3357508 NaN -4.6805913 > Warning message: > In lchoose(n, k) : NaNs produced > > > > which (the NaN's) is not particularly useful. > (I have use case where I really have to workaround those NaNs.) > > I herebey propose to *amend* the definition of lchoose() such > that it behaves analogously to lgamma() and lfactorial(), > i.e., to return > > log(abs(choose(.,.)) > > Your comments are welcome. > Martin Maechler, ETH Zurich > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >-- Robin K. S. Hankin Uncertainty Analyst University of Cambridge 19 Silver Street Cambridge CB3 9EP 01223-764877
On Tue, Dec 15, 2009 at 09:49:28AM +0100, Martin Maechler wrote:> lgamma(x) and lfactorial(x) are defined to return > > ln|Gamma(x)| {= log(abs(gamma(x)))} or ln|Gamma(x+1)| respectively. > > Unfortunately, we haven't chosen the analogous definition for > lchoose(). > > So, currently > > > lchoose(1/2, 1:10) > [1] -0.6931472 -2.0794415 NaN -3.2425924 NaN -3.8869494 > [7] NaN -4.3357508 NaN -4.6805913 > Warning message: > In lchoose(n, k) : NaNs produced > > > > which (the NaN's) is not particularly useful. > (I have use case where I really have to workaround those NaNs.) > > I herebey propose to *amend* the definition of lchoose() such > that it behaves analogously to lgamma() and lfactorial(), > i.e., to return > > log(abs(choose(.,.)) > > Your comments are welcome. > Martin Maechler, ETH ZurichI do not have a strong opinion, whether lchoose() should be log(choose(.,.)) or log(abs(choose(.,.))), although i would slightly prefere the latter (with abs()). One of the reasons is that this would simplify the code of the function lchoose() in src/nmath/choose.c. It produces the NA by explicit testing the sign, so this test could be simply removed. Let me also point out that the current implementation seems to contain a bug, which causes that it is neither log(choose(.,.)) nor log(abs(choose(.,.))). x <- choose(1/2, 1:10) y <- lchoose(1/2, 1:10) cbind(choose=x, lchoose=y, log.choose=log(abs(x))) choose lchoose log.choose # [1,] 0.500000000 -0.6931472 -0.6931472 # [2,] -0.125000000 -2.0794415 -2.0794415 # [3,] 0.062500000 NaN -2.7725887 # [4,] -0.039062500 -3.2425924 -3.2425924 # [5,] 0.027343750 NaN -3.5992673 # [6,] -0.020507812 -3.8869494 -3.8869494 # [7,] 0.016113281 NaN -4.1281114 # [8,] -0.013092041 -4.3357508 -4.3357508 # [9,] 0.010910034 NaN -4.5180723 # [10,] -0.009273529 -4.6805913 -4.6805913 So, besides x[1], we get NA for those cases, when choose() is positive. The reason seems to be the test at line 89 of src/nmath/choose.c, which is if (fmod(floor(n-k+1), 2.) == 0) /* choose() < 0 */ return ML_NAN; but it should be negated. I tested it using if (fmod(floor(n-k+1), 2.) != 0) /* choose() < 0 */ return ML_NAN; when we get choose lchoose log.choose [1,] 0.500000000 -0.6931472 -0.6931472 [2,] -0.125000000 NaN -2.0794415 [3,] 0.062500000 -2.7725887 -2.7725887 [4,] -0.039062500 NaN -3.2425924 [5,] 0.027343750 -3.5992673 -3.5992673 [6,] -0.020507812 NaN -3.8869494 [7,] 0.016113281 -4.1281114 -4.1281114 [8,] -0.013092041 NaN -4.3357508 [9,] 0.010910034 -4.5180723 -4.5180723 [10,] -0.009273529 NaN -4.6805913 Petr Savicky.
On Tue, Feb 02, 2010 at 01:37:46PM +0100, Petr Savicky wrote:> I would like to add some more information concerning the patch C > to the function choose() proposed in the email > https://stat.ethz.ch/pipermail/r-devel/2009-December/056177.html > > The patch uses transformations of choose(n, k), which are described in > http://www.cs.cas.cz/~savicky/R-devel/formulas_choose.pdf > > The accuracy of the modified function choose(n, k) may be verified > on randomly generated examples using Rmpfr package > http://cran.at.r-project.org/web/packages/Rmpfr/index.html > and the script > http://www.cs.cas.cz/~savicky/R-devel/test_choose_2.RLet me add an explanation of the script test_choose_2.R. The script generates a vector of random real numbers n, which are from a continuous distribution, but concentrate near integer values. The original implementation of choose(m, k) considers m as an integer, if abs(m - round(m)) <= 1e-7. The vector n is generated so that the probability of abs(n[i] - round(n[i])) <= 1e-7 is approximately 0.1. The distribution of n[i] - round(n[i]) is symmetric around 0, so we get both n[i], which are close to an integer from below and from above. On the other hand, the probability of abs(n[i] - round(n[i])) >= 0.3 is approximately 0.1083404, so there are also numbers n[i], which are not close to an integer. The script calculates choose(n, k) for k in 0:209 (an ad hoc upper bound) 1. using the modified choose(n, k) from patch C 2. using the expression n(n-1)...(n-k+1)/(1 2 ... k) with Rmpfr numbers of precision 100 bits. The relative difference of these two results is computed and its maximum over all n[i] and k from 0 to a given bound is reported. The bounds on k are chosen to be the numbers, whose last digit is 9, since the algorithm in choose(n,k) is different for k <= 29 and k >= 30. An upper bound of the relative rounding error of a single operation with Rmpfr numbers of precision 100 bits is (1 + 2^-100). Hence, an upper bound on the total relative error of n(n-1)...(n-k+1)/(1 2 ... k) is (1 + 2^-100)^(2*209) \approx (1 + 2 * 209 * 2^-100) \approx 1 + 3.297e-28. This is a negligible error compared to the errors reported by test_choose_2.R, so the reported errors are the errors of the patched choose(n, k). The errors reported by test_choose_2.R with patched choose(n,k) are in a previous email. Running test_choose_2.R with unpatched R version 2.11.0 (2010-02-01 r51089) produces larger errors. > source("test_choose_2.R") k <= 9 max rel err = Inf k <= 19 max rel err = Inf > source("test_choose_2.R") k <= 9 max rel err = 0.1111111 k <= 19 max rel err = Inf > source("test_choose_2.R") k <= 9 max rel err = Inf k <= 19 max rel err = Inf > source("test_choose_2.R") k <= 9 max rel err = 8.383718e-08 k <= 19 max rel err = 1.226306e-07 k <= 29 max rel err = 1.469050e-07 Error: segfault from C stack overflow The Inf relative errors occur in cases, where choose(n, k) calculates 0, but the correct result is not 0. The stack overflow error is sometimes generated due to an infinite sequence of transformations choose(n, k) -> choose(n, n-k) -> choose(n, round(n-k)) which occur if k = 30 and n = 60 - eps. The reason for the transformation choose(n, k) -> choose(n, n-k) is that k >= k_small_max = 30 n is treated as an integer in R_IS_INT(n) n-k < k_small_max So, choose(n, n-k) is called. There, we determine that n-k is almost an integer and since n-k is the second argument of choose(n,k), it is explicitly rounded to an integer. Since n = 2*k - eps, we have round(n-k) = round(k - eps) = k. The result is that we again call choose(n, k) and this repeats until the stack overflow. For example > choose(60 - 1e-9, 30) Error: segfault from C stack overflow Besides patch C, which corrects this stack overflow, also the previous patches A, B from https://stat.ethz.ch/pipermail/r-devel/2009-December/056154.html correct this, but have lower accuracy. Petr Savicky.