This struck me as incorrect:> choose(3.999999, 4)[1] 0.9999979> choose(3.9999999, 4)[1] 0> choose(4, 4)[1] 1> choose(4.0000001, 4)[1] 4> choose(4.000001, 4)[1] 1.000002 Should base::choose(n, k) check whether n is within machine precision of k and return 1? Thanks, Erik *** sessionInfo() R version 3.6.0 beta (2019-04-15 r76395) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS High Sierra 10.13.6 [[alternative HTML version deleted]]
On 13/01/2020 6:33 p.m., Wright, Erik Scott wrote:> This struck me as incorrect: > >> choose(3.999999, 4) > [1] 0.9999979 >> choose(3.9999999, 4) > [1] 0 >> choose(4, 4) > [1] 1 >> choose(4.0000001, 4) > [1] 4 >> choose(4.000001, 4) > [1] 1.000002 > > Should base::choose(n, k) check whether n is within machine precision of k and return 1?I don't think that's the solution. The current code checks whether n is within 1e-7 of an integer; if it is and n-k is smaller than k, it computes choose(n, n-k) instead. The problem in your second example is that n-k < 0 which implies the answer should be zero. In the 4th example n-k > 0 but it is not an integer; the code rounds k to an integer, but the transformation to n-k happens after that, so the code ends up working with a non-integer. I think a solution would be to force n to be an integer if it is very close to one. I note that the source to lchoose() seems to already do this: it handles your examples nicely. Duncan Murdoch
Yep, that looks wrong (probably want to continue discussion over on R-devel) I think the culprit is here (in src/nmath/choose.c) if (k < k_small_max) { int j; if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */ if (k < 0) return 0.; if (k == 0) return 1.; /* else: k >= 1 */ if n is a near-integer, then k can become non-integer and negative. In your case, n == 4 - 1e-7 k == 4 n - k == -1e-7 < 4 n >= 0 R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed) so k gets set to n - k == -1e-7 which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g.> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1[1] -9.289025e-11 I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step. -pd> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <ESWRIGHT at pitt.edu> wrote: > > This struck me as incorrect: > >> choose(3.999999, 4) > [1] 0.9999979 >> choose(3.9999999, 4) > [1] 0 >> choose(4, 4) > [1] 1 >> choose(4.0000001, 4) > [1] 4 >> choose(4.000001, 4) > [1] 1.000002 > > Should base::choose(n, k) check whether n is within machine precision of k and return 1? > > Thanks, > Erik > > *** > sessionInfo() > R version 3.6.0 beta (2019-04-15 r76395) > Platform: x86_64-apple-darwin15.6.0 (64-bit) > Running under: macOS High Sierra 10.13.6 > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
On 14/01/2020 10:07 a.m., peter dalgaard wrote:> Yep, that looks wrong (probably want to continue discussion over on R-devel) > > I think the culprit is here (in src/nmath/choose.c) > > if (k < k_small_max) { > int j; > if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */ > if (k < 0) return 0.; > if (k == 0) return 1.; > /* else: k >= 1 */ > > if n is a near-integer, then k can become non-integer and negative. In your case, > > n == 4 - 1e-7 > k == 4 > n - k == -1e-7 < 4 > n >= 0 > R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed) > > so k gets set to > > n - k == -1e-7 > > which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g. > >> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1 > [1] -9.289025e-11 > > I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step.I think that would break symmetry: you want choose(n, k) to equal choose(n, n-k) when n is very close to an integer. So I'd suggest the replacement whenever R_IS_INT(n) is true. Duncan Murdoch> > -pd > > > >> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <ESWRIGHT at pitt.edu> wrote: >> >> This struck me as incorrect: >> >>> choose(3.999999, 4) >> [1] 0.9999979 >>> choose(3.9999999, 4) >> [1] 0 >>> choose(4, 4) >> [1] 1 >>> choose(4.0000001, 4) >> [1] 4 >>> choose(4.000001, 4) >> [1] 1.000002 >> >> Should base::choose(n, k) check whether n is within machine precision of k and return 1? >> >> Thanks, >> Erik >> >> *** >> sessionInfo() >> R version 3.6.0 beta (2019-04-15 r76395) >> Platform: x86_64-apple-darwin15.6.0 (64-bit) >> Running under: macOS High Sierra 10.13.6 >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >