> On 14 Jan 2020, at 16:21 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote: > > On 14/01/2020 10:07 a.m., peter dalgaard wrote: >> Yep, that looks wrong (probably want to continue discussion over on R-devel) >> I think the culprit is here (in src/nmath/choose.c) >> if (k < k_small_max) { >> int j; >> if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */ >> if (k < 0) return 0.; >> if (k == 0) return 1.; >> /* else: k >= 1 */ >> if n is a near-integer, then k can become non-integer and negative. In your case, >> n == 4 - 1e-7 >> k == 4 >> n - k == -1e-7 < 4 >> n >= 0 >> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed) >> so k gets set to >> n - k == -1e-7 >> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g. >>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1 >> [1] -9.289025e-11 >> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step. > > I think that would break symmetry: you want choose(n, k) to equal choose(n, n-k) when n is very close to an integer. So I'd suggest the replacement whenever R_IS_INT(n) is true. >But choose() very deliberately ensures that k is integer, so choose(n, n-k) is ill-defined for non-integer n. double r, k0 = k; k = R_forceint(k); ... if (fabs(k - k0) > 1e-7) MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k);> Duncan Murdoch > >> -pd >>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <ESWRIGHT at pitt.edu> wrote: >>> >>> This struck me as incorrect: >>> >>>> choose(3.999999, 4) >>> [1] 0.9999979 >>>> choose(3.9999999, 4) >>> [1] 0 >>>> choose(4, 4) >>> [1] 1 >>>> choose(4.0000001, 4) >>> [1] 4 >>>> choose(4.000001, 4) >>> [1] 1.000002 >>> >>> Should base::choose(n, k) check whether n is within machine precision of k and return 1? >>> >>> Thanks, >>> Erik >>> >>> *** >>> sessionInfo() >>> R version 3.6.0 beta (2019-04-15 r76395) >>> Platform: x86_64-apple-darwin15.6.0 (64-bit) >>> Running under: macOS High Sierra 10.13.6 >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code.-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
On 14/01/2020 10:50 a.m., peter dalgaard wrote:> > >> On 14 Jan 2020, at 16:21 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote: >> >> On 14/01/2020 10:07 a.m., peter dalgaard wrote: >>> Yep, that looks wrong (probably want to continue discussion over on R-devel) >>> I think the culprit is here (in src/nmath/choose.c) >>> if (k < k_small_max) { >>> int j; >>> if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */ >>> if (k < 0) return 0.; >>> if (k == 0) return 1.; >>> /* else: k >= 1 */ >>> if n is a near-integer, then k can become non-integer and negative. In your case, >>> n == 4 - 1e-7 >>> k == 4 >>> n - k == -1e-7 < 4 >>> n >= 0 >>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed) >>> so k gets set to >>> n - k == -1e-7 >>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g. >>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1 >>> [1] -9.289025e-11 >>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step. >> >> I think that would break symmetry: you want choose(n, k) to equal choose(n, n-k) when n is very close to an integer. So I'd suggest the replacement whenever R_IS_INT(n) is true. >> > > But choose() very deliberately ensures that k is integer, so choose(n, n-k) is ill-defined for non-integer n.That's only true if there's a big difference. I'd be worried about cases where n and k are close to integers (within 1e-7). In those cases, k is silently rounded to integer. As I read your suggestion, n would only be rounded to integer if k > n-k. I think both n and k should be rounded to integer in this near-integer situation, regardless of the value of k. I believe that lchoose(n, k) already does this. Duncan Murdoch> > double r, k0 = k; > k = R_forceint(k); > ... > if (fabs(k - k0) > 1e-7) > MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k); > > >> Duncan Murdoch >> >>> -pd >>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <ESWRIGHT at pitt.edu> wrote: >>>> >>>> This struck me as incorrect: >>>> >>>>> choose(3.999999, 4) >>>> [1] 0.9999979 >>>>> choose(3.9999999, 4) >>>> [1] 0 >>>>> choose(4, 4) >>>> [1] 1 >>>>> choose(4.0000001, 4) >>>> [1] 4 >>>>> choose(4.000001, 4) >>>> [1] 1.000002 >>>> >>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1? >>>> >>>> Thanks, >>>> Erik >>>> >>>> *** >>>> sessionInfo() >>>> R version 3.6.0 beta (2019-04-15 r76395) >>>> Platform: x86_64-apple-darwin15.6.0 (64-bit) >>>> Running under: macOS High Sierra 10.13.6 >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________________________ >>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >
OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the if (k < 0) return 0.; if (k == 0) return 1.; /* else: k >= 1 */ part, because at that point k is sure to be integer, possibly after rounding. It is when n-k is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n-1)(n-2)... product has at least one factor). In the other cases, we get 1 or n(n-1)(n-2)...(n-k+1) which if n is near-integer gets rounded to produce an integer, due to the return R_IS_INT(n) ? R_forceint(r) : r; part. -pd> On 14 Jan 2020, at 17:02 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote: > > On 14/01/2020 10:50 a.m., peter dalgaard wrote: >>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote: >>> >>> On 14/01/2020 10:07 a.m., peter dalgaard wrote: >>>> Yep, that looks wrong (probably want to continue discussion over on R-devel) >>>> I think the culprit is here (in src/nmath/choose.c) >>>> if (k < k_small_max) { >>>> int j; >>>> if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */ >>>> if (k < 0) return 0.; >>>> if (k == 0) return 1.; >>>> /* else: k >= 1 */ >>>> if n is a near-integer, then k can become non-integer and negative. In your case, >>>> n == 4 - 1e-7 >>>> k == 4 >>>> n - k == -1e-7 < 4 >>>> n >= 0 >>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed) >>>> so k gets set to >>>> n - k == -1e-7 >>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g. >>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1 >>>> [1] -9.289025e-11 >>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step. >>> >>> I think that would break symmetry: you want choose(n, k) to equal choose(n, n-k) when n is very close to an integer. So I'd suggest the replacement whenever R_IS_INT(n) is true. >>> >> But choose() very deliberately ensures that k is integer, so choose(n, n-k) is ill-defined for non-integer n. > > That's only true if there's a big difference. I'd be worried about cases where n and k are close to integers (within 1e-7). In those cases, k is silently rounded to integer. As I read your suggestion, n would only be rounded to integer if k > n-k. I think both n and k should be rounded to integer in this near-integer situation, regardless of the value of k. > > I believe that lchoose(n, k) already does this. > > Duncan Murdoch > >> double r, k0 = k; >> k = R_forceint(k); >> ... >> if (fabs(k - k0) > 1e-7) >> MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k); >> >>> Duncan Murdoch >>> >>>> -pd >>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <ESWRIGHT at pitt.edu> wrote: >>>>> >>>>> This struck me as incorrect: >>>>> >>>>>> choose(3.999999, 4) >>>>> [1] 0.9999979 >>>>>> choose(3.9999999, 4) >>>>> [1] 0 >>>>>> choose(4, 4) >>>>> [1] 1 >>>>>> choose(4.0000001, 4) >>>>> [1] 4 >>>>>> choose(4.000001, 4) >>>>> [1] 1.000002 >>>>> >>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1? >>>>> >>>>> Thanks, >>>>> Erik >>>>> >>>>> *** >>>>> sessionInfo() >>>>> R version 3.6.0 beta (2019-04-15 r76395) >>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit) >>>>> Running under: macOS High Sierra 10.13.6 >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> ______________________________________________ >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code.-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com