OK, I see what you mean. But in those cases, we don't get the catastrophic
failures from the
if (k < 0) return 0.;
if (k == 0) return 1.;
/* else: k >= 1 */
part, because at that point k is sure to be integer, possibly after rounding.
It is when n-k is approximately but not exactly zero and we should return 1,
that we either return 0 (negative case) or n (positive case; because the
n(n-1)(n-2)... product has at least one factor). In the other cases, we get 1 or
n(n-1)(n-2)...(n-k+1) which if n is near-integer gets rounded to produce an
integer, due to the
return R_IS_INT(n) ? R_forceint(r) : r;
part.
-pd
> On 14 Jan 2020, at 17:02 , Duncan Murdoch <murdoch.duncan at
gmail.com> wrote:
>
> On 14/01/2020 10:50 a.m., peter dalgaard wrote:
>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <murdoch.duncan at
gmail.com> wrote:
>>>
>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>>> Yep, that looks wrong (probably want to continue discussion
over on R-devel)
>>>> I think the culprit is here (in src/nmath/choose.c)
>>>> if (k < k_small_max) {
>>>> int j;
>>>> if(n-k < k && n >= 0 &&
R_IS_INT(n)) k = n-k; /* <- Symmetry */
>>>> if (k < 0) return 0.;
>>>> if (k == 0) return 1.;
>>>> /* else: k >= 1 */
>>>> if n is a near-integer, then k can become non-integer and
negative. In your case,
>>>> n == 4 - 1e-7
>>>> k == 4
>>>> n - k == -1e-7 < 4
>>>> n >= 0
>>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed)
>>>> so k gets set to
>>>> n - k == -1e-7
>>>> which is less than 0, so we return 0. However, as you point
out, 1 would be more reasonable and in accordance with the limit as n -> 4,
e.g.
>>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1
>>>> [1] -9.289025e-11
>>>> I guess that the fix could be as simple as replacing n by
R_forceint(n) in the k = n - k step.
>>>
>>> I think that would break symmetry: you want choose(n, k) to equal
choose(n, n-k) when n is very close to an integer. So I'd suggest the
replacement whenever R_IS_INT(n) is true.
>>>
>> But choose() very deliberately ensures that k is integer, so choose(n,
n-k) is ill-defined for non-integer n.
>
> That's only true if there's a big difference. I'd be worried
about cases where n and k are close to integers (within 1e-7). In those cases,
k is silently rounded to integer. As I read your suggestion, n would only be
rounded to integer if k > n-k. I think both n and k should be rounded to
integer in this near-integer situation, regardless of the value of k.
>
> I believe that lchoose(n, k) already does this.
>
> Duncan Murdoch
>
>> double r, k0 = k;
>> k = R_forceint(k);
>> ...
>> if (fabs(k - k0) > 1e-7)
>> MATHLIB_WARNING2(_("'k' (%.2f) must be integer,
rounded to %.0f"), k0, k);
>>
>>> Duncan Murdoch
>>>
>>>> -pd
>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <ESWRIGHT
at pitt.edu> wrote:
>>>>>
>>>>> This struck me as incorrect:
>>>>>
>>>>>> choose(3.999999, 4)
>>>>> [1] 0.9999979
>>>>>> choose(3.9999999, 4)
>>>>> [1] 0
>>>>>> choose(4, 4)
>>>>> [1] 1
>>>>>> choose(4.0000001, 4)
>>>>> [1] 4
>>>>>> choose(4.000001, 4)
>>>>> [1] 1.000002
>>>>>
>>>>> Should base::choose(n, k) check whether n is within machine
precision of k and return 1?
>>>>>
>>>>> Thanks,
>>>>> Erik
>>>>>
>>>>> ***
>>>>> sessionInfo()
>>>>> R version 3.6.0 beta (2019-04-15 r76395)
>>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>>>> Running under: macOS High Sierra 10.13.6
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained,
reproducible code.
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
At the risk of throwing oil on a fire. If we are talking about fractional
values of choose() doesn't it make sense to look to the gamma function for
the correct analytic continuation? In particular k<0 may not imply the
function should evaluate to zero until we get k<=-1.
Example:
``` r
choose(5, 4)
#> [1] 5
gchoose <- function(n, k) {
gamma(n+1)/(gamma(n+1-k) * gamma(k+1))
}
gchoose(5, 4)
#> [1] 5
gchoose(5, 0)
#> [1] 1
gchoose(5, -0.5)
#> [1] 0.2351727
```
> On Jan 14, 2020, at 10:20 AM, peter dalgaard <pdalgd at gmail.com>
wrote:
>
> OK, I see what you mean. But in those cases, we don't get the
catastrophic failures from the
>
> if (k < 0) return 0.;
> if (k == 0) return 1.;
> /* else: k >= 1 */
>
> part, because at that point k is sure to be integer, possibly after
rounding.
>
> It is when n-k is approximately but not exactly zero and we should return
1, that we either return 0 (negative case) or n (positive case; because the
n(n-1)(n-2)... product has at least one factor). In the other cases, we get 1 or
n(n-1)(n-2)...(n-k+1) which if n is near-integer gets rounded to produce an
integer, due to the
>
> return R_IS_INT(n) ? R_forceint(r) : r;
>
> part.
>
> -pd
>
>
>
>> On 14 Jan 2020, at 17:02 , Duncan Murdoch <murdoch.duncan at
gmail.com> wrote:
>>
>> On 14/01/2020 10:50 a.m., peter dalgaard wrote:
>>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <murdoch.duncan at
gmail.com> wrote:
>>>>
>>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>>>> Yep, that looks wrong (probably want to continue discussion
over on R-devel)
>>>>> I think the culprit is here (in src/nmath/choose.c)
>>>>> if (k < k_small_max) {
>>>>> int j;
>>>>> if(n-k < k && n >= 0 &&
R_IS_INT(n)) k = n-k; /* <- Symmetry */
>>>>> if (k < 0) return 0.;
>>>>> if (k == 0) return 1.;
>>>>> /* else: k >= 1 */
>>>>> if n is a near-integer, then k can become non-integer and
negative. In your case,
>>>>> n == 4 - 1e-7
>>>>> k == 4
>>>>> n - k == -1e-7 < 4
>>>>> n >= 0
>>>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed)
>>>>> so k gets set to
>>>>> n - k == -1e-7
>>>>> which is less than 0, so we return 0. However, as you point
out, 1 would be more reasonable and in accordance with the limit as n -> 4,
e.g.
>>>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1
>>>>> [1] -9.289025e-11
>>>>> I guess that the fix could be as simple as replacing n by
R_forceint(n) in the k = n - k step.
>>>>
>>>> I think that would break symmetry: you want choose(n, k) to
equal choose(n, n-k) when n is very close to an integer. So I'd suggest the
replacement whenever R_IS_INT(n) is true.
>>>>
>>> But choose() very deliberately ensures that k is integer, so
choose(n, n-k) is ill-defined for non-integer n.
>>
>> That's only true if there's a big difference. I'd be
worried about cases where n and k are close to integers (within 1e-7). In those
cases, k is silently rounded to integer. As I read your suggestion, n would
only be rounded to integer if k > n-k. I think both n and k should be
rounded to integer in this near-integer situation, regardless of the value of k.
>>
>> I believe that lchoose(n, k) already does this.
>>
>> Duncan Murdoch
>>
>>> double r, k0 = k;
>>> k = R_forceint(k);
>>> ...
>>> if (fabs(k - k0) > 1e-7)
>>> MATHLIB_WARNING2(_("'k' (%.2f) must be integer,
rounded to %.0f"), k0, k);
>>>
>>>> Duncan Murdoch
>>>>
>>>>> -pd
>>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott
<ESWRIGHT at pitt.edu> wrote:
>>>>>>
>>>>>> This struck me as incorrect:
>>>>>>
>>>>>>> choose(3.999999, 4)
>>>>>> [1] 0.9999979
>>>>>>> choose(3.9999999, 4)
>>>>>> [1] 0
>>>>>>> choose(4, 4)
>>>>>> [1] 1
>>>>>>> choose(4.0000001, 4)
>>>>>> [1] 4
>>>>>>> choose(4.000001, 4)
>>>>>> [1] 1.000002
>>>>>>
>>>>>> Should base::choose(n, k) check whether n is within
machine precision of k and return 1?
>>>>>>
>>>>>> Thanks,
>>>>>> Erik
>>>>>>
>>>>>> ***
>>>>>> sessionInfo()
>>>>>> R version 3.6.0 beta (2019-04-15 r76395)
>>>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>>>>> Running under: macOS High Sierra 10.13.6
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE
and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained,
reproducible code.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
---------------
John Mount
http://www.win-vector.com/ <http://www.win-vector.com/>
Our book: Practical Data Science with R
http://practicaldatascience.com <http://practicaldatascience.com/>
[[alternative HTML version deleted]]
That crossed my mind too, but presumably someone designed choose() to handle the near-integer cases specially. Otherwise, we already have beta() -- you just need to remember what the connection is ;-). I would expect that it has to do with the binomial and negative binomial distributions, but I can't offhand picture a calculation that leads to integer k, n plus/minus a tiny numerical error of the sort that one may encounter with, say, seq(). -pd ;-) choose(a,b) = 1/(beta(a-b+1,b+1)*(a+1)) or thereabouts> On 14 Jan 2020, at 19:36 , John Mount <jmount at win-vector.com> wrote: > > > At the risk of throwing oil on a fire. If we are talking about fractional values of choose() doesn't it make sense to look to the gamma function for the correct analytic continuation? In particular k<0 may not imply the function should evaluate to zero until we get k<=-1. > > Example: > > ``` r > choose(5, 4) > #> [1] 5 > > gchoose <- function(n, k) { > gamma(n+1)/(gamma(n+1-k) * gamma(k+1)) > } > > gchoose(5, 4) > #> [1] 5 > gchoose(5, 0) > #> [1] 1 > gchoose(5, -0.5) > #> [1] 0.2351727 > ``` > >> On Jan 14, 2020, at 10:20 AM, peter dalgaard <pdalgd at gmail.com> wrote: >> >> OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the >> >> if (k < 0) return 0.; >> if (k == 0) return 1.; >> /* else: k >= 1 */ >> >> part, because at that point k is sure to be integer, possibly after rounding. >> >> It is when n-k is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n-1)(n-2)... product has at least one factor). In the other cases, we get 1 or n(n-1)(n-2)...(n-k+1) which if n is near-integer gets rounded to produce an integer, due to the >> >> return R_IS_INT(n) ? R_forceint(r) : r; >> >> part. >> >> -pd >> >> >> >>> On 14 Jan 2020, at 17:02 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote: >>> >>> On 14/01/2020 10:50 a.m., peter dalgaard wrote: >>>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote: >>>>> >>>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote: >>>>>> Yep, that looks wrong (probably want to continue discussion over on R-devel) >>>>>> I think the culprit is here (in src/nmath/choose.c) >>>>>> if (k < k_small_max) { >>>>>> int j; >>>>>> if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */ >>>>>> if (k < 0) return 0.; >>>>>> if (k == 0) return 1.; >>>>>> /* else: k >= 1 */ >>>>>> if n is a near-integer, then k can become non-integer and negative. In your case, >>>>>> n == 4 - 1e-7 >>>>>> k == 4 >>>>>> n - k == -1e-7 < 4 >>>>>> n >= 0 >>>>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed) >>>>>> so k gets set to >>>>>> n - k == -1e-7 >>>>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g. >>>>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1 >>>>>> [1] -9.289025e-11 >>>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step. >>>>> >>>>> I think that would break symmetry: you want choose(n, k) to equal choose(n, n-k) when n is very close to an integer. So I'd suggest the replacement whenever R_IS_INT(n) is true. >>>>> >>>> But choose() very deliberately ensures that k is integer, so choose(n, n-k) is ill-defined for non-integer n. >>> >>> That's only true if there's a big difference. I'd be worried about cases where n and k are close to integers (within 1e-7). In those cases, k is silently rounded to integer. As I read your suggestion, n would only be rounded to integer if k > n-k. I think both n and k should be rounded to integer in this near-integer situation, regardless of the value of k. >>> >>> I believe that lchoose(n, k) already does this. >>> >>> Duncan Murdoch >>> >>>> double r, k0 = k; >>>> k = R_forceint(k); >>>> ... >>>> if (fabs(k - k0) > 1e-7) >>>> MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k); >>>> >>>>> Duncan Murdoch >>>>> >>>>>> -pd >>>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <ESWRIGHT at pitt.edu> wrote: >>>>>>> >>>>>>> This struck me as incorrect: >>>>>>> >>>>>>>> choose(3.999999, 4) >>>>>>> [1] 0.9999979 >>>>>>>> choose(3.9999999, 4) >>>>>>> [1] 0 >>>>>>>> choose(4, 4) >>>>>>> [1] 1 >>>>>>>> choose(4.0000001, 4) >>>>>>> [1] 4 >>>>>>>> choose(4.000001, 4) >>>>>>> [1] 1.000002 >>>>>>> >>>>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1? >>>>>>> >>>>>>> Thanks, >>>>>>> Erik >>>>>>> >>>>>>> *** >>>>>>> sessionInfo() >>>>>>> R version 3.6.0 beta (2019-04-15 r76395) >>>>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit) >>>>>>> Running under: macOS High Sierra 10.13.6 >>>>>>> >>>>>>> [[alternative HTML version deleted]] >>>>>>> >>>>>>> ______________________________________________ >>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>>>>> and provide commented, minimal, self-contained, reproducible code. >> >> -- >> Peter Dalgaard, Professor, >> Center for Statistics, Copenhagen Business School >> Solbjerg Plads 3, 2000 Frederiksberg, Denmark >> Phone: (+45)38153501 >> Office: A 4.23 >> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com >> >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel > > --------------- > John Mount > http://www.win-vector.com/ > Our book: Practical Data Science with R > http://practicaldatascience.com > > > > >-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Um, n(n-1)(n-2)...(n-k+1)/factorial(k), of course, but I think the argument still holds. Also, note that there are overflow conditions involved with computing n(n-1)(n-2)...(n-k+1)/factorial(k) as written, so it is computed in a loop with alternating multiply and divide steps. This introduces FP errors even if it is known that the result should be integer. I.e., we cannot remove the final "R_IS_INT(n) ? R_forceint(r) : r" if we want choose(n, k) to return an integer if n and k are integers. -pd> On 14 Jan 2020, at 19:20 , peter dalgaard <pdalgd at gmail.com> wrote: > > OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the > > if (k < 0) return 0.; > if (k == 0) return 1.; > /* else: k >= 1 */ > > part, because at that point k is sure to be integer, possibly after rounding. > > It is when n-k is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n-1)(n-2)... product has at least one factor). In the other cases, we get 1 or n(n-1)(n-2)...(n-k+1) which if n is near-integer gets rounded to produce an integer, due to the > > return R_IS_INT(n) ? R_forceint(r) : r; > > part. > > -pd > > > >> On 14 Jan 2020, at 17:02 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote: >> >> On 14/01/2020 10:50 a.m., peter dalgaard wrote: >>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote: >>>> >>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote: >>>>> Yep, that looks wrong (probably want to continue discussion over on R-devel) >>>>> I think the culprit is here (in src/nmath/choose.c) >>>>> if (k < k_small_max) { >>>>> int j; >>>>> if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */ >>>>> if (k < 0) return 0.; >>>>> if (k == 0) return 1.; >>>>> /* else: k >= 1 */ >>>>> if n is a near-integer, then k can become non-integer and negative. In your case, >>>>> n == 4 - 1e-7 >>>>> k == 4 >>>>> n - k == -1e-7 < 4 >>>>> n >= 0 >>>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed) >>>>> so k gets set to >>>>> n - k == -1e-7 >>>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g. >>>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1 >>>>> [1] -9.289025e-11 >>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step. >>>> >>>> I think that would break symmetry: you want choose(n, k) to equal choose(n, n-k) when n is very close to an integer. So I'd suggest the replacement whenever R_IS_INT(n) is true. >>>> >>> But choose() very deliberately ensures that k is integer, so choose(n, n-k) is ill-defined for non-integer n. >> >> That's only true if there's a big difference. I'd be worried about cases where n and k are close to integers (within 1e-7). In those cases, k is silently rounded to integer. As I read your suggestion, n would only be rounded to integer if k > n-k. I think both n and k should be rounded to integer in this near-integer situation, regardless of the value of k. >> >> I believe that lchoose(n, k) already does this. >> >> Duncan Murdoch >> >>> double r, k0 = k; >>> k = R_forceint(k); >>> ... >>> if (fabs(k - k0) > 1e-7) >>> MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k); >>> >>>> Duncan Murdoch >>>> >>>>> -pd >>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <ESWRIGHT at pitt.edu> wrote: >>>>>> >>>>>> This struck me as incorrect: >>>>>> >>>>>>> choose(3.999999, 4) >>>>>> [1] 0.9999979 >>>>>>> choose(3.9999999, 4) >>>>>> [1] 0 >>>>>>> choose(4, 4) >>>>>> [1] 1 >>>>>>> choose(4.0000001, 4) >>>>>> [1] 4 >>>>>>> choose(4.000001, 4) >>>>>> [1] 1.000002 >>>>>> >>>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1? >>>>>> >>>>>> Thanks, >>>>>> Erik >>>>>> >>>>>> *** >>>>>> sessionInfo() >>>>>> R version 3.6.0 beta (2019-04-15 r76395) >>>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit) >>>>>> Running under: macOS High Sierra 10.13.6 >>>>>> >>>>>> [[alternative HTML version deleted]] >>>>>> >>>>>> ______________________________________________ >>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>>>> and provide commented, minimal, self-contained, reproducible code. > > -- > Peter Dalgaard, Professor, > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Office: A 4.23 > Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com > > > > > > > > >-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Fix committed to R-devel w/regression test. I settled for just doing k = R_forceint(n - k) inside if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */ so that k stays integer. In principle, you also could prefix this code r = n; for(j = 2; j <= k; j++) r *= (n-j+1)/j; return R_IS_INT(n) ? R_forceint(r) : r; /* might have got rounding errors */ with if(R_IS_INT(n)) n = R_forceint(n); but as I said, I believe that is really a no-op because of the rounding at the end. -pd> On 14 Jan 2020, at 19:20 , peter dalgaard <pdalgd at gmail.com> wrote: > > OK, I see what you mean. But in those cases, we don't get the catastrophic failures from the > > if (k < 0) return 0.; > if (k == 0) return 1.; > /* else: k >= 1 */ > > part, because at that point k is sure to be integer, possibly after rounding. > > It is when n-k is approximately but not exactly zero and we should return 1, that we either return 0 (negative case) or n (positive case; because the n(n-1)(n-2)... product has at least one factor). In the other cases, we get 1 or n(n-1)(n-2)...(n-k+1) which if n is near-integer gets rounded to produce an integer, due to the > > return R_IS_INT(n) ? R_forceint(r) : r; > > part. > > -pd > > > >> On 14 Jan 2020, at 17:02 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote: >> >> On 14/01/2020 10:50 a.m., peter dalgaard wrote: >>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote: >>>> >>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote: >>>>> Yep, that looks wrong (probably want to continue discussion over on R-devel) >>>>> I think the culprit is here (in src/nmath/choose.c) >>>>> if (k < k_small_max) { >>>>> int j; >>>>> if(n-k < k && n >= 0 && R_IS_INT(n)) k = n-k; /* <- Symmetry */ >>>>> if (k < 0) return 0.; >>>>> if (k == 0) return 1.; >>>>> /* else: k >= 1 */ >>>>> if n is a near-integer, then k can become non-integer and negative. In your case, >>>>> n == 4 - 1e-7 >>>>> k == 4 >>>>> n - k == -1e-7 < 4 >>>>> n >= 0 >>>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed) >>>>> so k gets set to >>>>> n - k == -1e-7 >>>>> which is less than 0, so we return 0. However, as you point out, 1 would be more reasonable and in accordance with the limit as n -> 4, e.g. >>>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1 >>>>> [1] -9.289025e-11 >>>>> I guess that the fix could be as simple as replacing n by R_forceint(n) in the k = n - k step. >>>> >>>> I think that would break symmetry: you want choose(n, k) to equal choose(n, n-k) when n is very close to an integer. So I'd suggest the replacement whenever R_IS_INT(n) is true. >>>> >>> But choose() very deliberately ensures that k is integer, so choose(n, n-k) is ill-defined for non-integer n. >> >> That's only true if there's a big difference. I'd be worried about cases where n and k are close to integers (within 1e-7). In those cases, k is silently rounded to integer. As I read your suggestion, n would only be rounded to integer if k > n-k. I think both n and k should be rounded to integer in this near-integer situation, regardless of the value of k. >> >> I believe that lchoose(n, k) already does this. >> >> Duncan Murdoch >> >>> double r, k0 = k; >>> k = R_forceint(k); >>> ... >>> if (fabs(k - k0) > 1e-7) >>> MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to %.0f"), k0, k); >>> >>>> Duncan Murdoch >>>> >>>>> -pd >>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <ESWRIGHT at pitt.edu> wrote: >>>>>> >>>>>> This struck me as incorrect: >>>>>> >>>>>>> choose(3.999999, 4) >>>>>> [1] 0.9999979 >>>>>>> choose(3.9999999, 4) >>>>>> [1] 0 >>>>>>> choose(4, 4) >>>>>> [1] 1 >>>>>>> choose(4.0000001, 4) >>>>>> [1] 4 >>>>>>> choose(4.000001, 4) >>>>>> [1] 1.000002 >>>>>> >>>>>> Should base::choose(n, k) check whether n is within machine precision of k and return 1? >>>>>> >>>>>> Thanks, >>>>>> Erik >>>>>> >>>>>> *** >>>>>> sessionInfo() >>>>>> R version 3.6.0 beta (2019-04-15 r76395) >>>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit) >>>>>> Running under: macOS High Sierra 10.13.6 >>>>>> >>>>>> [[alternative HTML version deleted]] >>>>>> >>>>>> ______________________________________________ >>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>>>> and provide commented, minimal, self-contained, reproducible code. > > -- > Peter Dalgaard, Professor, > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Office: A 4.23 > Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com > > > > > > > > >-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com