thr3ads.net - R devel - [Rd] [R] choose(n, k) as n approaches k [Jan 2020]

If this information is useful, please help other people find it:
Share via:

peter dalgaard

2020-Jan-14 15:50 UTC

[Rd] [R] choose(n, k) as n approaches k

> On 14 Jan 2020, at 16:21 , Duncan Murdoch <murdoch.duncan at
gmail.com> wrote:
> 
> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>> Yep, that looks wrong (probably want to continue discussion over on
R-devel)
>> I think the culprit is here (in src/nmath/choose.c)
>>      if (k < k_small_max) {
>>         int j;
>>         if(n-k < k && n >= 0 && R_IS_INT(n)) k =
n-k; /* <- Symmetry */
>>         if (k <  0) return 0.;
>>         if (k == 0) return 1.;
>>         /* else: k >= 1 */
>> if n is a near-integer, then k can become non-integer and negative. In
your case,
>> n == 4 - 1e-7
>> k == 4
>> n - k == -1e-7 < 4
>> n >= 0
>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed)
>> so k gets set to
>> n - k == -1e-7
>> which is less than 0, so we return 0. However, as you point out, 1
would be more reasonable and in accordance with the limit as n -> 4, e.g.
>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1
>> [1] -9.289025e-11
>> I guess that the fix could be as simple as replacing n by R_forceint(n)
in the k = n - k step.
> 
> I think that would break symmetry:  you want choose(n, k) to equal
choose(n, n-k) when n is very close to an integer.  So I'd suggest the
replacement whenever R_IS_INT(n) is true.
> 
But choose() very deliberately ensures that k is integer, so choose(n, n-k) is
ill-defined for non-integer n.

    double r, k0 = k;
    k = R_forceint(k);
...
    if (fabs(k - k0) > 1e-7)
        MATHLIB_WARNING2(_("'k' (%.2f) must be integer, rounded to
%.0f"), k0, k);
  
> Duncan Murdoch
> 
>> -pd
>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <ESWRIGHT at
pitt.edu> wrote:
>>> 
>>> This struck me as incorrect:
>>> 
>>>> choose(3.999999, 4)
>>> [1] 0.9999979
>>>> choose(3.9999999, 4)
>>> [1] 0
>>>> choose(4, 4)
>>> [1] 1
>>>> choose(4.0000001, 4)
>>> [1] 4
>>>> choose(4.000001, 4)
>>> [1] 1.000002
>>> 
>>> Should base::choose(n, k) check whether n is within machine
precision of k and return 1?
>>> 
>>> Thanks,
>>> Erik
>>> 
>>> ***
>>> sessionInfo()
>>> R version 3.6.0 beta (2019-04-15 r76395)
>>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>> Running under: macOS High Sierra 10.13.6
>>> 
>>> 	[[alternative HTML version deleted]]
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Duncan Murdoch

2020-Jan-14 16:02 UTC

head link

[Rd] [R] choose(n, k) as n approaches k

On 14/01/2020 10:50 a.m., peter dalgaard wrote:> 
> 
>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <murdoch.duncan at
gmail.com> wrote:
>>
>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>> Yep, that looks wrong (probably want to continue discussion over on
R-devel)
>>> I think the culprit is here (in src/nmath/choose.c)
>>>       if (k < k_small_max) {
>>>          int j;
>>>          if(n-k < k && n >= 0 && R_IS_INT(n))
k = n-k; /* <- Symmetry */
>>>          if (k <  0) return 0.;
>>>          if (k == 0) return 1.;
>>>          /* else: k >= 1 */
>>> if n is a near-integer, then k can become non-integer and negative.
In your case,
>>> n == 4 - 1e-7
>>> k == 4
>>> n - k == -1e-7 < 4
>>> n >= 0
>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed)
>>> so k gets set to
>>> n - k == -1e-7
>>> which is less than 0, so we return 0. However, as you point out, 1
would be more reasonable and in accordance with the limit as n -> 4, e.g.
>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1
>>> [1] -9.289025e-11
>>> I guess that the fix could be as simple as replacing n by
R_forceint(n) in the k = n - k step.
>>
>> I think that would break symmetry:  you want choose(n, k) to equal
choose(n, n-k) when n is very close to an integer.  So I'd suggest the
replacement whenever R_IS_INT(n) is true.
>>
> 
> But choose() very deliberately ensures that k is integer, so choose(n, n-k)
is ill-defined for non-integer n.
That's only true if there's a big difference.  I'd be worried about 
cases where n and k are close to integers (within 1e-7).  In those 
cases, k is silently rounded to integer.  As I read your suggestion, n 
would only be rounded to integer if k > n-k.  I think both n and k 
should be rounded to integer in this near-integer situation, regardless 
of the value of k.

I believe that lchoose(n, k) already does this.

Duncan Murdoch
> 
>      double r, k0 = k;
>      k = R_forceint(k);
> ...
>      if (fabs(k - k0) > 1e-7)
>          MATHLIB_WARNING2(_("'k' (%.2f) must be integer,
rounded to %.0f"), k0, k);
>    
> 
>> Duncan Murdoch
>>
>>> -pd
>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <ESWRIGHT at
pitt.edu> wrote:
>>>>
>>>> This struck me as incorrect:
>>>>
>>>>> choose(3.999999, 4)
>>>> [1] 0.9999979
>>>>> choose(3.9999999, 4)
>>>> [1] 0
>>>>> choose(4, 4)
>>>> [1] 1
>>>>> choose(4.0000001, 4)
>>>> [1] 4
>>>>> choose(4.000001, 4)
>>>> [1] 1.000002
>>>>
>>>> Should base::choose(n, k) check whether n is within machine
precision of k and return 1?
>>>>
>>>> Thanks,
>>>> Erik
>>>>
>>>> ***
>>>> sessionInfo()
>>>> R version 3.6.0 beta (2019-04-15 r76395)
>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>>> Running under: macOS High Sierra 10.13.6
>>>>
>>>> 	[[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible
code.
>

peter dalgaard

2020-Jan-14 18:20 UTC

head link

[Rd] [R] choose(n, k) as n approaches k

OK, I see what you mean. But in those cases, we don't get the catastrophic
failures from the

        if (k <  0) return 0.;
        if (k == 0) return 1.;
        /* else: k >= 1 */

part, because at that point k is sure to be integer, possibly after rounding. 

It is when n-k is approximately but not exactly zero and we should return 1,
that we either return 0 (negative case) or n (positive case; because the
n(n-1)(n-2)... product has at least one factor). In the other cases, we get 1 or
n(n-1)(n-2)...(n-k+1) which if n is near-integer gets rounded to produce an
integer, due to the

        return R_IS_INT(n) ? R_forceint(r) : r;

part.

-pd


> On 14 Jan 2020, at 17:02 , Duncan Murdoch <murdoch.duncan at
gmail.com> wrote:
> 
> On 14/01/2020 10:50 a.m., peter dalgaard wrote:
>>> On 14 Jan 2020, at 16:21 , Duncan Murdoch <murdoch.duncan at
gmail.com> wrote:
>>> 
>>> On 14/01/2020 10:07 a.m., peter dalgaard wrote:
>>>> Yep, that looks wrong (probably want to continue discussion
over on R-devel)
>>>> I think the culprit is here (in src/nmath/choose.c)
>>>>      if (k < k_small_max) {
>>>>         int j;
>>>>         if(n-k < k && n >= 0 &&
R_IS_INT(n)) k = n-k; /* <- Symmetry */
>>>>         if (k <  0) return 0.;
>>>>         if (k == 0) return 1.;
>>>>         /* else: k >= 1 */
>>>> if n is a near-integer, then k can become non-integer and
negative. In your case,
>>>> n == 4 - 1e-7
>>>> k == 4
>>>> n - k == -1e-7 < 4
>>>> n >= 0
>>>> R_IS_INT(n) = TRUE (relative diff < 1e-7 is allowed)
>>>> so k gets set to
>>>> n - k == -1e-7
>>>> which is less than 0, so we return 0. However, as you point
out, 1 would be more reasonable and in accordance with the limit as n -> 4,
e.g.
>>>>> factorial(4 - 1e-10)/factorial(1e-10)/factorial(4) -1
>>>> [1] -9.289025e-11
>>>> I guess that the fix could be as simple as replacing n by
R_forceint(n) in the k = n - k step.
>>> 
>>> I think that would break symmetry:  you want choose(n, k) to equal
choose(n, n-k) when n is very close to an integer.  So I'd suggest the
replacement whenever R_IS_INT(n) is true.
>>> 
>> But choose() very deliberately ensures that k is integer, so choose(n,
n-k) is ill-defined for non-integer n.
> 
> That's only true if there's a big difference.  I'd be worried
about cases where n and k are close to integers (within 1e-7).  In those cases,
k is silently rounded to integer.  As I read your suggestion, n would only be
rounded to integer if k > n-k.  I think both n and k should be rounded to
integer in this near-integer situation, regardless of the value of k.
> 
> I believe that lchoose(n, k) already does this.
> 
> Duncan Murdoch
> 
>>     double r, k0 = k;
>>     k = R_forceint(k);
>> ...
>>     if (fabs(k - k0) > 1e-7)
>>         MATHLIB_WARNING2(_("'k' (%.2f) must be integer,
rounded to %.0f"), k0, k);
>>   
>>> Duncan Murdoch
>>> 
>>>> -pd
>>>>> On 14 Jan 2020, at 00:33 , Wright, Erik Scott <ESWRIGHT
at pitt.edu> wrote:
>>>>> 
>>>>> This struck me as incorrect:
>>>>> 
>>>>>> choose(3.999999, 4)
>>>>> [1] 0.9999979
>>>>>> choose(3.9999999, 4)
>>>>> [1] 0
>>>>>> choose(4, 4)
>>>>> [1] 1
>>>>>> choose(4.0000001, 4)
>>>>> [1] 4
>>>>>> choose(4.000001, 4)
>>>>> [1] 1.000002
>>>>> 
>>>>> Should base::choose(n, k) check whether n is within machine
precision of k and return 1?
>>>>> 
>>>>> Thanks,
>>>>> Erik
>>>>> 
>>>>> ***
>>>>> sessionInfo()
>>>>> R version 3.6.0 beta (2019-04-15 r76395)
>>>>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>>>> Running under: macOS High Sierra 10.13.6
>>>>> 
>>>>> 	[[alternative HTML version deleted]]
>>>>> 
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained,
reproducible code.
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Reasonably Related Threads

Search for more maybe matching threads

R devel - Jan 2020 - [R] choose(n, k) as n approaches k

[Rd] [R] choose(n, k) as n approaches k

[Rd] [R] choose(n, k) as n approaches k

[Rd] [R] choose(n, k) as n approaches k

Reasonably Related Threads