Full_Name: Chanseok Park
Version: R 2.2.1
OS: RedHat EL4
Submission from: (NULL) (130.127.112.89)

pbinom(any negative value, size, prob) should be zero. But I got the
following results: if a negative value is close enough to zero, then
pbinom() calculates pbinom(0, size, prob) instead. dbinom() also
behaves similarly.

> pbinom(-2.220446e-22, 3, .1)
[1] 0.729
> pbinom(-2.220446e-8, 3, .1)
[1] 0.729
> pbinom(-2.220446e-7, 3, .1)
[1] 0
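For reference, the binomial CDF written as a direct summation makes the
expected behaviour explicit: for any x < 0 there are no terms, so the
value should be 0. (A naive illustration only; pbinom() is not actually
implemented this way, and the helper name is made up.)

    pbinom_naive <- function(x, size, prob) {
        k <- 0:size
        sum(dbinom(k[k <= x], size, prob))  # empty sum is 0 when x < 0
    }
    pbinom_naive(-2.220446e-22, 3, 0.1)  # 0, as expected mathematically
    pbinom_naive(0, 3, 0.1)              # 0.729, matching pbinom(0, 3, 0.1)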
>>>>> "cspark" == cspark <cspark at clemson.edu> >>>>> on Wed, 22 Mar 2006 05:52:13 +0100 (CET) writes:cspark> Full_Name: Chanseok Park Version: R 2.2.1 OS: RedHat cspark> EL4 Submission from: (NULL) (130.127.112.89) cspark> pbinom(any negative value, size, prob) should be cspark> zero. But I got the following results. I mean, if cspark> a negative value is close to zero, then pbinom() cspark> calculate pbinom(0, size, prob). >> pbinom( -2.220446e-22, 3,.1) [1] 0.729 >> pbinom( -2.220446e-8, 3,.1) [1] 0.729 >> pbinom( -2.220446e-7, 3,.1) [1] 0 Yes, all the [dp]* functions which are discrete with mass on the integers only, do *round* their 'x' to integers. I could well argue that the current behavior is *not* a bug, since we do treat "x close to integer" as integer, and hence pbinom(eps, size, prob) with eps "very close to 0" should give pbinom(0, size, prob) as it now does. However, for esthetical reasons, I agree that we should test for "< 0" first (and give 0 then) and only round otherwise. I'll change this for R-devel (i.e. R 2.3.0 in about a month). cspark> dbinom() also behaves similarly. yes, similarly, but differently. I have changed it (for R-devel) as well, to behave the same as others d*() , e.g., dpois(), dnbinom() do. Martin Maechler, ETH Zurich
>>>>> "Duncan" == Duncan Murdoch <murdoch at stats.uwo.ca> >>>>> on Wed, 22 Mar 2006 07:40:11 -0500 writes:Duncan> On 3/22/2006 3:52 AM, maechler at stat.math.ethz.ch Duncan> wrote: >>>>>>> "cspark" == cspark <cspark at clemson.edu> on Wed, 22 >>>>>>> Mar 2006 05:52:13 +0100 (CET) writes: >> cspark> Full_Name: Chanseok Park Version: R 2.2.1 OS: RedHat cspark> EL4 Submission from: (NULL) (130.127.112.89) >> cspark> pbinom(any negative value, size, prob) should be cspark> zero. But I got the following results. I mean, if cspark> a negative value is close to zero, then pbinom() cspark> calculate pbinom(0, size, prob). >> >> pbinom( -2.220446e-22, 3,.1) [1] 0.729 >> pbinom( >> -2.220446e-8, 3,.1) [1] 0.729 >> pbinom( -2.220446e-7, >> 3,.1) [1] 0 >> >> Yes, all the [dp]* functions which are discrete with mass >> on the integers only, do *round* their 'x' to integers. >> >> I could well argue that the current behavior is *not* a >> bug, since we do treat "x close to integer" as integer, >> and hence pbinom(eps, size, prob) with eps "very close to >> 0" should give pbinom(0, size, prob) as it now does. >> >> However, for esthetical reasons, I agree that we should >> test for "< 0" first (and give 0 then) and only round >> otherwise. I'll change this for R-devel (i.e. R 2.3.0 in >> about a month). >> cspark> dbinom() also behaves similarly. >> yes, similarly, but differently. I have changed it (for >> R-devel) as well, to behave the same as others d*() , >> e.g., dpois(), dnbinom() do. Duncan> Martin, your description makes it sound as though Duncan> dbinom(0.3, size, prob) would give the same answer Duncan> as dbinom(0, size, prob), whereas it actually gives Duncan> 0 with a warning, as documented in ?dbinom. The d* Duncan> functions only round near-integers to integers, Duncan> where it looks as though near means within 1E-7. That's correct. Above, I did not describe what happens for the d*() functions but said that dbinom() behaves differently than pbinom and that I have changed dbinom() to behave similarly to dnbinom(), dgeom(),.... Duncan> The p* functions round near integers to integers, Duncan> and truncate others to the integer below. Duncan> I suppose the reason for this behaviour is to Duncan> protect against rounding error giving nonsense Duncan> results; I'm not sure that's a great idea, I agree that it may not seem such a great idea; but that has been discussed and decided (IIRC against my preference) quite a while ago, and I don't think it is worthwhile to rediscuss such relatively fundamental behavior every few years.. Duncan> but if we do it, should we really be handling 0 Duncan> differently? yes: - only around 0, small absolute deviations are large relative deviations - 0 is the left border of the function's domain, where one would expect strict mathematical behavior more strongly. Martin Maechler
Duncan Murdoch <murdoch at stats.uwo.ca> writes:

> Martin, your description makes it sound as though dbinom(0.3, size,
> prob) would give the same answer as dbinom(0, size, prob), whereas it
> actually gives 0 with a warning, as documented in ?dbinom. The d*
> functions only round near-integers to integers, where it looks as
> though "near" means within 1e-7. The p* functions round near-integers
> to integers, and truncate others to the integer below.

Well, the p-functions are constant on the intervals between
integers... (Or did you refer to the lack of a warning? One point
could be that cumulative distribution functions extend naturally to
non-integers, whereas densities don't really extend, since they are
defined with respect to counting measure on the integers.)

> I suppose the reason for this behaviour is to protect against
> rounding error giving nonsense results; I'm not sure that's a great
> idea, but if we do it, should we really be handling 0 differently?

Most of these round-near-integer issues were spurred by real
programming problems. It is somewhat hard to come up with a problem
that leads you to generate a binomial variate value with "floating
point noise", but I'm quite sure that we'll be reminded if we try to
change it... (One potential issue is back-calculation to counts from
relative frequencies.)

   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)            FAX: (+45) 35327907
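The back-calculation scenario Peter mentions is easy to reproduce:
counts recovered from relative frequencies routinely carry floating
point noise, which the rounding fuzz absorbs. (A made-up illustration,
not taken from the original thread:)

    n    <- 10
    phat <- 0.1 + 0.2      # 0.30000000000000004 in binary floating point
    x    <- phat * n       # 3.0000000000000004, not exactly 3
    x == 3                 # FALSE
    dbinom(x, n, 0.3)      # fuzz rescues this: same as dbinom(3, n, 0.3)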
On 3/22/2006 10:08 AM, Peter Dalgaard wrote:

> Well, the p-functions are constant on the intervals between
> integers...

Not quite: they're constant on the intervals (n - 1e-7, n + 1 - 1e-7),
for integers n. Since Martin's change, this is no longer true for
n = 0.

> (Or did you refer to the lack of a warning? One point could be that
> cumulative distribution functions extend naturally to non-integers,
> whereas densities don't really extend, since they are defined with
> respect to counting measure on the integers.)

I wasn't complaining about the behaviour here; I was just clarifying
Martin's description of it, when he said that "all the [dp]* functions
which are discrete with mass on the integers only, do *round* their
'x' to integers".

> Most of these round-near-integer issues were spurred by real
> programming problems. It is somewhat hard to come up with a problem
> that leads you to generate a binomial variate value with "floating
> point noise", but I'm quite sure that we'll be reminded if we try to
> change it... (One potential issue is back-calculation to counts from
> relative frequencies.)

Again, I wasn't suggesting we change the general +/- 1e-7 behaviour
(though it should be documented, to avoid bug reports like this one),
but I'm worried about having zero as a special case.
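Checking the interval claim numerically (taking the fuzz to be exactly
1e-7, as stated above):

    pbinom(1 - 1e-8, 3, 0.1)  # inside (1 - 1e-7, 2 - 1e-7): 0.972, as for x = 1
    pbinom(1 - 1e-6, 3, 0.1)  # below 1 - 1e-7: truncated, 0.729, as for x = 0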
This will break relations such as

    dbinom(x, n, 0.5) == dbinom(n - x, n, 0.5)

in the case where x is n + epsilon or -epsilon, for small enough
epsilon. Is it really desirable to break the symmetry like this?

Duncan Murdoch
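A quick check of the asymmetry described here, under the proposed
change (the epsilon value is arbitrary; the presumed outputs follow
from the rules discussed above, not from a verified build):

    n <- 3; eps <- 1e-10
    dbinom(-eps,    n, 0.5)   # new rule: negative x gives 0
    dbinom(n + eps, n, 0.5)   # still rounded to n: dbinom(3, 3, 0.5) = 0.125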