Full_Name: Chanseok Park
Version: R 2.2.1
OS: RedHat EL4
Submission from: (NULL) (130.127.112.89)

pbinom(any negative value, size, prob) should be zero. But I got the
following results: if a negative value is close enough to zero, then
pbinom() calculates pbinom(0, size, prob) instead. dbinom() also
behaves similarly.

> pbinom(-2.220446e-22, 3, .1)
[1] 0.729
> pbinom(-2.220446e-8, 3, .1)
[1] 0.729
> pbinom(-2.220446e-7, 3, .1)
[1] 0
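For reference, the binomial CDF written as a direct summation makes the
expected behaviour explicit: for any x < 0 there are no terms, so the
value should be 0. (A naive illustration only; pbinom() is not actually
implemented this way, and the helper name is made up.)

    pbinom_naive <- function(x, size, prob) {
        k <- 0:size
        sum(dbinom(k[k <= x], size, prob))  # empty sum is 0 when x < 0
    }
    pbinom_naive(-2.220446e-22, 3, 0.1)  # 0, as expected mathematically
    pbinom_naive(0, 3, 0.1)              # 0.729, matching pbinom(0, 3, 0.1)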
>>>>> "cspark" == cspark <cspark at clemson.edu> >>>>> on Wed, 22 Mar 2006 05:52:13 +0100 (CET) writes:cspark> Full_Name: Chanseok Park Version: R 2.2.1 OS: RedHat cspark> EL4 Submission from: (NULL) (130.127.112.89) cspark> pbinom(any negative value, size, prob) should be cspark> zero. But I got the following results. I mean, if cspark> a negative value is close to zero, then pbinom() cspark> calculate pbinom(0, size, prob). >> pbinom( -2.220446e-22, 3,.1) [1] 0.729 >> pbinom( -2.220446e-8, 3,.1) [1] 0.729 >> pbinom( -2.220446e-7, 3,.1) [1] 0 Yes, all the [dp]* functions which are discrete with mass on the integers only, do *round* their 'x' to integers. I could well argue that the current behavior is *not* a bug, since we do treat "x close to integer" as integer, and hence pbinom(eps, size, prob) with eps "very close to 0" should give pbinom(0, size, prob) as it now does. However, for esthetical reasons, I agree that we should test for "< 0" first (and give 0 then) and only round otherwise. I'll change this for R-devel (i.e. R 2.3.0 in about a month). cspark> dbinom() also behaves similarly. yes, similarly, but differently. I have changed it (for R-devel) as well, to behave the same as others d*() , e.g., dpois(), dnbinom() do. Martin Maechler, ETH Zurich
>>>>> "Duncan" == Duncan Murdoch <murdoch at stats.uwo.ca> >>>>> on Wed, 22 Mar 2006 07:40:11 -0500 writes:Duncan> On 3/22/2006 3:52 AM, maechler at stat.math.ethz.ch Duncan> wrote: >>>>>>> "cspark" == cspark <cspark at clemson.edu> on Wed, 22 >>>>>>> Mar 2006 05:52:13 +0100 (CET) writes: >> cspark> Full_Name: Chanseok Park Version: R 2.2.1 OS: RedHat cspark> EL4 Submission from: (NULL) (130.127.112.89) >> cspark> pbinom(any negative value, size, prob) should be cspark> zero. But I got the following results. I mean, if cspark> a negative value is close to zero, then pbinom() cspark> calculate pbinom(0, size, prob). >> >> pbinom( -2.220446e-22, 3,.1) [1] 0.729 >> pbinom( >> -2.220446e-8, 3,.1) [1] 0.729 >> pbinom( -2.220446e-7, >> 3,.1) [1] 0 >> >> Yes, all the [dp]* functions which are discrete with mass >> on the integers only, do *round* their 'x' to integers. >> >> I could well argue that the current behavior is *not* a >> bug, since we do treat "x close to integer" as integer, >> and hence pbinom(eps, size, prob) with eps "very close to >> 0" should give pbinom(0, size, prob) as it now does. >> >> However, for esthetical reasons, I agree that we should >> test for "< 0" first (and give 0 then) and only round >> otherwise. I'll change this for R-devel (i.e. R 2.3.0 in >> about a month). >> cspark> dbinom() also behaves similarly. >> yes, similarly, but differently. I have changed it (for >> R-devel) as well, to behave the same as others d*() , >> e.g., dpois(), dnbinom() do. Duncan> Martin, your description makes it sound as though Duncan> dbinom(0.3, size, prob) would give the same answer Duncan> as dbinom(0, size, prob), whereas it actually gives Duncan> 0 with a warning, as documented in ?dbinom. The d* Duncan> functions only round near-integers to integers, Duncan> where it looks as though near means within 1E-7. That's correct. Above, I did not describe what happens for the d*() functions but said that dbinom() behaves differently than pbinom and that I have changed dbinom() to behave similarly to dnbinom(), dgeom(),.... Duncan> The p* functions round near integers to integers, Duncan> and truncate others to the integer below. Duncan> I suppose the reason for this behaviour is to Duncan> protect against rounding error giving nonsense Duncan> results; I'm not sure that's a great idea, I agree that it may not seem such a great idea; but that has been discussed and decided (IIRC against my preference) quite a while ago, and I don't think it is worthwhile to rediscuss such relatively fundamental behavior every few years.. Duncan> but if we do it, should we really be handling 0 Duncan> differently? yes: - only around 0, small absolute deviations are large relative deviations - 0 is the left border of the function's domain, where one would expect strict mathematical behavior more strongly. Martin Maechler
Duncan Murdoch <murdoch at stats.uwo.ca> writes:

> Martin, your description makes it sound as though dbinom(0.3, size,
> prob) would give the same answer as dbinom(0, size, prob), whereas it
> actually gives 0 with a warning, as documented in ?dbinom. The d*
> functions only round near-integers to integers, where it looks as
> though "near" means within 1e-7. The p* functions round near-integers
> to integers, and truncate others to the integer below.

Well, the p-functions are constant on the intervals between
integers... (Or did you refer to the lack of a warning? One point
could be that cumulative distribution functions extend naturally to
non-integers, whereas densities don't really extend, since they are
defined with respect to counting measure on the integers.)

> I suppose the reason for this behaviour is to protect against
> rounding error giving nonsense results; I'm not sure that's a great
> idea, but if we do it, should we really be handling 0 differently?

Most of these round-near-integer issues were spurred by real
programming problems. It is somewhat hard to come up with a problem
that leads you to generate a binomial variate value with "floating
point noise", but I'm quite sure that we'll be reminded if we try to
change it... (One potential issue is back-calculation to counts from
relative frequencies.)

   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)            FAX: (+45) 35327907
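The back-calculation scenario Peter mentions is easy to reproduce:
counts recovered from relative frequencies routinely carry floating
point noise, which the rounding fuzz absorbs. (A made-up illustration,
not taken from the original thread:)

    n    <- 10
    phat <- 0.1 + 0.2      # 0.30000000000000004 in binary floating point
    x    <- phat * n       # 3.0000000000000004, not exactly 3
    x == 3                 # FALSE
    dbinom(x, n, 0.3)      # fuzz rescues this: same as dbinom(3, n, 0.3)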
On 3/22/2006 10:08 AM, Peter Dalgaard wrote:

> Well, the p-functions are constant on the intervals between
> integers...

Not quite: they're constant on the intervals (n - 1e-7, n + 1 - 1e-7),
for integers n. Since Martin's change, this is no longer true for
n = 0.

> (Or did you refer to the lack of a warning? One point could be that
> cumulative distribution functions extend naturally to non-integers,
> whereas densities don't really extend, since they are defined with
> respect to counting measure on the integers.)

I wasn't complaining about the behaviour here; I was just clarifying
Martin's description of it, when he said that "all the [dp]* functions
which are discrete with mass on the integers only, do *round* their
'x' to integers".

> Most of these round-near-integer issues were spurred by real
> programming problems. It is somewhat hard to come up with a problem
> that leads you to generate a binomial variate value with "floating
> point noise", but I'm quite sure that we'll be reminded if we try to
> change it... (One potential issue is back-calculation to counts from
> relative frequencies.)

Again, I wasn't suggesting we change the general +/- 1e-7 behaviour
(though it should be documented, to avoid bug reports like this one),
but I'm worried about having zero as a special case.
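Checking the interval claim numerically (taking the fuzz to be exactly
1e-7, as stated above):

    pbinom(1 - 1e-8, 3, 0.1)  # inside (1 - 1e-7, 2 - 1e-7): 0.972, as for x = 1
    pbinom(1 - 1e-6, 3, 0.1)  # below 1 - 1e-7: truncated, 0.729, as for x = 0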
This will break relations such as

    dbinom(x, n, 0.5) == dbinom(n - x, n, 0.5)

in the case where x is n + epsilon or -epsilon, for small enough
epsilon. Is it really desirable to break the symmetry like this?

Duncan Murdoch
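A quick check of the asymmetry described here, under the proposed
change (the epsilon value is arbitrary; the presumed outputs follow
from the rules discussed above, not from a verified build):

    n <- 3; eps <- 1e-10
    dbinom(-eps,    n, 0.5)   # new rule: negative x gives 0
    dbinom(n + eps, n, 0.5)   # still rounded to n: dbinom(3, 3, 0.5) = 0.125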