On 1/20/20 4:26 AM, Martin Maechler wrote:
> Coming late here -- after enjoying a proper weekend ;-) --
> I have been agreeing (with Spencer, IIUC) on this for a long
> time (~ 3 yrs, or more?), namely that I've come to see it as a
> "design bug" that rpois() {and similar} must return
> typeof() "integer".
>
> More strongly, I'm actually pretty convinced they should return
> (integer-valued) double instead of NA_integer_ and for that
> reason should always return double:
> Even if we have (hopefully) a native 64bit integer in R,
> 2^64 is still teeny tiny compared to .Machine$double.xmax
>
> (and then maybe we'd have .Machine$longdouble.max which would
> be considerably larger than double.xmax unless on Windows, where
> the wise men at Microsoft decided to keep their workload simple
> by defining "long double := double" - as 'long double'
> unfortunately is not well defined by C standards)
>
> Martin

Martin if you are in favor, then certainly no objection from me! ;-)

So now what about other discrete distributions, e.g. could a similar
enhancement apply here?

> rgeom(10L, 1e-10)
 [1]         NA 1503061294         NA         NA 1122447583         NA
 [7]         NA         NA         NA         NA
Warning message:
In rgeom(10L, 1e-10) : NAs produced
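To illustrate why the NAs above appear, here is a minimal sketch of geometric draws kept as doubles. The helper rgeom_dbl() is hypothetical (it is not part of R) and uses plain inversion, which is an assumption made for illustration and is not claimed to be how rgeom() is implemented internally.

```r
# Hypothetical helper (not part of R): geometric draws returned as doubles.
rgeom_dbl <- function(n, prob) {
  stopifnot(prob > 0, prob < 1)
  # Inversion: floor(log(U) / log(1 - prob)) counts the failures
  # before the first success, for U uniform on (0, 1).
  floor(log(runif(n)) / log1p(-prob))
}

set.seed(1)
x <- rgeom_dbl(10L, 1e-10)
typeof(x)                        # "double"
anyNA(x)                         # no NA is needed to represent large draws
sum(x > .Machine$integer.max)    # draws that would not fit in an integer
```

With prob = 1e-10 the mean is about 1e10, so most draws lie beyond .Machine$integer.max, which is where an integer return type has to fall back to NA.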
>>>>> Benjamin Tyner
>>>>>     on Mon, 20 Jan 2020 08:10:49 -0500 writes:

    > Martin if you are in favor, then certainly no objection from me! ;-)

    > So now what about other discrete distributions e.g. could a similar
    > enhancement apply here?

    >> rgeom(10L, 1e-10)
    >  [1]         NA 1503061294         NA         NA 1122447583         NA
    >  [7]         NA         NA         NA         NA
    > Warning message:
    > In rgeom(10L, 1e-10) : NAs produced

Yes, of course there are several such distributions.

It's really something that should be discussed (possibly not
here, .. but then I've started it here ...).

The NEWS for R 3.0.0 contains (in NEW FEATURES) :

    * Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
      rsignrank() and rwilcox() now return integer (not double)
      vectors.  This halves the storage requirements for large
      simulations.

and what I've been suggesting is to revert this change
(svn rev r60225-6) which was purposefully and diligently done by
a fellow R core member, so indeed must be debatable.

Martin
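The trade-off behind that NEWS entry can be checked directly. The following is only a rough sketch: the vector length n is arbitrary, and object.size() reports an estimate that includes a small fixed header, so the ratio is close to, not exactly, 2.

```r
n <- 1e6
size_int <- object.size(integer(n))   # 4 bytes per element plus header
size_dbl <- object.size(numeric(n))   # 8 bytes per element plus header
ratio <- as.numeric(size_dbl) / as.numeric(size_int)
ratio                                 # close to 2

.Machine$integer.max                  # 2147483647
.Machine$double.xmax                  # largest finite double
2^53 == 2^53 + 1                      # TRUE: above 2^53 not every integer is representable
```

So integer storage is half the size but tops out near 2.1e9, while doubles reach far further at the cost of exact integer representation beyond 2^53.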
Ugh, sounds like competing priorities.

  * maintain type consistency
  * minimize storage (= current version, since 3.0.0)
  * maximize utility for large lambda (= proposed change)
  * keep user interface, and code, simple (e.g., it would be easy
    enough to add a switch that provided user control of int vs
    double return value)
  * backward compatibility

On 2020-01-20 12:33 p.m., Martin Maechler wrote:
> yes, of course there are several such distributions.
>
> It's really something that should be discussed (possibly not
> here, .. but then I've started it here ...).
>
> The NEWS for R 3.0.0 contain (in NEW FEATURES) :
>
>     * Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
>       rsignrank() and rwilcox() now return integer (not double)
>       vectors.  This halves the storage requirements for large
>       simulations.
>
> and what I've been suggesting is to revert this change
> (svn rev r60225-6) which was purposefully and diligently done by
> a fellow R core member, so indeed must be debatable.
>
> Martin
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
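The "switch" mentioned in the list of priorities could look roughly like the sketch below. Everything here is hypothetical: rpois_switch() and its `type` argument are invented names for illustration, not an existing or proposed R interface. The wrapper also cannot recover values on R versions where rpois() itself has already returned NA_integer_.

```r
# Hypothetical wrapper (not an R API): let the caller pick the return type.
rpois_switch <- function(n, lambda, type = c("double", "integer")) {
  type <- match.arg(type)
  x <- rpois(n, lambda)   # integer or double, depending on the R version
  if (type == "double") {
    return(as.numeric(x))
  }
  # Integer requested: values outside the integer range become NA,
  # with a warning, mimicking the behaviour since R 3.0.0.
  out_of_range <- !is.na(x) & x > .Machine$integer.max
  if (any(out_of_range)) {
    warning("NAs produced")
    x[out_of_range] <- NA
  }
  as.integer(x)
}

typeof(rpois_switch(3L, 5, type = "integer"))   # "integer"
typeof(rpois_switch(3L, 5, type = "double"))    # "double"
```

The design cost is the one already noted in the list: an extra argument on every affected r*() function, in exchange for type consistency within each call.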
On 1/20/20 12:33 PM, Martin Maechler wrote:
> It's really something that should be discussed (possibly not
> here, .. but then I've started it here ...).
>
> The NEWS for R 3.0.0 contain (in NEW FEATURES) :
>
>     * Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
>       rsignrank() and rwilcox() now return integer (not double)
>       vectors.  This halves the storage requirements for large
>       simulations.
>
> and what I've been suggesting is to revert this change
> (svn rev r60225-6) which was purposefully and diligently done by
> a fellow R core member, so indeed must be debatable.
>
> Martin

For the record, I don't personally object to the change here (as my
philosophy tends toward treating most warnings as errors anyway), but
for the sake of other useRs who may get bitten, perhaps we should be
more explicit that backwards compatibility won't be preserved under
certain use patterns, for example:

# works (with warning) in R 3.6.2 but fails (with error) in R-devel:
vapply(list(1e9, 1e10),
       function(lambda) {
          rpois(1L, lambda)
       },
       FUN.VALUE = integer(1L)
       )

# in R-devel, a little extra work to achieve a warning as before:
vapply(list(1e9, 1e10),
       function(lambda) {
          tmp <- rpois(1L, lambda)
          if (!is.integer(tmp)) {
             warning("NAs produced")
             tmp <- NA_integer_
          }
          tmp
       },
       FUN.VALUE = integer(1L)
       )

(and yes, I realize that rpois() vectorizes on lambda, so vapply is
re-inventing the wheel in this toy example, but there could be (?) a
justified use for it in more complicated simulations).
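For code that has to run under both behaviours, one possible pattern (a sketch only, relying on the documented rule that vapply() may promote integer results to double but never demote them) is to declare a double FUN.VALUE. The variable names below are illustrative.

```r
lambdas <- list(1e9, 1e10)

# Accepts integer or double results from rpois(); integers are promoted.
# On versions where rpois() returns NA_integer_ this still warns and yields NA.
draws <- vapply(lambdas,
                function(lambda) rpois(1L, lambda),
                FUN.VALUE = numeric(1L))
typeof(draws)   # "double"

# The vectorized call avoids vapply() altogether:
draws_vec <- rpois(length(lambdas), unlist(lambdas))
```

This changes the type seen by downstream code to double, so it only helps where the caller does not strictly need an integer vector.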