Ugh, sounds like competing priorities.

  * maintain type consistency
  * minimize storage (= current version, since 3.0.0)
  * maximize utility for large lambda (= proposed change)
  * keep user interface, and code, simple (e.g., it would be easy enough
    to add a switch that provided user control of int vs double return value)
  * backward compatibility

On 2020-01-20 12:33 p.m., Martin Maechler wrote:
>>>>>> Benjamin Tyner
>>>>>>     on Mon, 20 Jan 2020 08:10:49 -0500 writes:
>
> > On 1/20/20 4:26 AM, Martin Maechler wrote:
> >> Coming late here -- after enjoying a proper weekend ;-) --
> >> I have been agreeing (with Spencer, IIUC) on this for a long
> >> time (~ 3 yrs, or more?), namely that I've come to see it as a
> >> "design bug" that rpois() {and similar} must return typeof() "integer".
> >>
> >> More strongly, I'm actually pretty convinced they should return
> >> (integer-valued) double instead of NA_integer_ and for that
> >> reason should always return double:
> >> Even if we have (hopefully) a native 64bit integer in R,
> >> 2^64 is still teeny tiny compared to .Machine$double.max
> >>
> >> (and then maybe we'd have .Machine$longdouble.max which would
> >> be considerably larger than double.max unless on Windows, where
> >> the wise men at Microsoft decided to keep their workload simple
> >> by defining "long double := double" - as 'long double'
> >> unfortunately is not well defined by C standards)
> >>
> >> Martin
> >>
> > Martin if you are in favor, then certainly no objection from me! ;-)
>
> > So now what about other discrete distributions e.g. could a similar
> > enhancement apply here?
> >
> >> rgeom(10L, 1e-10)
> >  [1]         NA 1503061294         NA         NA 1122447583         NA
> >  [7]         NA         NA         NA         NA
> > Warning message:
> > In rgeom(10L, 1e-10) : NAs produced
>
> yes, of course there are several such distributions.
>
> It's really something that should be discussed (possibly not
> here, .. but then I've started it here ...).
>
> The NEWS for R 3.0.0 contains (in NEW FEATURES) :
>
>   * Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
>     rsignrank() and rwilcox() now return integer (not double)
>     vectors.  This halves the storage requirements for large
>     simulations.
>
> and what I've been suggesting is to revert this change
> (svn rev r60225-6) which was purposefully and diligently done by
> a fellow R core member, so indeed must be debatable.
>
> Martin
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
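[Editorial illustration, not part of the original messages.] The NAs in the rgeom() output and the "halves the storage" remark from the R 3.0.0 NEWS can both be checked with a few lines of R. This is only a sketch: the variable names (int_max, mean_geom, size_int, size_dbl) are made up for the illustration, and the byte sizes assume a typical build where an integer takes 4 bytes and a double 8.

```r
# Largest value an R integer can hold (32-bit signed): 2^31 - 1
int_max <- .Machine$integer.max
int_max == 2^31 - 1

# rgeom() counts failures before the first success, so its mean is (1 - p) / p.
p <- 1e-10
mean_geom <- (1 - p) / p      # roughly 1e10
mean_geom > int_max           # typical draws cannot be stored as integer

# Storage trade-off behind the R 3.0.0 change: integer vectors need about
# half the memory of double vectors of the same length.
n <- 1e6
size_int <- as.numeric(object.size(integer(n)))
size_dbl <- as.numeric(object.size(numeric(n)))
size_dbl / size_int           # close to 2
```

With p = 1e-10 only the unusually small draws (such as the two that survive in the output above) stay below the integer limit, which is why most entries come back as NA.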
>>>>> Ben Bolker
>>>>>     on Mon, 20 Jan 2020 12:54:52 -0500 writes:

    > Ugh, sounds like competing priorities.

indeed.

    > * maintain type consistency
    > * minimize storage (= current version, since 3.0.0)
    > * maximize utility for large lambda (= proposed change)
    > * keep user interface, and code, simple (e.g., it would be easy enough
    >   to add a switch that provided user control of int vs double return value)
    > * backward compatibility

Last night, it came to my mind that we should do what we have
been doing in quite a few places in R, the last couple of years:
Return integer when possible, and switch to return double when
integers don't fit.
We've been doing so even for 1:N (well, now with additional ALTREP wrapper),
seq(), and even the fundamental length() function.

So I sat down and implemented it .. and it seemed to work
perfectly: Returning the same random numbers as now, but
switching to use double (instead of returning NAs) when the
values are too large.

I'll probably commit that to R-devel quite soonish.

Martin
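[Editorial illustration, not part of the original messages.] The policy "return integer when possible, switch to double when integers don't fit" can be sketched at the R level as follows. The helper as_integer_if_fits() is hypothetical and exists only for this illustration; the actual change was made in R's own sources and may be structured quite differently.

```r
# Hypothetical helper (not part of R): store integer-valued doubles as
# integer when every value fits, otherwise leave the vector as double.
as_integer_if_fits <- function(x) {
  stopifnot(is.double(x))
  fits <- all(is.na(x) | abs(x) <= .Machine$integer.max)
  if (fits) as.integer(x) else x
}

small <- c(3, 0, 12)
large <- c(3, 1503061294, 1e10)

typeof(as_integer_if_fits(small))   # "integer"
typeof(as_integer_if_fits(large))   # "double"
anyNA(as_integer_if_fits(large))    # FALSE: large values are kept, not NA

# The same idea is already visible in the colon operator:
typeof(1:3)                         # "integer"
typeof(2147483647:2147483648)       # "double"
```

The design choice is that callers who never hit large values keep the compact integer storage, while callers who do get usable numbers instead of NA_integer_, at the cost of a return type that depends on the values drawn.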
>>>>> Martin Maechler
>>>>>     on Tue, 21 Jan 2020 09:25:19 +0100 writes:

>>>>> Ben Bolker
>>>>>     on Mon, 20 Jan 2020 12:54:52 -0500 writes:

    >> Ugh, sounds like competing priorities.

    > indeed.

    >> * maintain type consistency
    >> * minimize storage (= current version, since 3.0.0)
    >> * maximize utility for large lambda (= proposed change)
    >> * keep user interface, and code, simple (e.g., it would be easy enough
    >>   to add a switch that provided user control of int vs double return value)
    >> * backward compatibility

    > Last night, it came to my mind that we should do what we have
    > been doing in quite a few places in R, the last couple of years:
    > Return integer when possible, and switch to return double when
    > integers don't fit.
    > We've been doing so even for 1:N (well, now with additional ALTREP wrapper),
    > seq(), and even the fundamental length() function.

    > So I sat down and implemented it .. and it seemed to work
    > perfectly: Returning the same random numbers as now, but
    > switching to use double (instead of returning NAs) when the
    > values are too large.

    > I'll probably commit that to R-devel quite soonish.

    > Martin

Committed in svn rev 77690; this is really very advantageous, as in
some cases / applications or even just limit cases, you'd easily get
into overflow situations.

The new R 4.0.0 behavior is IMO "the best of" being memory efficient
(integer storage) in most cases (back compatible to R 3.x.x) and
returning desired random numbers in large cases (compatible to R <= 2.x.x).

Martin
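[Editorial illustration, not part of the original messages.] A quick way to see which behavior a given R installation has is to compare a small and a very large lambda. The expectations in the comments follow the description above (integer in ordinary cases, double when values are too large, from R 4.0.0 on); the names x_small and x_large are chosen only for this sketch.

```r
set.seed(1)
x_small <- rpois(5, lambda = 10)
x_large <- rpois(5, lambda = 1e10)   # R 3.x warns here and returns NAs

typeof(x_small)   # "integer" since R 3.0.0
typeof(x_large)   # expected "double" in R >= 4.0.0, "integer" in R 3.x
anyNA(x_large)    # expected FALSE in R >= 4.0.0, TRUE in R 3.x
```

Code that relies on is.integer() for the result of these generators may therefore want to test for integer-valued numbers instead.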