>>>>> Martin Maechler
>>>>>     on Tue, 21 Jan 2020 09:25:19 +0100 writes:

>>>>> Ben Bolker
>>>>>     on Mon, 20 Jan 2020 12:54:52 -0500 writes:

>> Ugh, sounds like competing priorities.

> indeed.

>> * maintain type consistency
>> * minimize storage (= current version, since 3.0.0)
>> * maximize utility for large lambda (= proposed change)
>> * keep user interface, and code, simple (e.g., it would be easy enough
>>   to add a switch that provided user control of int vs double return value)
>> * backward compatibility

> Last night, it came to my mind that we should do what we have
> been doing in quite a few places in R, the last couple of years:

> Return integer when possible, and switch to return double when
> integers don't fit.

> We've been doing so even for 1:N (well, now with additional ALTREP wrapper),
> seq(), and even the fundamental length() function.

> So I sat down and implemented it .. and it seemed to work
> perfectly: Returning the same random numbers as now, but
> switching to use double (instead of returning NAs) when the
> values are too large.

> I'll probably commit that to R-devel quite soonish.
> Martin

Committed in svn rev 77690; this is really very advantageous, as
in some cases / applications or even just limit cases, you'd
easily get into overflow situations.

The new R 4.0.0 behavior is IMO "the best of" being memory
efficient (integer storage) in most cases (back compatible to R 3.x.x) and
returning desired random numbers in large cases (compatible to R <= 2.x.x).

Martin

>> On 2020-01-20 12:33 p.m., Martin Maechler wrote:
>>>>>>>> Benjamin Tyner
>>>>>>>>     on Mon, 20 Jan 2020 08:10:49 -0500 writes:
>>>
>>> > On 1/20/20 4:26 AM, Martin Maechler wrote:
>>> >> Coming late here -- after enjoying a proper weekend ;-) --
>>> >> I have been agreeing (with Spencer, IIUC) on this for a long
>>> >> time (~ 3 yrs, or more?), namely that I've come to see it as a
>>> >> "design bug" that rpois() {and similar} must return typeof() "integer".
>>> >>
>>> >> More strongly, I'm actually pretty convinced they should return
>>> >> (integer-valued) double instead of NA_integer_ and for that
>>> >> reason should always return double:
>>> >> Even if we have (hopefully) a native 64bit integer in R,
>>> >> 2^64 is still teeny tiny compared to .Machine$double.max
>>> >>
>>> >> (and then maybe we'd have .Machine$longdouble.max which would
>>> >> be considerably larger than double.max unless on Windows, where
>>> >> the wise men at Microsoft decided to keep their workload simple
>>> >> by defining "long double := double" - as 'long double'
>>> >> unfortunately is not well defined by C standards)
>>> >>
>>> >> Martin
>>> >>
>>> > Martin if you are in favor, then certainly no objection from me! ;-)
>>>
>>> > So now what about other discrete distributions e.g. could a similar
>>> > enhancement apply here?
>>>
>>>
>>> >> rgeom(10L, 1e-10)
>>> >  [1]         NA 1503061294         NA         NA 1122447583         NA
>>> >  [7]         NA         NA         NA         NA
>>> > Warning message:
>>> > In rgeom(10L, 1e-10) : NAs produced
>>>
>>> yes, of course there are several such distributions.
>>>
>>> It's really something that should be discussed (possibly not
>>> here, .. but then I've started it here ...).
>>>
>>> The NEWS for R 3.0.0 contain (in NEW FEATURES) :
>>>
>>> * Functions rbinom(), rgeom(), rhyper(), rpois(), rnbinom(),
>>>   rsignrank() and rwilcox() now return integer (not double)
>>>   vectors. This halves the storage requirements for large
>>>   simulations.
>>>
>>> and what I've been suggesting is to revert this change
>>> (svn rev r60225-6) which was purposefully and diligently done by
>>> a fellow R core member, so indeed must be debatable.
>>>
>>> Martin
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
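
The rule described in the message above ("Return integer when possible, and switch to return double when integers don't fit") can be illustrated with a small sketch at the R level. This is not the code committed in rev 77690, which is not shown in the thread; `int_or_double()` is a hypothetical helper written only for illustration, and it assumes its input is an integer-valued double vector.

```r
# Hypothetical helper (not part of R): keep integer storage when every
# value fits into a 32-bit integer, otherwise return the double vector.
# Assumes x holds integer-valued doubles, as random counts would.
int_or_double <- function(x) {
  stopifnot(is.double(x))
  fits <- all(abs(x) <= .Machine$integer.max, na.rm = TRUE)
  if (fits) as.integer(x) else x
}

small <- int_or_double(c(3, 7, 12))   # every value fits
large <- int_or_double(c(3, 7, 3e9))  # 3e9 exceeds .Machine$integer.max

typeof(small)  # "integer"
typeof(large)  # "double"

# Storage trade-off mentioned in the R 3.0.0 NEWS entry:
# integer vectors use 4 bytes per element, double vectors 8.
object.size(integer(1e6))
object.size(double(1e6))
```

With such a rule, callers keep the compact integer result in the common case and only pay for double storage when a value would otherwise overflow to NA.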
On 2020-01-22 02:54, Martin Maechler wrote:
> Committed in svn rev 77690; this is really very advantageous, as
> in some cases / applications or even just limit cases, you'd
> easily get into overflow situations.
>
> The new R 4.0.0 behavior is IMO "the best of" being memory
> efficient (integer storage) in most cases (back compatible to R 3.x.x) and
> returning desired random numbers in large cases (compatible to R <= 2.x.x).
>
> Martin

Wunderbar! Sehr gut gemacht! ("Wonderful! Very well done!")

Thanks,
Spencer
Fantastic!!

Thanks,
Avi

On Wed, Jan 22, 2020 at 11:14 AM Spencer Graves <spencer.graves at prodsyse.com> wrote:
> Wunderbar! Sehr gut gemacht! ("Wonderful! Very well done!")
>
> Thanks,
> Spencer