Thanks to Luke and Avi for their comments.? I wrapped "round" around the call to "rnorm" inside my "rpois.".? For "lambda" really big, that "round" won't do anything.? However, it appears to give integers in floating point representation that are larger than .Machine$integer.max.? That sounds very much like what someone would want.? Spencer On 2020-01-19 21:00, Tierney, Luke wrote:> R uses the C 'int' type for its integer data and that is pretty much > universally 32 bit these days. In fact R wont' compile if it is not. > That means the range for integer data is the integers in [-2^31, > +2^31). > > It would be good to allow for a larger integer range for R integer > objects, and several of us are thinking about how me might get there. > But it isn't easy to get right, so it may take some time. I doubt > anything can happen for R 4.0.0 this year, but 2021 may be possible. > > I few notes inline below: > > On Sun, 19 Jan 2020, Spencer Graves wrote: > >> On my Mac: >> >> >> str(.Machine) >> ... >> $ integer.max????????? : int 2147483647 >> ?$ sizeof.long????????? : int 8 >> ?$ sizeof.longlong????? : int 8 >> ?$ sizeof.longdouble??? : int 16 >> ?$ sizeof.pointer?????? : int 8 >> >> >> ????? On a Windows 10 machine I have, $ sizeof.long : int 4; otherwise >> the same as on my Mac. > One of many annoyances of Windows -- done for compatibility with > ancient Window apps. > >> ????? Am I correct that $ sizeof.long = 4 means 4 bytes = 32 bits? >> log2(.Machine$integer.max) = 31.? Then 8 bytes is what used to be called >> double precision (2 words of 4 bytes each)?? And $ sizeof.longdouble >> 16 = 4 words of 4 bytes each? > double precision is a floating point concept, not related to integers. > > If you want to figure out whether you are running a 32 bit or 64 bit R > look at sizeof.pointer -- 4 means 32 bits, 8 64 bits. > > Best, > > luke > > >> >> ????? Spencer >> >> >> On 2020-01-19 15:41, Avraham Adler wrote: >>> Floor (maybe round) of non-negative numerics, though. Poisson should >>> never have anything after decimal. >>> >>> Still think it?s worth allowing long long for R64 bit, just for purity >>> sake. >>> >>> Avi >>> >>> On Sun, Jan 19, 2020 at 4:38 PM Spencer Graves >>> <spencer.graves at prodsyse.com <mailto:spencer.graves at prodsyse.com>> wrote: >>> >>> >>> >>> On 2020-01-19 13:01, Avraham Adler wrote: >>>> Crazy thought, but being that a sum of Poissons is Poisson in the >>>> sum, can you break your ?big? simulation into the sum of a few >>>> smaller ones? Or is the order of magnitude difference just too great? >>> >>> ????? I don't perceive that as feasible.? Once I found what was >>> generating NAs, it was easy to code a function to return >>> pseudo-random numbers using the standard normal approximation to >>> the Poisson for those extreme cases.? [For a Poisson with mean >>> 1e6, for example, the skewness (third standardized moment) is >>> 0.001.? At least for my purposes, that should be adequate.][1] >>> >>> >>> ????? What are the negative consequences of having rpois return >>> numerics that are always nonnegative? >>> >>> >>> ????? Spencer >>> >>> >>> [1]? In the code I reported before, I just changed the threshold >>> of 1e6 to 0.5*.Machine$integer.max.? On my Mac, >>> .Machine$integer.max = 2147483647 = 2^31 > 1e9. That still means >>> that a Poisson distributed pseudo-random number just under that >>> would have to be over 23000 standard deviations above the mean to >>> exceed .Machine$integer.max. >>> >>>> On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves >>>> <spencer.graves at prodsyse.com >>>> <mailto:spencer.graves at prodsyse.com>> wrote: >>>> >>>> ????? This issue arose for me in simulations to estimate >>>> confidence, prediction, and tolerance intervals from glm(., >>>> family=poisson) fits embedded in a BMA::bic.glm fit using a >>>> simulate.bic.glm function I added to the development version >>>> of Ecfun, available at "https://github.com/sbgraves237/Ecfun" >>>> <https://github.com/sbgraves237/Ecfun>. This is part of a >>>> vignette I'm developing, available at >>>> "https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd" >>>> <https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd>. >>>> This includes a simulated mean of a mixture of Poissons that >>>> exceeds 2e22.? It doesn't seem unreasonable to me to have >>>> rpois output a numerics rather than integers when a number >>>> simulated exceeds .Machine$integer.max.? And it does seem to >>>> make less sense in such cases to return NAs. >>>> >>>> >>>> ?????? Alternatively, might it make sense to add another >>>> argument to rpois to give the user the choice?? E.g., an >>>> argument "bigOutput" with (I hope) default = "numeric" and >>>> "NA" as a second option.? Or NA is the default, so no code >>>> that relied that feature of the current code would be broken >>>> by the change.? If someone wanted to use arbitrary precision >>>> arithmetic, they could write their own version of this >>>> function with "arbitraryPrecision" as an optional value for >>>> the "bigOutput" argument. >>>> >>>> >>>> ????? Comments? >>>> ????? Thanks, >>>> ????? Spencer Graves >>>> >>>> >>>> >>>> On 2020-01-19 10:28, Avraham Adler wrote: >>>>> Technically, lambda can always be numeric. It is the >>>>> observations which must be integral. >>>>> >>>>> Would hitting everything larger than maxint or maxlonglong >>>>> with floor or round fundamentally change the distribution? >>>>> Well, yes, but enough that it would matter over process risk? >>>>> >>>>> Avi >>>>> >>>>> On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner >>>>> <btyner at gmail.com <mailto:btyner at gmail.com>> wrote: >>>>> >>>>> So imagine rpois is changed, such that the storage mode >>>>> of its return >>>>> value is sometimes integer and sometimes numeric. Then >>>>> imagine the case >>>>> where lambda is itself a realization of a random >>>>> variable. Do we really >>>>> want the storage mode to inherit that randomness? >>>>> >>>>> >>>>> On 1/19/20 10:47 AM, Avraham Adler wrote: >>>>> > Maybe there should be code for 64 bit R to use long >>>>> long or the like? >>>>> > >>>>> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves >>>>> > <spencer.graves at prodsyse.com >>>>> <mailto:spencer.graves at prodsyse.com> >>>>> <mailto:spencer.graves at prodsyse.com >>>>> <mailto:spencer.graves at prodsyse.com>>> wrote: >>>>> > >>>>> > >>>>> > >>>>> >? ? ?On 2020-01-19 09:34, Benjamin Tyner wrote: >>>>> >? ? ?>> >>>>> > >>>>> ?------------------------------------------------------------------------ >>>>> >? ? ?>> Hello, All: >>>>> >? ? ?>> >>>>> >? ? ?>> >>>>> >? ? ?>> ? ????? Consider: >>>>> >? ? ?>> >>>>> >? ? ?>> >>>>> >? ? ?>> Browse[2]> set.seed(1) >>>>> >? ? ?>> Browse[2]> rpois(9, 1e10) >>>>> >? ? ?>> NAs produced[1] NA NA NA NA NA NA NA NA NA >>>>> >? ? ?>> >>>>> >? ? ?>> >>>>> >? ? ?>> ? ????? Should this happen? >>>>> >? ? ?>> >>>>> >? ? ?>> >>>>> >? ? ?>> ? ????? I think that for, say, lambda>1e6, >>>>> rpois should return >>>>> >? ? ?rnorm(., >>>>> >? ? ?>> lambda, sqrt(lambda)). >>>>> >? ? ?> But need to implement carefully; rpois should >>>>> always return a >>>>> >? ? ?> non-negative integer, whereas rnorm always >>>>> returns numeric... >>>>> >? ? ?> >>>>> > >>>>> >? ? ??????? Thanks for the reply. >>>>> > >>>>> > >>>>> >? ? ??????? However, I think it's not acceptable to get >>>>> an NA from a >>>>> >? ? ?number >>>>> >? ? ?that cannot be expressed as an integer.? Whenever >>>>> a randomly >>>>> >? ? ?generated >>>>> >? ? ?number would exceed .Machine$integer.max, the >>>>> choice is between >>>>> >? ? ?returning NA or a non-integer numeric.? Consider: >>>>> > >>>>> > >>>>> >? ? ??> 2*.Machine$integer.max >>>>> >? ? ?[1] 4294967294 >>>>> >? ? ??> as.integer(2*.Machine$integer.max) >>>>> >? ? ?[1] NA >>>>> >? ? ?Warning message: >>>>> >? ? ?NAs introduced by coercion to integer range >>>>> > >>>>> > >>>>> >? ? ??????? I'd rather have the non-integer numeric. >>>>> > >>>>> > >>>>> >? ? ??????? Spencer >>>>> > >>>>> > ?______________________________________________ >>>>> > R-devel at r-project.org <mailto:R-devel at r-project.org> >>>>> <mailto:R-devel at r-project.org >>>>> <mailto:R-devel at r-project.org>> mailing list >>>>> > https://stat.ethz.ch/mailman/listinfo/r-devel >>>>> > >>>>> > -- >>>>> > Sent from Gmail Mobile >>>>> >>>>> -- >>>>> Sent from Gmail Mobile >>>> -- >>>> Sent from Gmail Mobile >>> -- >>> Sent from Gmail Mobile >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >>
>>>>> Spencer Graves >>>>> on Sun, 19 Jan 2020 21:35:04 -0600 writes:> Thanks to Luke and Avi for their comments.? I wrapped "round" around the > call to "rnorm" inside my "rpois.".? For "lambda" really big, that > "round" won't do anything.? However, it appears to give integers in > floating point representation that are larger than > .Machine$integer.max.? That sounds very much like what someone would > want.? Spencer Coming late here -- after enjoying a proper weekend ;-) -- I have been agreeing (with Spencer, IIUC) on this for a long time (~ 3 yrs, or more?), namely that I've come to see it as a "design bug" that rpois() {and similar} must return return typeof() "integer". More strongly, I'm actually pretty convinced they should return (integer-valued) double instead of NA_integer_ and for that reason should always return double: Even if we have (hopefully) a native 64bit integer in R, 2^64 is still teeny tiny compared .Machine$double.max (and then maybe we'd have .Machine$longdouble.max which would be considerably larger than double.max unless on Windows, where the wise men at Microsoft decided to keep their workload simple by defining "long double := double" - as 'long double' unfortunately is not well defined by C standards) Martin > On 2020-01-19 21:00, Tierney, Luke wrote: >> R uses the C 'int' type for its integer data and that is pretty much >> universally 32 bit these days. In fact R wont' compile if it is not. >> That means the range for integer data is the integers in [-2^31, >> +2^31). >> >> It would be good to allow for a larger integer range for R integer >> objects, and several of us are thinking about how me might get there. >> But it isn't easy to get right, so it may take some time. I doubt >> anything can happen for R 4.0.0 this year, but 2021 may be possible. >> >> I few notes inline below: >> >> On Sun, 19 Jan 2020, Spencer Graves wrote: >> >>> On my Mac: >>> >>> >>> str(.Machine) >>> ... >>> $ integer.max????????? : int 2147483647 >>> ?$ sizeof.long????????? : int 8 >>> ?$ sizeof.longlong????? : int 8 >>> ?$ sizeof.longdouble??? : int 16 >>> ?$ sizeof.pointer?????? : int 8 >>> >>> >>> ????? On a Windows 10 machine I have, $ sizeof.long : int 4; otherwise >>> the same as on my Mac. >> One of many annoyances of Windows -- done for compatibility with >> ancient Window apps. >> >>> ????? Am I correct that $ sizeof.long = 4 means 4 bytes = 32 bits? >>> log2(.Machine$integer.max) = 31.? Then 8 bytes is what used to be called >>> double precision (2 words of 4 bytes each)?? And $ sizeof.longdouble >>> 16 = 4 words of 4 bytes each? >> double precision is a floating point concept, not related to integers. >> >> If you want to figure out whether you are running a 32 bit or 64 bit R >> look at sizeof.pointer -- 4 means 32 bits, 8 64 bits. >> >> Best, >> >> luke >> >> >>> >>> ????? Spencer >>> >>> >>> On 2020-01-19 15:41, Avraham Adler wrote: >>>> Floor (maybe round) of non-negative numerics, though. Poisson should >>>> never have anything after decimal. >>>> >>>> Still think it?s worth allowing long long for R64 bit, just for purity >>>> sake. >>>> >>>> Avi >>>> >>>> On Sun, Jan 19, 2020 at 4:38 PM Spencer Graves >>>> <spencer.graves at prodsyse.com <mailto:spencer.graves at prodsyse.com>> wrote: >>>> >>>> >>>> >>>> On 2020-01-19 13:01, Avraham Adler wrote: >>>>> Crazy thought, but being that a sum of Poissons is Poisson in the >>>>> sum, can you break your ?big? simulation into the sum of a few >>>>> smaller ones? Or is the order of magnitude difference just too great? >>>> >>>> ????? I don't perceive that as feasible.? Once I found what was >>>> generating NAs, it was easy to code a function to return >>>> pseudo-random numbers using the standard normal approximation to >>>> the Poisson for those extreme cases.? [For a Poisson with mean >>>> 1e6, for example, the skewness (third standardized moment) is >>>> 0.001.? At least for my purposes, that should be adequate.][1] >>>> >>>> >>>> ????? What are the negative consequences of having rpois return >>>> numerics that are always nonnegative? >>>> >>>> >>>> ????? Spencer >>>> >>>> >>>> [1]? In the code I reported before, I just changed the threshold >>>> of 1e6 to 0.5*.Machine$integer.max.? On my Mac, >>>> .Machine$integer.max = 2147483647 = 2^31 > 1e9. That still means >>>> that a Poisson distributed pseudo-random number just under that >>>> would have to be over 23000 standard deviations above the mean to >>>> exceed .Machine$integer.max. >>>> >>>>> On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves >>>>> <spencer.graves at prodsyse.com >>>>> <mailto:spencer.graves at prodsyse.com>> wrote: >>>>> >>>>> ????? This issue arose for me in simulations to estimate >>>>> confidence, prediction, and tolerance intervals from glm(., >>>>> family=poisson) fits embedded in a BMA::bic.glm fit using a >>>>> simulate.bic.glm function I added to the development version >>>>> of Ecfun, available at "https://github.com/sbgraves237/Ecfun" >>>>> <https://github.com/sbgraves237/Ecfun>. This is part of a >>>>> vignette I'm developing, available at >>>>> "https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd" >>>>> <https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd>. >>>>> This includes a simulated mean of a mixture of Poissons that >>>>> exceeds 2e22.? It doesn't seem unreasonable to me to have >>>>> rpois output a numerics rather than integers when a number >>>>> simulated exceeds .Machine$integer.max.? And it does seem to >>>>> make less sense in such cases to return NAs. >>>>> >>>>> >>>>> ?????? Alternatively, might it make sense to add another >>>>> argument to rpois to give the user the choice?? E.g., an >>>>> argument "bigOutput" with (I hope) default = "numeric" and >>>>> "NA" as a second option.? Or NA is the default, so no code >>>>> that relied that feature of the current code would be broken >>>>> by the change.? If someone wanted to use arbitrary precision >>>>> arithmetic, they could write their own version of this >>>>> function with "arbitraryPrecision" as an optional value for >>>>> the "bigOutput" argument. >>>>> >>>>> >>>>> ????? Comments? >>>>> ????? Thanks, >>>>> ????? Spencer Graves >>>>> >>>>> >>>>> >>>>> On 2020-01-19 10:28, Avraham Adler wrote:>>>>> Technically, lambda can always be numeric. It is the >>>>> observations which must be integral.>>>>>>>>>>> Would hitting everything larger than maxint or maxlonglong >>>>> with floor or round fundamentally change the distribution? >>>>> Well, yes, but enough that it would matter over process risk?>>>>>>>>>>> Avi>>>>>>>>>>> On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner >>>>> <btyner at gmail.com <mailto:btyner at gmail.com>> wrote:>>>>>>>>>>> So imagine rpois is changed, such that the storage mode >>>>> of its return >>>>> value is sometimes integer and sometimes numeric. Then >>>>> imagine the case >>>>> where lambda is itself a realization of a random >>>>> variable. Do we really >>>>> want the storage mode to inherit that randomness?>>>>>> >>>>>>>>>>> On 1/19/20 10:47 AM, Avraham Adler wrote: >>>>> > Maybe there should be code for 64 bit R to use long >>>>> long or the like? >>>>> > >>>>> > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves >>>>> > <spencer.graves at prodsyse.com >>>>> <mailto:spencer.graves at prodsyse.com> >>>>> <mailto:spencer.graves at prodsyse.com >>>>> <mailto:spencer.graves at prodsyse.com>>> wrote: >>>>> > >>>>> > >>>>> > >>>>> >? ? ?On 2020-01-19 09:34, Benjamin Tyner wrote: >>>>> >? ? ?>> >>>>> > >>>>> ?------------------------------------------------------------------------ >>>>> >? ? ?>> Hello, All: >>>>> >? ? ?>> >>>>> >? ? ?>> >>>>> >? ? ?>> ? ????? Consider: >>>>> >? ? ?>> >>>>> >? ? ?>> >>>>> >? ? ?>> Browse[2]> set.seed(1) >>>>> >? ? ?>> Browse[2]> rpois(9, 1e10) >>>>> >? ? ?>> NAs produced[1] NA NA NA NA NA NA NA NA NA >>>>> >? ? ?>> >>>>> >? ? ?>> >>>>> >? ? ?>> ? ????? Should this happen? >>>>> >? ? ?>> >>>>> >? ? ?>> >>>>> >? ? ?>> ? ????? I think that for, say, lambda>1e6, >>>>> rpois should return >>>>> >? ? ?rnorm(., >>>>> >? ? ?>> lambda, sqrt(lambda)). >>>>> >? ? ?> But need to implement carefully; rpois should >>>>> always return a >>>>> >? ? ?> non-negative integer, whereas rnorm always >>>>> returns numeric... >>>>> >? ? ?> >>>>> > >>>>> >? ? ??????? Thanks for the reply. >>>>> > >>>>> > >>>>> >? ? ??????? However, I think it's not acceptable to get >>>>> an NA from a >>>>> >? ? ?number >>>>> >? ? ?that cannot be expressed as an integer.? Whenever >>>>> a randomly >>>>> >? ? ?generated >>>>> >? ? ?number would exceed .Machine$integer.max, the >>>>> choice is between >>>>> >? ? ?returning NA or a non-integer numeric.? Consider: >>>>> > >>>>> > >>>>> >? ? ??> 2*.Machine$integer.max >>>>> >? ? ?[1] 4294967294 >>>>> >? ? ??> as.integer(2*.Machine$integer.max) >>>>> >? ? ?[1] NA >>>>> >? ? ?Warning message: >>>>> >? ? ?NAs introduced by coercion to integer range >>>>> > >>>>> > >>>>> >? ? ??????? I'd rather have the non-integer numeric. >>>>> > >>>>> > >>>>> >? ? ??????? Spencer >>>>> > >>>>> > ?______________________________________________ >>>>> > R-devel at r-project.org <mailto:R-devel at r-project.org> >>>>> <mailto:R-devel at r-project.org >>>>> <mailto:R-devel at r-project.org>> mailing list >>>>> > https://stat.ethz.ch/mailman/listinfo/r-devel >>>>> > >>>>> > -- >>>>> > Sent from Gmail Mobile>>>>>>>>>>> -- >>>>> Sent from Gmail Mobile>>>>> -- >>>>> Sent from Gmail Mobile >>>> -- >>>> Sent from Gmail Mobile >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> R-devel at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >>> > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel
On 1/20/20 4:26 AM, Martin Maechler wrote:> Coming late here -- after enjoying a proper weekend ;-) -- > I have been agreeing (with Spencer, IIUC) on this for a long > time (~ 3 yrs, or more?), namely that I've come to see it as a > "design bug" that rpois() {and similar} must return return typeof() "integer". > > More strongly, I'm actually pretty convinced they should return > (integer-valued) double instead of NA_integer_ and for that > reason should always return double: > Even if we have (hopefully) a native 64bit integer in R, > 2^64 is still teeny tiny compared .Machine$double.max > > (and then maybe we'd have .Machine$longdouble.max which would > be considerably larger than double.max unless on Windows, where > the wise men at Microsoft decided to keep their workload simple > by defining "long double := double" - as 'long double' > unfortunately is not well defined by C standards) > > Martin >Martin if you are in favor, then certainly no objection from me! ;-) So now what about other discrete distributions e.g. could a similar enhancement apply here? > rgeom(10L, 1e-10) ?[1]???????? NA 1503061294???????? NA???????? NA 1122447583???????? NA ?[7]???????? NA???????? NA???????? NA???????? NA Warning message: In rgeom(10L, 1e-10) : NAs produced