Before the next release we really should to sort out the bias issue in
sample() reported by Ottoboni and Stark in
https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf and
filed aa a bug report by Duncan Murdoch at
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494.
Here are two examples of bad behavior through current R-devel:
set.seed(123)
m <- (2/5) * 2^32
x <- sample(m, 1000000, replace = TRUE)
table(x %% 2, x > m / 2)
##
## FALSE TRUE
## 0 300620 198792
## 1 200196 300392
table(sample(2/7 * 2^32, 1000000, replace = TRUE) %% 2)
##
## 0 1
## 429054 570946
I committed a modification to R_unif_index to address this by
generating random bits (blocks of 16) and rejection sampling, but for
now this is only enabled if the environment variable R_NEW_SAMPLE is
set before the first call.
Some things still needed:
- someone to look over the change and see if there are any issues
- adjustment of RNGkind to allowing the old behavior to be selected
- make the new behavior the default
- adjust documentation
- ???
Unfortunately I don't have enough free cycles to do this, but I can
help if someone else can take the lead.
There are two other places I found that might suffer from the same
issue, in walker_ProbSampleReplace (pointed out bu O & S) and in
src/nmath/wilcox.c. Both can be addressed by using R_unif_index. I
have done that for walker_ProbSampleReplace, but the wilcox change
might need adjusting to support the standalone math library and I
don't feel confident enough I'd get that right.
Best,
luke
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tierney at uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
Luke, I'm happy to help with this. Its great to see this get tackled (I've cc'ed Kelli Ottoboni who helped flag this issue). I can prepare a patch for the RNGkind related stuff and the doc update. As for ???, what are your (and others') thoughts about the possibility of a) a reproducibility API which takes either an R version (or maybe alternatively a date) and sets the RNGkind to the default for that version/date, and/or b) that sessionInfo be modified to capture (and display) the RNGkind in effect. Best, ~G On Tue, Feb 19, 2019 at 11:52 AM Tierney, Luke <luke-tierney at uiowa.edu> wrote:> > Before the next release we really should to sort out the bias issue in > sample() reported by Ottoboni and Stark in > https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf and > filed aa a bug report by Duncan Murdoch at > https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494. > > Here are two examples of bad behavior through current R-devel: > > set.seed(123) > m <- (2/5) * 2^32 > x <- sample(m, 1000000, replace = TRUE) > table(x %% 2, x > m / 2) > ## > ## FALSE TRUE > ## 0 300620 198792 > ## 1 200196 300392 > > table(sample(2/7 * 2^32, 1000000, replace = TRUE) %% 2) > ## > ## 0 1 > ## 429054 570946 > > I committed a modification to R_unif_index to address this by > generating random bits (blocks of 16) and rejection sampling, but for > now this is only enabled if the environment variable R_NEW_SAMPLE is > set before the first call. > > Some things still needed: > > - someone to look over the change and see if there are any issues > - adjustment of RNGkind to allowing the old behavior to be selected > - make the new behavior the default > - adjust documentation > - ??? > > Unfortunately I don't have enough free cycles to do this, but I can > help if someone else can take the lead. > > There are two other places I found that might suffer from the same > issue, in walker_ProbSampleReplace (pointed out bu O & S) and in > src/nmath/wilcox.c. Both can be addressed by using R_unif_index. I > have done that for walker_ProbSampleReplace, but the wilcox change > might need adjusting to support the standalone math library and I > don't feel confident enough I'd get that right. > > Best, > > luke > > > -- > Luke Tierney > Ralph E. Wareham Professor of Mathematical Sciences > University of Iowa Phone: 319-335-3386 > Department of Statistics and Fax: 319-335-3017 > Actuarial Science > 241 Schaeffer Hall email: luke-tierney at uiowa.edu > Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >[[alternative HTML version deleted]]
Gabe
As mentioned on Twitter, I think the following behavior should be fixed
as part of the upcoming changes:
R.version.string
## [1] "R Under development (unstable) (2019-02-25 r76160)"
.Machine$double.digits
## [1] 53
set.seed(123)
RNGkind()
## [1] "Mersenne-Twister" "Inversion"???????
"Rejection"
length(table(runif(1e6)))
## [1] 999863
I don't expect any collisions when using Mersenne-Twister to generate a
million floating point values. I'm not sure what causes this behavior,
but it's documented in ?Random:
"Do not rely on randomness of low-order bits from RNGs. Most of the
supplied uniform generators return 32-bit integer values that are
converted to doubles, so they take at most 2^32 distinct values and long
runs will return duplicated values (Wichmann-Hill is the exception, and
all give at least 30 varying bits.)"
The "Wichman-Hill" bit is interesting:
RNGkind("Wichmann-Hill")
length(table(runif(1e6)))
## [1] 1000000
length(table(runif(1e6)))
## [1] 1000000
Mersenne-Twister has a much much larger periodicity than Wichmann-Hill,
it would be great to see the above behavior also for Mersenne-Twister.
Thanks for considering.
Best regards
Kirill
On 20.02.19 08:01, Gabriel Becker wrote:> Luke,
>
> I'm happy to help with this. Its great to see this get tackled
(I've cc'ed
> Kelli Ottoboni who helped flag this issue).
>
> I can prepare a patch for the RNGkind related stuff and the doc update.
>
> As for ???, what are your (and others') thoughts about the possibility
of
> a) a reproducibility API which takes either an R version (or maybe
> alternatively a date) and sets the RNGkind to the default for that
> version/date, and/or b) that sessionInfo be modified to capture (and
> display) the RNGkind in effect.
>
> Best,
> ~G
>
>
> On Tue, Feb 19, 2019 at 11:52 AM Tierney, Luke <luke-tierney at
uiowa.edu>
> wrote:
>
>> Before the next release we really should to sort out the bias issue in
>> sample() reported by Ottoboni and Stark in
>> https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf and
>> filed aa a bug report by Duncan Murdoch at
>> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494.
>>
>> Here are two examples of bad behavior through current R-devel:
>>
>> set.seed(123)
>> m <- (2/5) * 2^32
>> x <- sample(m, 1000000, replace = TRUE)
>> table(x %% 2, x > m / 2)
>> ##
>> ## FALSE TRUE
>> ## 0 300620 198792
>> ## 1 200196 300392
>>
>> table(sample(2/7 * 2^32, 1000000, replace = TRUE) %% 2)
>> ##
>> ## 0 1
>> ## 429054 570946
>>
>> I committed a modification to R_unif_index to address this by
>> generating random bits (blocks of 16) and rejection sampling, but for
>> now this is only enabled if the environment variable R_NEW_SAMPLE is
>> set before the first call.
>>
>> Some things still needed:
>>
>> - someone to look over the change and see if there are any issues
>> - adjustment of RNGkind to allowing the old behavior to be selected
>> - make the new behavior the default
>> - adjust documentation
>> - ???
>>
>> Unfortunately I don't have enough free cycles to do this, but I can
>> help if someone else can take the lead.
>>
>> There are two other places I found that might suffer from the same
>> issue, in walker_ProbSampleReplace (pointed out bu O & S) and in
>> src/nmath/wilcox.c. Both can be addressed by using R_unif_index. I
>> have done that for walker_ProbSampleReplace, but the wilcox change
>> might need adjusting to support the standalone math library and I
>> don't feel confident enough I'd get that right.
>>
>> Best,
>>
>> luke
>>
>>
>> --
>> Luke Tierney
>> Ralph E. Wareham Professor of Mathematical Sciences
>> University of Iowa Phone: 319-335-3386
>> Department of Statistics and Fax: 319-335-3017
>> Actuarial Science
>> 241 Schaeffer Hall email: luke-tierney at uiowa.edu
>> Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel