thr3ads.net - R devel - [Rd] Bias in R's random integers? [Sep 2018]

If this information is useful, please help other people find it:
Share via:

Carl Boettiger

2018-Sep-18 21:46 UTC

[Rd] Bias in R's random integers?

Dear list,

It looks to me that R samples random integers using an intuitive but biased
algorithm by going from a random number on [0,1) from the PRNG to a random
integer, e.g.
https://github.com/wch/r-source/blob/tags/R-3-5-1/src/main/RNG.c#L808

Many other languages use various rejection sampling approaches which
provide an unbiased method for sampling, such as in Go, python, and others
described here:  https://arxiv.org/abs/1805.10941 (I believe the biased
algorithm currently used in R is also described there).  I'm not an expert
in this area, but does it make sense for the R to adopt one of the unbiased
random sample algorithms outlined there and used in other languages?  Would
a patch providing such an algorithm be welcome? What concerns would need to
be addressed first?

I believe this issue was also raised by Killie & Philip in
http://r.789695.n4.nabble.com/Bug-in-sample-td4729483.html, and more
recently in
https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf,
pointing to the python implementation for comparison:
https://github.com/statlab/cryptorandom/blob/master/cryptorandom/cryptorandom.py#L265

Thanks!

Carl
-- 

http://carlboettiger.info

	[[alternative HTML version deleted]]

Duncan Murdoch

2018-Sep-19 12:43 UTC

head link

[Rd] Bias in R's random integers?

On 18/09/2018 5:46 PM, Carl Boettiger wrote:> Dear list,
> 
> It looks to me that R samples random integers using an intuitive but biased
> algorithm by going from a random number on [0,1) from the PRNG to a random
> integer, e.g.
> https://github.com/wch/r-source/blob/tags/R-3-5-1/src/main/RNG.c#L808
> 
> Many other languages use various rejection sampling approaches which
> provide an unbiased method for sampling, such as in Go, python, and others
> described here:  https://arxiv.org/abs/1805.10941 (I believe the biased
> algorithm currently used in R is also described there).  I'm not an
expert
> in this area, but does it make sense for the R to adopt one of the unbiased
> random sample algorithms outlined there and used in other languages?  Would
> a patch providing such an algorithm be welcome? What concerns would need to
> be addressed first?
> 
> I believe this issue was also raised by Killie & Philip in
> http://r.789695.n4.nabble.com/Bug-in-sample-td4729483.html, and more
> recently in
> https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf,
> pointing to the python implementation for comparison:
>
https://github.com/statlab/cryptorandom/blob/master/cryptorandom/cryptorandom.py#L265
I think the analyses are correct, but I doubt if a change to the default 
is likely to be accepted as it would make it more difficult to reproduce 
older results.

On the other hand, a contribution of a new function like sample() but 
not suffering from the bias would be good.  The normal way to make such 
a contribution is in a user contributed package.

By the way, R code illustrating the bias is probably not very hard to 
put together.  I believe the bias manifests itself in sample() producing 
values with two different probabilities (instead of all equal 
probabilities).  Those may differ by as much as one part in 2^32.  It's 
very difficult to detect a probability difference that small, but if you 
define the partition of values into the high probability values vs the 
low probability values, you can probably detect the difference in a 
feasible simulation.

Duncan Murdoch

Iñaki Ucar

2018-Sep-19 13:09 UTC

head link

[Rd] Bias in R's random integers?

El mi?., 19 sept. 2018 a las 14:43, Duncan Murdoch
(<murdoch.duncan at gmail.com>) escribi?:>
> On 18/09/2018 5:46 PM, Carl Boettiger wrote:
> > Dear list,
> >
> > It looks to me that R samples random integers using an intuitive but
biased
> > algorithm by going from a random number on [0,1) from the PRNG to a
random
> > integer, e.g.
> > https://github.com/wch/r-source/blob/tags/R-3-5-1/src/main/RNG.c#L808
> >
> > Many other languages use various rejection sampling approaches which
> > provide an unbiased method for sampling, such as in Go, python, and
others
> > described here:  https://arxiv.org/abs/1805.10941 (I believe the
biased
> > algorithm currently used in R is also described there).  I'm not
an expert
> > in this area, but does it make sense for the R to adopt one of the
unbiased
> > random sample algorithms outlined there and used in other languages? 
Would
> > a patch providing such an algorithm be welcome? What concerns would
need to
> > be addressed first?
> >
> > I believe this issue was also raised by Killie & Philip in
> > http://r.789695.n4.nabble.com/Bug-in-sample-td4729483.html, and more
> > recently in
> > https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf,
> > pointing to the python implementation for comparison:
> >
https://github.com/statlab/cryptorandom/blob/master/cryptorandom/cryptorandom.py#L265
>
> I think the analyses are correct, but I doubt if a change to the default
> is likely to be accepted as it would make it more difficult to reproduce
> older results.
>
> On the other hand, a contribution of a new function like sample() but
> not suffering from the bias would be good.  The normal way to make such
> a contribution is in a user contributed package.
>
> By the way, R code illustrating the bias is probably not very hard to
> put together.  I believe the bias manifests itself in sample() producing
> values with two different probabilities (instead of all equal
> probabilities).  Those may differ by as much as one part in 2^32.  It's
According to Kellie and Philip, in the attachment of the thread
referenced by Carl, "The maximum ratio of selection probabilities can
get as large as 1.5 if n is just below 2^31".

I?aki
> very difficult to detect a probability difference that small, but if you
> define the partition of values into the high probability values vs the
> low probability values, you can probably detect the difference in a
> feasible simulation.
>
> Duncan Murdoch
>

David Hugh-Jones

2018-Sep-19 13:40 UTC

head link

[Rd] Bias in R's random integers?

On Wed, 19 Sep 2018 at 13:43, Duncan Murdoch <murdoch.duncan at gmail.com>
wrote:
>
> I think the analyses are correct, but I doubt if a change to the default
> is likely to be accepted as it would make it more difficult to reproduce
> older results.

I'm a bit alarmed by the logic here. Unbiased sampling seems basic for a
statistical language. As a consumer of R I'd like to think that e.g. my
bootstrapped p values are correct.
Surely if the old results depend on the biased algorithm, then they are
false results?
-- 
Sent from Gmail Mobile

	[[alternative HTML version deleted]]

Possibly Parallel Threads

Search for more apparently analagous threads

R devel - Sep 2018 - Bias in R's random integers?

[Rd] Bias in R's random integers?

[Rd] Bias in R's random integers?

[Rd] Bias in R's random integers?

[Rd] Bias in R's random integers?

Possibly Parallel Threads