thr3ads.net - R devel - [Rd] Bias in R's random integers? [Sep 2018]

If this information is useful, please help other people find it:
Share via:

David Hugh-Jones

2018-Sep-19 13:40 UTC

[Rd] Bias in R's random integers?

On Wed, 19 Sep 2018 at 13:43, Duncan Murdoch <murdoch.duncan at gmail.com>
wrote:
>
> I think the analyses are correct, but I doubt if a change to the default
> is likely to be accepted as it would make it more difficult to reproduce
> older results.

I'm a bit alarmed by the logic here. Unbiased sampling seems basic for a
statistical language. As a consumer of R I'd like to think that e.g. my
bootstrapped p values are correct.
Surely if the old results depend on the biased algorithm, then they are
false results?
-- 
Sent from Gmail Mobile

	[[alternative HTML version deleted]]

Ben Bolker

2018-Sep-19 14:03 UTC

head link

[Rd] Bias in R's random integers?

On 2018-09-19 09:40 AM, David Hugh-Jones wrote:> On Wed, 19 Sep 2018 at 13:43, Duncan Murdoch <murdoch.duncan at
gmail.com>
> wrote:
> 
>>
>> I think the analyses are correct, but I doubt if a change to the
default
>> is likely to be accepted as it would make it more difficult to
reproduce
>> older results.
> 
> 
> I'm a bit alarmed by the logic here. Unbiased sampling seems basic for
a
> statistical language. As a consumer of R I'd like to think that e.g. my
> bootstrapped p values are correct.
> Surely if the old results depend on the biased algorithm, then they are
> false results?
> 
   Balancing backward compatibility and correctness is a tough problem
here.  If this goes into base R, what's the best way to do it?  What was
the protocol for migrating away from the "buggy Kinderman-Ramage"
generator, back in the day?   (Version 1.7 was sometime between 2001 and
2004).

  I couldn't find the exact commit in the GitHub mirror: this is related ...

https://github.com/wch/r-source/commit/7ad3044639fd1fe093c655e573fd1a67aa7f55f6#diff-dbcad570d4fb9b7005550ff630543b37



==?normal.kind? can be ?"Kinderman-Ramage"?, ?"Buggy
     Kinderman-Ramage"? (not for ?set.seed?), ?"Ahrens-Dieter"?,
     ?"Box-Muller"?, ?"Inversion"? (the default), or
?"user-supplied"?.
     (For inversion, see the reference in ?qnorm?.)  The
     Kinderman-Ramage generator used in versions prior to 1.7.0 (now
     called ?"Buggy"?) had several approximation errors and should
only
     be used for reproduction of old results.

Duncan Murdoch

2018-Sep-19 16:15 UTC

head link

[Rd] Bias in R's random integers?

On 19/09/2018 9:40 AM, David Hugh-Jones wrote:> 
> 
> On Wed, 19 Sep 2018 at 13:43, Duncan Murdoch <murdoch.duncan at
gmail.com
> <mailto:murdoch.duncan at gmail.com>> wrote:
> 
> 
>     I think the analyses are correct, but I doubt if a change to the
>     default
>     is likely to be accepted as it would make it more difficult to
>     reproduce
>     older results.
> 
> 
> I'm a bit alarmed by the logic here. Unbiased sampling seems basic for
a
> statistical language. As a consumer of R I'd like to think that e.g. my
> bootstrapped p values are correct.
> Surely if the old results depend on the biased algorithm, then they are 
> false results?
All Monte Carlo results contain Monte Carlo error.  Using the biased 
function will have some additional error, but for almost all 
simulations, it will be negligible compared to the Monte Carlo error.  I 
suspect the only simulations where the bias was anywhere near the same 
order of magnitude as the Monte Carlo error would be ones designed with 
this specific code in mind.

Duncan Murdoch

Philip B. Stark

2018-Sep-19 16:18 UTC

head link

[Rd] Bias in R's random integers?

That depends on the number of replications, among other things.

Moreover, because of the bias, the usual formulae for uncertainty in
estimates based on random samples, etc., are incorrect: sample() does not
give a simple random sample.

On Wed, Sep 19, 2018 at 9:15 AM Duncan Murdoch <murdoch.duncan at
gmail.com>
wrote:
> On 19/09/2018 9:40 AM, David Hugh-Jones wrote:
> >
> >
> > On Wed, 19 Sep 2018 at 13:43, Duncan Murdoch <murdoch.duncan at
gmail.com
> > <mailto:murdoch.duncan at gmail.com>> wrote:
> >
> >
> >     I think the analyses are correct, but I doubt if a change to the
> >     default
> >     is likely to be accepted as it would make it more difficult to
> >     reproduce
> >     older results.
> >
> >
> > I'm a bit alarmed by the logic here. Unbiased sampling seems basic
for a
> > statistical language. As a consumer of R I'd like to think that
e.g. my
> > bootstrapped p values are correct.
> > Surely if the old results depend on the biased algorithm, then they
are
> > false results?
>
> All Monte Carlo results contain Monte Carlo error.  Using the biased
> function will have some additional error, but for almost all
> simulations, it will be negligible compared to the Monte Carlo error.  I
> suspect the only simulations where the bias was anywhere near the same
> order of magnitude as the Monte Carlo error would be ones designed with
> this specific code in mind.
>
> Duncan Murdoch
>
>
-- 
Philip B. Stark | Associate Dean, Mathematical and Physical Sciences |
Professor,  Department of Statistics |
University of California
Berkeley, CA 94720-3860 | 510-394-5077 | statistics.berkeley.edu/~stark |
@philipbstark

	[[alternative HTML version deleted]]

Paul Gilbert

2018-Sep-20 16:30 UTC

head link

[Rd] Bias in R's random integers?

On 09/19/2018 10:03 AM, Ben Bolker wrote:
...>     Balancing backward compatibility and correctness is a tough problem
> here.  
I think improvements in the RNG is a situation where backward 
compatibility is not really going to be lost, because people can specify 
the old generator, they just will not get it by default. My opinion is 
that the default needs to generally be the best option available because 
too many people will be expecting that, or not know better, in which 
case that is what they should get.

There are only two small problems that occur to me:

1/ Researchers that want to have reproducible results (all I hope) need 
to be aware the change has happened. In theory they should have recorded 
the RNG they were using, along with the seed (and, BTW, the number of 
nodes if they generate with a parallel generator). If they have not done 
that then they can figure out the RNG from knowing what version of R 
they used. If they haven't recorded that then they can figure it out by 
some experimentation and knowing roughly when they did the research. If 
none of this works then the research probably should be lost.

As an exercise, researchers might also want to experiment with whether 
the new default qualitatively changes their results. That might lead to 
publishable research, so no one should complain.

2/ Package maintainers that have used the default RNG to generate tests 
may need to change their tests to specify the old generator, or modify 
results used for comparisons in the tests. Since package testing is 
usually for code checking rather than statistical results, not using the 
best available generator is not usually an issue.

Most of my own package testing already specifies the generator, lots 
uses "buggy Kinderman-Ramage" because tests were set up a long time
ago.
I will have to change package setRNG which warns when the default 
generator changes. (This warning is intentional because I was bitten 
badly by a small change in the S generator circa 1990.)

> If this goes into base R, what's the best way to do it?  What was
> the protocol for migrating away from the "buggy Kinderman-Ramage"
> generator, back in the day?   (Version 1.7 was sometime between 2001 and
> 2004).
I think there may have been a change in R 0.99 too. At least my notes 
suggest that the code I changed for  R 1.7.0 had worked with the default 
generator from R 0.99 to 1.6.2.

I don't recall the protocol, I think it just happened and was announced 
in the NEWS. (Has this protocol changed?) The ramification for me was 
that I had to go through all of my packages' testing and change the name 
of the explicitly specified RNG to "buggy Kinderman-Ramage".

Perhaps there does need to be a protocol for testing before release. 
When my package setRNG fails then many of my other packages will also 
fail because they depend on it. This is a simple fix but reverse 
dependencies may make it look like lots of things are broken.

Paul Gilbert
>    I couldn't find the exact commit in the GitHub mirror: this is
related ...
> 
>
https://github.com/wch/r-source/commit/7ad3044639fd1fe093c655e573fd1a67aa7f55f6#diff-dbcad570d4fb9b7005550ff630543b37
> 
> 
> 
> ==> ?normal.kind? can be ?"Kinderman-Ramage"?, ?"Buggy
>       Kinderman-Ramage"? (not for ?set.seed?),
?"Ahrens-Dieter"?,
>       ?"Box-Muller"?, ?"Inversion"? (the default), or
?"user-supplied"?.
>       (For inversion, see the reference in ?qnorm?.)  The
>       Kinderman-Ramage generator used in versions prior to 1.7.0 (now
>       called ?"Buggy"?) had several approximation errors and
should only
>       be used for reproduction of old results.
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

Reasonably Related Threads

Search for more seemingly similar threads

R devel - Sep 2018 - Bias in R's random integers?

[Rd] Bias in R's random integers?

[Rd] Bias in R's random integers?

[Rd] Bias in R's random integers?

[Rd] Bias in R's random integers?

[Rd] Bias in R's random integers?

Reasonably Related Threads