2018 Sep 19
Bias in R's random integers?
On Wed, 19 Sep 2018 at 14:43, Duncan Murdoch
(<murdoch.duncan at gmail.com>) wrote:
>
> On 18/09/2018 5:46 PM, Carl Boettiger wrote:
> > Dear list,
> >
> > It looks to me that R samples random integers using an intuitive but biased
> > algorithm by going from a random number on [0,1) from the PRNG to a random
> > integer, e.g.
> >
2018 Sep 19
Bias in R's random integers?
The 53 bits only encode at most 2^{32} possible values, because the source
of the float is the output of a 32-bit PRNG (the obsolete version of MT).
53 bits isn't the relevant number here.
The selection ratios can get close to 2. Computer scientists don't do it
the way R does, for a reason.
Regards,
Philip
On Wed, Sep 19, 2018 at 9:05 AM Duncan Murdoch <murdoch.duncan at
2018 Sep 19
Bias in R's random integers?
No, the 2nd call only happens when m > 2**31. Here's the code:
(RNG.c, lines 793ff)
double R_unif_index(double dn)
{
    double cut = INT_MAX;
    switch(RNG_kind) {
    case KNUTH_TAOCP:
    case USER_UNIF:
    case KNUTH_TAOCP2:
        cut = 33554431.0; /* 2^25 - 1 */
        break;
    default:
        break;
    }
    double u = dn > cut ? ru() : unif_rand();
    return floor(dn * u);
}
On Wed, Sep
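The counting argument behind the bias can be made concrete with a short C sketch. It models the `floor(dn * u)` line above for an integer dn = m under a simplifying assumption: u is one of the 2^bits grid values k/2^bits (k = 0, ..., 2^bits - 1), each equally likely, as when a single 32-bit word is scaled into [0,1). This is a simplification of what unif_rand() does, not R's code. The helper name `hits_for_outcome` is hypothetical; it counts how many grid values land on outcome i.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many of the 2^bits equally likely values
 * u = k / 2^bits (k = 0 .. 2^bits - 1) satisfy floor(m * u) == i.
 * Models floor(dn * u) for integer dn = m, assuming u comes from a
 * single bits-wide word (a simplification of unif_rand()).
 *
 * Preconditions: 1 <= m <= 2^bits, m <= 2^(63 - bits), i < m, bits <= 32,
 * so the products below cannot overflow 64 bits.
 */
static uint64_t hits_for_outcome(uint64_t m, uint64_t i, unsigned bits)
{
    assert(bits <= 32 && m >= 1 && i < m);
    const uint64_t n = (uint64_t)1 << bits;
    assert(m <= n && m <= ((uint64_t)1 << (63 - bits)));

    /* floor(m*k/n) == i  <=>  i*n/m <= k < (i+1)*n/m,
       so k runs from ceil(i*n/m) up to ceil((i+1)*n/m) - 1. */
    uint64_t first = (i * n + m - 1) / m;        /* ceil(i*n/m)     */
    uint64_t past  = ((i + 1) * n + m - 1) / m;  /* ceil((i+1)*n/m) */
    return past - first;
}
```

Summed over i = 0, ..., m-1 the counts give 2^bits, and each count is either floor(2^bits/m) or one more, so whenever m does not divide 2^bits some outcomes are selected more often than others.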
2018 Sep 19
Bias in R's random integers?
Hi Duncan--
Nice simulation!
The absolute difference in probabilities is small, but the maximum relative
difference grows from something negligible to almost 2 as m approaches
2**31.
Because the L_1 distance between the uniform distribution on {1, ..., m}
and what you actually get is large, there have to be test functions whose
expectations are quite different under the two distributions.
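The L_1 gap can be quantified under a simplifying assumption (u uniform on the 2^bits grid values k/2^bits, as when one bits-wide word is scaled into [0,1)). Write 2^bits = q*m + r with 0 <= r < m. Exactly r outcomes receive q + 1 grid points and the other m - r receive q. Since (q+1)/2^bits - 1/m = (m - r)/(m*2^bits) and 1/m - q/2^bits = r/(m*2^bits), the total variation distance (half the L_1 distance) is r*(m - r)/(m*2^bits). A minimal sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: total variation distance between the uniform
 * distribution on {0, ..., m-1} and the law of floor(m * k / 2^bits)
 * with k uniform on {0, ..., 2^bits - 1}. Assumes 1 <= m <= 2^bits and
 * bits <= 32.
 *
 * With 2^bits = q*m + r (0 <= r < m), r outcomes have probability
 * (q+1)/2^bits and m - r have q/2^bits, so
 *   TV = (1/2) * [r*(m-r) + (m-r)*r] / (m * 2^bits) = r*(m-r) / (m * 2^bits).
 * The L_1 distance is twice this value.
 */
static double tv_distance_to_uniform(uint64_t m, unsigned bits)
{
    assert(bits <= 32 && m >= 1);
    const uint64_t n = (uint64_t)1 << bits;
    assert(m <= n);
    const uint64_t r = n % m;
    /* r * (m - r) <= m*m/4 <= 2^62, so the product fits in 64 bits. */
    return (double)(r * (m - r)) / ((double)m * (double)n);
}
```

Each per-outcome error is below 2^-bits, yet for m = 1717986918 (about 0.4 * 2^32) and bits = 32 the distance comes out near 0.1, because those tiny errors add up over roughly 1.7 billion outcomes.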
2018 Sep 19
Bias in R's random integers?
It doesn't seem too hard to come up with plausible ways in which this could
give bad results. Suppose I sample rows from a large dataset, maybe for
bootstrapping. Suppose the rows are non-randomly ordered, e.g. odd rows are
males, even rows are females. Oops! Very non-representative sample,
bootstrap p values are garbage.
David
On Wed, 19 Sep 2018 at 21:20, Duncan Murdoch <murdoch.duncan
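The odd/even scenario can be checked directly in a small model. This is a minimal sketch under two assumptions: u = k/2^bits with k uniform on {0, ..., 2^bits - 1}, and a hypothetical non-integer size dn = (num/den) * 2^bits, so that floor(dn * u) = floor(num * k / den) can be evaluated exactly in integer arithmetic. With num = 2 and den = 5 (dn = 0.4 * 2^bits), residues k mod 5 of 0, 1 and 2 land on even outcomes and 3, 4 on odd ones, so about 60% of draws give an even 0-based outcome. Since sample() reports 1-based indices, those correspond to odd row numbers. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: fraction of the 2^bits equally likely draws k
 * whose outcome floor(num * k / den) is even. This models floor(dn * u)
 * with u = k / 2^bits and dn = (num / den) * 2^bits, computed exactly in
 * integers. It loops over every k, so bits is capped at 24; keep
 * num * 2^bits below 2^64.
 */
static double even_outcome_fraction(uint64_t num, uint64_t den, unsigned bits)
{
    assert(den >= 1 && bits <= 24);
    const uint64_t n = (uint64_t)1 << bits;
    uint64_t even = 0;
    for (uint64_t k = 0; k < n; k++) {
        if (((num * k) / den) % 2 == 0)
            even++;
    }
    return (double)even / (double)n;
}
```

`even_outcome_fraction(2, 5, 20)` returns a value just above 0.6. The non-integer dn is a deliberate choice: how strongly the bias lines up with parity depends on m.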
2018 Sep 19
Bias in R's random integers?
A quick point of order here: arguing with Duncan in this forum is
helpful to expose ideas, but probably neither side will convince the
other; eventually, if you want this adopted in core R, you'll need to
convince an R-core member to pursue this fix.
In the meantime, a good, well-tested implementation in a
user-contributed package (presumably written in C for speed) would be
enormously
2018 Sep 19
Bias in R's random integers?
On 18/09/2018 5:46 PM, Carl Boettiger wrote:
> Dear list,
>
> It looks to me that R samples random integers using an intuitive but biased
> algorithm by going from a random number on [0,1) from the PRNG to a random
> integer, e.g.
> https://github.com/wch/r-source/blob/tags/R-3-5-1/src/main/RNG.c#L808
>
> Many other languages use various rejection sampling approaches
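For contrast with the `floor(dn * u)` mapping, here is a minimal sketch of one common rejection approach: the threshold-and-modulo scheme used by OpenBSD's `arc4random_uniform`. It assumes direct access to raw 32-bit outputs. `next_u32()` is a hypothetical stand-in (Marsaglia's xorshift32), not R's Mersenne Twister or any R API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in 32-bit generator (Marsaglia's xorshift32).
   It is NOT R's Mersenne Twister; any source of uniform 32-bit words works. */
static uint32_t xs_state = 2463534242u;

static uint32_t next_u32(void)
{
    uint32_t x = xs_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    xs_state = x;
    return x;
}

/*
 * Unbiased integer in [0, m) for m >= 1, by rejection. Draws below
 * t = 2^32 mod m are discarded, so the accepted range [t, 2^32) has a
 * length that is an exact multiple of m and r % m is uniform. Each draw
 * is rejected with probability t / 2^32, which is below 1/2.
 */
static uint32_t uniform_below(uint32_t m)
{
    assert(m >= 1);
    const uint32_t t = (uint32_t)(-m) % m;   /* (2^32 - m) mod m == 2^32 mod m */
    for (;;) {
        uint32_t r = next_u32();
        if (r >= t)
            return r % m;
    }
}
```

The cost is an occasional extra draw and one modulo per call, in exchange for exactly equal probabilities on every outcome.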
2018 Sep 19
Bias in R's random integers?
On 19/09/2018 9:09 AM, Iñaki Ucar wrote:
> On Wed, 19 Sep 2018 at 14:43, Duncan Murdoch
> (<murdoch.duncan at gmail.com>) wrote:
>>
>> On 18/09/2018 5:46 PM, Carl Boettiger wrote:
>>> Dear list,
>>>
>>> It looks to me that R samples random integers using an intuitive but biased
>>> algorithm by going from a random number on [0,1)
2018 Sep 19
Bias in R's random integers?
On 19/09/2018 12:09 PM, Philip B. Stark wrote:
> The 53 bits only encode at most 2^{32} possible values, because the
> source of the float is the output of a 32-bit PRNG (the obsolete version
> of MT). 53 bits isn't the relevant number here.
No, two calls to unif_rand() are used. There are two 32-bit values, but
some of the bits are thrown away.
Duncan Murdoch
>
> The
2018 Sep 19
Bias in R's random integers?
On 19/09/2018 12:23 PM, Philip B. Stark wrote:
> No, the 2nd call only happens when m > 2**31. Here's the code:
Yes, you're right. Sorry!
So the ratio really does come close to 2. However, the difference in
probabilities between outcomes is still at most 2^-32 when m is less
than that cutoff. That's not feasible to detect; the only detectable
difference would happen if
2018 Sep 19
Bias in R's random integers?
On 19/09/2018 3:52 PM, Philip B. Stark wrote:
> Hi Duncan--
>
> Nice simulation!
>
> The absolute difference in probabilities is small, but the maximum
> relative difference grows from something negligible to almost 2 as m
> approaches 2**31.
>
> Because the L_1 distance between the uniform distribution on {1, ..., m}
> and what you actually get is large, there
2018 Sep 19
Bias in R's random integers?
For a well-tested C algorithm, based on my reading of Lemire, the unbiased
"algorithm 3" in https://arxiv.org/abs/1805.10941 is already part of the C
standard library in OpenBSD and macOS (as arc4random_uniform), and in the
GNU standard library. Lemire also provides C++ code in the appendix of his
piece for both this and the faster "nearly divisionless" algorithm.
It would be
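The "nearly divisionless" idea takes the high 32 bits of a 64-bit product x * s as the result and only computes a remainder in the rare case that the low 32 bits fall into the region that might need rejecting. A minimal sketch of that idea, assuming raw 32-bit words from a hypothetical stand-in generator (`next_u32()`, Marsaglia's xorshift32), not from R:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in 32-bit generator (Marsaglia's xorshift32), not R's RNG. */
static uint32_t xs_state = 2463534242u;

static uint32_t next_u32(void)
{
    uint32_t x = xs_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    xs_state = x;
    return x;
}

/*
 * Unbiased integer in [0, s) for s >= 1, following the "nearly
 * divisionless" approach: the high 32 bits of x * s are the candidate
 * result; the low 32 bits decide whether x lies in the biased region
 * that must be rejected. The % is evaluated only when low < s.
 */
static uint32_t nearly_divisionless_below(uint32_t s)
{
    assert(s >= 1);
    uint64_t product = (uint64_t)next_u32() * s;
    uint32_t low = (uint32_t)product;
    if (low < s) {
        const uint32_t t = (uint32_t)(-s) % s;   /* 2^32 mod s */
        while (low < t) {
            product = (uint64_t)next_u32() * s;
            low = (uint32_t)product;
        }
    }
    return (uint32_t)(product >> 32);
}
```

Compared with the plain threshold-and-modulo scheme, most calls cost one multiply and no division, which is the speed argument for this variant.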
2018 Sep 19
Bias in R's random integers?
On 19/09/2018 5:57 PM, David Hugh-Jones wrote:
>
> It doesn't seem too hard to come up with plausible ways in which this
> could give bad results. Suppose I sample rows from a large dataset,
> maybe for bootstrapping. Suppose the rows are non-randomly ordered, e.g.
> odd rows are males, even rows are females. Oops! Very non-representative
> sample, bootstrap p values are
2018 Sep 20
Bias in R's random integers?
On 9/20/18 1:43 AM, Carl Boettiger wrote:
> For a well-tested C algorithm, based on my reading of Lemire, the unbiased
> "algorithm 3" in https://arxiv.org/abs/1805.10941 is part already of the C
> standard library in OpenBSD and macOS (as arc4random_uniform), and in the
> GNU standard library. Lemire also provides C++ code in the appendix of his
> piece for both this and
2018 Sep 20
Bias in R's random integers?
Hello,
On Thursday, September 20, 2018 11:15:04 AM EDT Duncan Murdoch wrote:
> On 20/09/2018 6:59 AM, Ralf Stubner wrote:
> > On 9/20/18 1:43 AM, Carl Boettiger wrote:
> >> For a well-tested C algorithm, based on my reading of Lemire, the
> >> unbiased "algorithm 3" in https://arxiv.org/abs/1805.10941 is part
> >> already of the C standard library in
2018 Sep 20
Bias in R's random integers?
On 20/09/2018 6:59 AM, Ralf Stubner wrote:
> On 9/20/18 1:43 AM, Carl Boettiger wrote:
>> For a well-tested C algorithm, based on my reading of Lemire, the unbiased
>> "algorithm 3" in https://arxiv.org/abs/1805.10941 is part already of the C
>> standard library in OpenBSD and macOS (as arc4random_uniform), and in the
>> GNU standard library. Lemire also
2018 Sep 19
Bias in R's random integers?
On Wed, 19 Sep 2018 at 13:43, Duncan Murdoch <murdoch.duncan at gmail.com>
wrote:
>
> I think the analyses are correct, but I doubt if a change to the default
> is likely to be accepted as it would make it more difficult to reproduce
> older results.
I'm a bit alarmed by the logic here. Unbiased sampling seems basic for a
statistical language. As a consumer of R I'd
2018 Sep 21
Bias in R's random integers?
Hello,
Top posting. Several people have asked about the code to replicate my
results. I have cleaned up the code to remove an x/y coordinate bias for
displaying the results directly on a 640 x 480 VGA adapter. You can find the
code here:
http://people.redhat.com/sgrubb/files/vseq.c
To collect R samples:
X <- runif(10000, min = 0, max = 65535)
write.table(X, file =
2018 Sep 21
Bias in R's random integers?
Not sure what should happen theoretically for the code in vseq.c, but
I see the same pattern with the R generators I tried (default,
Super-Duper, and L'Ecuyer) and with bash $RANDOM using
N <- 10000
X1 <- replicate(N, as.integer(system("bash -c 'echo $RANDOM'", intern = TRUE)))
X2 <- replicate(N, as.integer(system("bash -c 'echo $RANDOM'",
2018 Sep 21
Bias in R's random integers?
On 9/20/18 5:15 PM, Duncan Murdoch wrote:
> On 20/09/2018 6:59 AM, Ralf Stubner wrote:
>> It is difficult to do this in a package, since R does not provide access
>> to the random bits generated by the RNG. Only a float in (0,1) is
>> available via unif_rand().
>
> I believe it is safe to multiply the unif_rand() value by 2^32, and take
> the whole number part as an
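The multiply-by-2^32 suggestion can be sketched as follows. It rests on an assumption that is not verified here for every generator R offers: the double is exactly k/2^32 for a 32-bit integer k, or, for k = 0, some tiny positive substitute smaller than 2^-32. Both helper names are hypothetical, and the `double` argument stands in for the value a package would obtain from unif_rand().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the double a 32-bit generator would hand out for word k. */
static double word_to_unif(uint32_t k)
{
    return (double)k / 4294967296.0;   /* k / 2^32, exact in a double */
}

/*
 * Hypothetical inverse: recover k from u = k / 2^32 by multiplying by
 * 2^32 and truncating. Multiplying by a power of two is exact, so the
 * round trip is lossless whenever u really has that form; a tiny
 * positive stand-in for 0 (anything below 2^-32) truncates to 0.
 */
static uint32_t unif_to_word(double u)
{
    assert(u >= 0.0 && u < 1.0);
    return (uint32_t)(u * 4294967296.0);
}
```

If a generator's outputs are not multiples of 2^-32, the truncated value is no longer a faithful copy of the underlying bits, so the approach depends on knowing how each generator builds its double.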