Displaying 20 results from an estimated 2000 matches similar to: "Bug in sample()"
2019 Feb 20
bias issue in sample() (PR 17494)
Luke,
I'm happy to help with this. It's great to see this get tackled (I've cc'ed
Kelli Ottoboni who helped flag this issue).
I can prepare a patch for the RNGkind related stuff and the doc update.
As for ???, what are your (and others') thoughts about the possibility of
a) a reproducibility API which takes either an R version (or maybe
alternatively a date) and sets the RNGkind
2019 Feb 26
bias issue in sample() (PR 17494)
Kirill,
I think some level of collision is actually expected! R uses a 32-bit MT
that can produce 2^32 different doubles. The probability for a collision
within a million draws is
> pbirthday(1e6, classes = 2^32)
[1] 1
Greetings
Ralf
On 26.02.19 07:06, Kirill Müller wrote:
> Gabe
>
>
> As mentioned on Twitter, I think the following behavior should be fixed
> as part of the
2019 Feb 26
bias issue in sample() (PR 17494)
Ralf
I don't doubt this is expected with the current implementation; I doubt
the implementation is desirable. I suggest changing this to
pbirthday(1e6, classes = 2^53)
## [1] 5.550956e-05
(which is still non-zero, but much less likely to cause confusion.)
Best regards
Kirill
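Moving from 2^32 to 2^53 possible values means building each double from 53 random bits instead of a single 32-bit word. The C sketch below shows both conversions, assuming a source of uniform 32-bit words. The 53-bit recipe is the genrand_res53() formula from Matsumoto and Nishimura's MT19937 reference code, shown as one known way to do it rather than as what R does; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One 32-bit word scaled by 2^-32: at most 2^32 distinct doubles in [0, 1). */
double double_from_u32(uint32_t x)
{
    return x * (1.0 / 4294967296.0);
}

/* Two 32-bit words combined into 53 random bits (genrand_res53 recipe):
 * the top 27 bits of a and the top 26 bits of b, scaled by 2^-53. */
double double_from_two_u32(uint32_t a, uint32_t b)
{
    uint32_t hi = a >> 5;   /* 27 bits */
    uint32_t lo = b >> 6;   /* 26 bits */
    return (hi * 67108864.0 + lo) * (1.0 / 9007199254740992.0);
}
```

Both results lie in [0, 1), but the second has 2^53 possible values, which is what the pbirthday(1e6, classes = 2^53) figure assumes.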
On 26.02.19 10:18, Ralf Stubner wrote:
> Kirill,
>
> I think some level of collision is actually
2019 Feb 26
bias issue in sample() (PR 17494)
Gabe
As mentioned on Twitter, I think the following behavior should be fixed
as part of the upcoming changes:
R.version.string
## [1] "R Under development (unstable) (2019-02-25 r76160)"
.Machine$double.digits
## [1] 53
set.seed(123)
RNGkind()
## [1] "Mersenne-Twister" "Inversion"        "Rejection"
length(table(runif(1e6)))
## [1] 999863
I don't
2019 Feb 19
bias issue in sample() (PR 17494)
Before the next release we really should sort out the bias issue in
sample() reported by Ottoboni and Stark in
https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf and
filed as a bug report by Duncan Murdoch at
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494.
Here are two examples of bad behavior through current R-devel:
set.seed(123)
m <- (2/5) * 2^32
2019 Mar 28
issue with latest release of R-devel
Could this be related to
"SIGNIFICANT USER-VISIBLE CHANGES
The default method for generating from a discrete uniform distribution
(used in sample(), for instance) has been changed. This addresses the
fact, pointed out by Ottoboni and Stark, that the previous method made
sample() noticeably non-uniform on large populations. See PR#17494 for
a discussion. The previous method can be requested
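The new default corresponds to the "Rejection" entry reported by RNGkind(): integers are drawn by rejection instead of by scaling a uniform. A minimal C sketch of one such scheme, bitmask rejection, follows. It assumes a hypothetical xorshift32 source in place of the Mersenne-Twister and illustrates the idea only; it is not R's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit source (Marsaglia's xorshift32), standing in for the
 * Mersenne-Twister purely for illustration; *state must be nonzero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Uniform integer in [0, m) by bitmask rejection: keep just enough low bits
 * of each 32-bit draw to cover m - 1, and redraw whenever the result is >= m. */
uint32_t rejection_index(uint32_t m, uint32_t *state)
{
    assert(m > 0);
    uint32_t mask = m - 1;
    /* smear the highest set bit downwards: mask becomes 2^bits - 1 >= m - 1 */
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;

    uint32_t v;
    do {
        v = xorshift32(state) & mask;
    } while (v >= m);
    return v;
}
```

Each attempt is accepted with probability greater than 1/2, so fewer than two draws are needed on average, and every value below m is equally likely because the accepted values are exactly the uniform masked draws that landed in [0, m).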
2018 Sep 19
Bias in R's random integers?
On 19/09/2018 12:09 PM, Philip B. Stark wrote:
> The 53 bits only encode at most 2^{32} possible values, because the
> source of the float is the output of a 32-bit PRNG (the obsolete version
> of MT). 53 bits isn't the relevant number here.
No, two calls to unif_rand() are used. There are two 32-bit values, but
some of the bits are thrown away.
Duncan Murdoch
>
> The
2018 Sep 19
Bias in R's random integers?
On 19/09/2018 12:23 PM, Philip B. Stark wrote:
> No, the 2nd call only happens when m > 2**31. Here's the code:
Yes, you're right. Sorry!
So the ratio really does come close to 2. However, the difference in
probabilities between outcomes is still at most 2^-32 when m is less
than that cutoff. That's not feasible to detect; the only detectable
difference would happen if
2018 Sep 19
Bias in R's random integers?
On 19/09/2018 3:52 PM, Philip B. Stark wrote:
> Hi Duncan--
>
> Nice simulation!
>
> The absolute difference in probabilities is small, but the maximum
> relative difference grows from something negligible to almost 2 as m
> approaches 2**31.
>
> Because the L_1 distance between the uniform distribution on {1, ..., m}
> and what you actually get is large, there
2018 Sep 19
Bias in R's random integers?
For a well-tested C algorithm, based on my reading of Lemire, the unbiased
"algorithm 3" in https://arxiv.org/abs/1805.10941 is already part of the C
standard library in OpenBSD and macOS (as arc4random_uniform), and in the
GNU standard library. Lemire also provides C++ code in the appendix of his
piece for both this and the faster "nearly divisionless" algorithm.
It would be
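For comparison, here is a C sketch of the "nearly divisionless" method from the Lemire paper cited above, written from the paper's description. The xorshift32 generator is a hypothetical stand-in for any source of uniform 32-bit words, and lemire_bounded is an illustrative name, not the API of arc4random_uniform or of any package.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit source for illustration only (Marsaglia's xorshift32);
 * *state must be nonzero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Unbiased integer in [0, s): multiply a 32-bit word by s, keep the high
 * 32 bits, and reject the few low-word values that would otherwise make
 * some outputs more likely than others. */
uint32_t lemire_bounded(uint32_t s, uint32_t *state)
{
    assert(s > 0);
    uint64_t m = (uint64_t)xorshift32(state) * s;
    uint32_t l = (uint32_t)m;                    /* low 32 bits */
    if (l < s) {                                 /* rejection is only possible here */
        uint32_t t = (uint32_t)(0u - s) % s;     /* 2^32 mod s */
        while (l < t) {
            m = (uint64_t)xorshift32(state) * s;
            l = (uint32_t)m;
        }
    }
    return (uint32_t)(m >> 32);                  /* high 32 bits */
}
```

The modulo is computed only when the low word falls below s, which happens with probability s / 2^32, so for small s most calls cost a single multiplication.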
2018 Sep 19
Bias in R's random integers?
On 19/09/2018 5:57 PM, David Hugh-Jones wrote:
>
> It doesn't seem too hard to come up with plausible ways in which this
> could give bad results. Suppose I sample rows from a large dataset,
> maybe for bootstrapping. Suppose the rows are non-randomly ordered, e.g.
> odd rows are males, even rows are females. Oops! Very non-representative
> sample, bootstrap p values are
2018 Sep 19
Bias in R's random integers?
The 53 bits only encode at most 2^{32} possible values, because the source
of the float is the output of a 32-bit PRNG (the obsolete version of MT).
53 bits isn't the relevant number here.
The selection ratios can get close to 2. Computer scientists don't do it
the way R does, for a reason.
Regards,
Philip
On Wed, Sep 19, 2018 at 9:05 AM Duncan Murdoch <murdoch.duncan at
2018 Sep 19
Bias in R's random integers?
No, the 2nd call only happens when m > 2**31. Here's the code:
(RNG.c, lines 793ff)
double R_unif_index(double dn)
{
    double cut = INT_MAX;
    switch(RNG_kind) {
    case KNUTH_TAOCP:
    case USER_UNIF:
    case KNUTH_TAOCP2:
        cut = 33554431.0; /* 2^25 - 1 */
        break;
    default:
        break;
    }
    /* ru() is used only when dn exceeds the cutoff; otherwise one unif_rand() */
    double u = dn > cut ? ru() : unif_rand();
    return floor(dn * u);
}
On Wed, Sep
2018 Sep 19
Bias in R's random integers?
A quick point of order here: arguing with Duncan in this forum is
helpful to expose ideas, but probably neither side will convince the
other; eventually, if you want this adopted in core R, you'll need to
convince an R-core member to pursue this fix.
In the meantime, a good, well-tested implementation in a
user-contributed package (presumably written in C for speed) would be
enormously
2018 Sep 19
Bias in R's random integers?
It doesn't seem too hard to come up with plausible ways in which this could
give bad results. Suppose I sample rows from a large dataset, maybe for
bootstrapping. Suppose the rows are non-randomly ordered, e.g. odd rows are
males, even rows are females. Oops! Very non-representative sample,
bootstrap p values are garbage.
David
On Wed, 19 Sep 2018 at 21:20, Duncan Murdoch <murdoch.duncan
2018 Sep 19
Bias in R's random integers?
Hi Duncan--
Nice simulation!
The absolute difference in probabilities is small, but the maximum relative
difference grows from something negligible to almost 2 as m approaches
2**31.
Because the L_1 distance between the uniform distribution on {1, ..., m}
and what you actually get is large, there have to be test functions whose
expectations are quite different under the two distributions.
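The L_1 argument can be made concrete on a small scale. The C sketch below uses an idealized model (an assumption, not R's code): a uniform source with only 2^bits equally likely values k / 2^bits, mapped to {0, ..., m-1} by floor(m * k / 2^bits) in exact integer arithmetic. It counts how many source values land on each output, then reports the L_1 distance from the uniform distribution and the ratio of the largest to the smallest selection probability; floor_mapping_bias is a hypothetical name.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

/* Idealized floor(m * u) mapping with u uniform over {k / 2^bits}.
 * Exact integer arithmetic stands in for the floating-point product.
 * Writes the L1 distance from uniform on m outcomes to *l1 and the
 * largest/smallest selection-probability ratio to *ratio. */
void floor_mapping_bias(unsigned bits, uint64_t m, double *l1, double *ratio)
{
    /* small tables only; m <= 2^bits guarantees every output is hit */
    assert(bits >= 1 && bits <= 24);
    assert(m >= 1 && m <= (UINT64_C(1) << bits));
    uint64_t total = UINT64_C(1) << bits;
    uint32_t *count = calloc((size_t)m, sizeof *count);
    if (count == NULL) abort();

    for (uint64_t k = 0; k < total; k++)
        count[(m * k) >> bits]++;            /* floor(m * k / 2^bits) */

    uint32_t lo = count[0], hi = count[0];
    double dist = 0.0;
    for (uint64_t j = 0; j < m; j++) {
        if (count[j] < lo) lo = count[j];
        if (count[j] > hi) hi = count[j];
        dist += fabs((double)count[j] / (double)total - 1.0 / (double)m);
    }
    free(count);
    *l1 = dist;
    *ratio = (double)hi / (double)lo;
}
```

With bits = 8 and m = 192 it reports an L_1 distance of 1/3 and a ratio of 2. In general each output receives either q or q + 1 source values, where q = floor(2^bits / m), so the ratio is (q + 1)/q whenever m does not divide 2^bits.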
2018 Sep 19
Bias in R's random integers?
On Wed, 19 Sep 2018 at 14:43, Duncan Murdoch
(<murdoch.duncan at gmail.com>) wrote:
>
> On 18/09/2018 5:46 PM, Carl Boettiger wrote:
> > Dear list,
> >
> > It looks to me that R samples random integers using an intuitive but biased
> > algorithm by going from a random number on [0,1) from the PRNG to a random
> > integer, e.g.
> >
2018 Sep 19
Bias in R's random integers?
On 19/09/2018 9:09 AM, Iñaki Ucar wrote:
> On Wed, 19 Sep 2018 at 14:43, Duncan Murdoch
> (<murdoch.duncan at gmail.com>) wrote:
>>
>> On 18/09/2018 5:46 PM, Carl Boettiger wrote:
>>> Dear list,
>>>
>>> It looks to me that R samples random integers using an intuitive but biased
>>> algorithm by going from a random number on [0,1)
2013 Feb 21
[PATCH] xen: consolidate implementations of LOG() macro
arm64 is going to add another one shortly, so take control now.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: keir@xen.org
Cc: jbeulich@suse.com
Cc: tim@xen.org
---
xen/arch/arm/arm32/asm-offsets.c | 8 +-------
xen/arch/x86/x86_64/asm-offsets.c | 8 +-------
xen/include/xen/bitops.h | 7 +++++++
3 files changed, 9 insertions(+), 14 deletions(-)
diff --git
2015 Sep 09
Building LLVM and Clang using Clang?
Try as I might, I can't seem to get LLVM to build using clang/clang++.
No matter what I do it insists on using /usr/bin/cc and /usr/bin/c++
which are gcc. Am I missing something obvious? I vaguely remember some
document describing a stage1 compiler built by your old toolchain and
a stage2 compiler but I can't find the steps to do that any more.
$ CC=/usr/local/bin/clang