similar to: Bug in sample()

Displaying 20 results from an estimated 2000 matches similar to: "Bug in sample()"

2019 Feb 20
0
bias issue in sample() (PR 17494)
Luke, I'm happy to help with this. It's great to see this get tackled (I've cc'ed Kelli Ottoboni who helped flag this issue). I can prepare a patch for the RNGkind related stuff and the doc update. As for ???, what are your (and others') thoughts about the possibility of a) a reproducibility API which takes either an R version (or maybe alternatively a date) and sets the RNGkind
2019 Feb 26
0
bias issue in sample() (PR 17494)
Kirill, I think some level of collision is actually expected! R uses a 32-bit MT that can produce 2^32 different doubles. The probability for a collision within a million draws is > pbirthday(1e6, classes = 2^32) [1] 1 Greetings Ralf On 26.02.19 07:06, Kirill Müller wrote: > Gabe > > > As mentioned on Twitter, I think the following behavior should be fixed > as part of the
2019 Feb 26
1
bias issue in sample() (PR 17494)
Ralf, I don't doubt this is expected with the current implementation; I doubt the implementation is desirable. Suggesting to turn this to pbirthday(1e6, classes = 2^53) ## [1] 5.550956e-05 (which is still non-zero, but much less likely to cause confusion.) Best regards Kirill On 26.02.19 10:18, Ralf Stubner wrote: > Kirill, > > I think some level of collision is actually
2019 Feb 26
2
bias issue in sample() (PR 17494)
Gabe As mentioned on Twitter, I think the following behavior should be fixed as part of the upcoming changes: R.version.string ## [1] "R Under development (unstable) (2019-02-25 r76160)" .Machine$double.digits ## [1] 53 set.seed(123) RNGkind() ## [1] "Mersenne-Twister" "Inversion" "Rejection" length(table(runif(1e6))) ## [1] 999863 I don't
2019 Feb 19
2
bias issue in sample() (PR 17494)
Before the next release we really should sort out the bias issue in sample() reported by Ottoboni and Stark in https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf and filed as a bug report by Duncan Murdoch at https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494. Here are two examples of bad behavior through current R-devel: set.seed(123) m <- (2/5) * 2^32
2019 Mar 28
0
issue with latest release of R-devel
Could this be related to "SIGNIFICANT USER-VISIBLE CHANGES The default method for generating from a discrete uniform distribution (used in sample(), for instance) has been changed. This addresses the fact, pointed out by Ottoboni and Stark, that the previous method made sample() noticeably non-uniform on large populations. See PR#17494 for a discussion. The previous method can be requested
2018 Sep 19
0
Bias in R's random integers?
On 19/09/2018 12:09 PM, Philip B. Stark wrote: > The 53 bits only encode at most 2^{32} possible values, because the > source of the float is the output of a 32-bit PRNG (the obsolete version > of MT). 53 bits isn't the relevant number here. No, two calls to unif_rand() are used. There are two 32 bit values, but some of the bits are thrown away. Duncan Murdoch > > The
2018 Sep 19
0
Bias in R's random integers?
On 19/09/2018 12:23 PM, Philip B. Stark wrote: > No, the 2nd call only happens when m > 2**31. Here's the code: Yes, you're right. Sorry! So the ratio really does come close to 2. However, the difference in probabilities between outcomes is still at most 2^-32 when m is less than that cutoff. That's not feasible to detect; the only detectable difference would happen if
2018 Sep 19
0
Bias in R's random integers?
On 19/09/2018 3:52 PM, Philip B. Stark wrote: > Hi Duncan-- > > Nice simulation! > > The absolute difference in probabilities is small, but the maximum > relative difference grows from something negligible to almost 2 as m > approaches 2**31. > > Because the L_1 distance between the uniform distribution on {1, ..., m} > and what you actually get is large, there
2018 Sep 19
0
Bias in R's random integers?
For a well-tested C algorithm, based on my reading of Lemire, the unbiased "algorithm 3" in https://arxiv.org/abs/1805.10941 is already part of the C standard library in OpenBSD and macOS (as arc4random_uniform), and in the GNU standard library. Lemire also provides C++ code in the appendix of his piece for both this and the faster "nearly divisionless" algorithm. It would be
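The rejection approach that the message above attributes to arc4random_uniform can be sketched as follows. This is an illustrative sketch, not the OpenBSD or Lemire source: the names rng32_fn, uniform_below and counting_source are hypothetical, and the 32-bit source is passed in as a callback so that the example stays deterministic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback type: returns the next 32-bit word from some source. */
typedef uint32_t (*rng32_fn)(void *state);

/* Return an integer in [0, upper_bound) that is exactly uniform,
 * provided next() is uniform on [0, 2^32). Draws below the threshold
 * 2^32 mod upper_bound are rejected, so the accepted range has a size
 * that is an exact multiple of upper_bound. */
uint32_t uniform_below(uint32_t upper_bound, rng32_fn next, void *state)
{
    assert(upper_bound > 0);

    /* 2^32 mod upper_bound, computed in 32-bit unsigned arithmetic. */
    uint32_t min = (uint32_t)(-upper_bound) % upper_bound;

    uint32_t r;
    do {
        r = next(state);
    } while (r < min);   /* reject the short leading range */

    return r % upper_bound;
}

/* Hypothetical deterministic source for demonstration: 0, 1, 2, ... */
uint32_t counting_source(void *state)
{
    uint32_t *counter = state;
    return (*counter)++;
}
```

With upper_bound = 3 the threshold is 2^32 mod 3 = 1, so only the draw 0 is rejected; the remaining 2^32 - 1 values divide evenly among the three outcomes. Fed by counting_source starting at 0, the call consumes two draws and returns 1.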
2018 Sep 19
0
Bias in R's random integers?
On 19/09/2018 5:57 PM, David Hugh-Jones wrote: > > It doesn't seem too hard to come up with plausible ways in which this > could give bad results. Suppose I sample rows from a large dataset, > maybe for bootstrapping. Suppose the rows are non-randomly ordered, e.g. > odd rows are males, even rows are females. Oops! Very non-representative > sample, bootstrap p values are
2018 Sep 19
2
Bias in R's random integers?
The 53 bits only encode at most 2^{32} possible values, because the source of the float is the output of a 32-bit PRNG (the obsolete version of MT). 53 bits isn't the relevant number here. The selection ratios can get close to 2. Computer scientists don't do it the way R does, for a reason. Regards, Philip On Wed, Sep 19, 2018 at 9:05 AM Duncan Murdoch <murdoch.duncan at
2018 Sep 19
2
Bias in R's random integers?
No, the 2nd call only happens when m > 2**31. Here's the code: (RNG.c, lines 793ff) double R_unif_index(double dn) { double cut = INT_MAX; switch(RNG_kind) { case KNUTH_TAOCP: case USER_UNIF: case KNUTH_TAOCP2: cut = 33554431.0; /* 2^25 - 1 */ break; default: break; } double u = dn > cut ? ru() : unif_rand(); return floor(dn * u); } On Wed, Sep
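The multiply-and-floor step in the code quoted above, floor(dn * u), can be examined exhaustively on a scaled-down model. The sketch below is an illustration only, not R's implementation: it assumes that u takes exactly the values k / 2^bits for k = 0, ..., 2^bits - 1, and the names bias_range and multiply_and_floor_bias are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Smallest and largest number of generator values mapped to any outcome. */
typedef struct {
    uint32_t min_count;
    uint32_t max_count;
} bias_range;

/* For a model generator with 2^bits equally likely values u = k / 2^bits,
 * count how many values of k land on each outcome floor(m * u) in [0, m).
 * A uniform mapping would give min_count == max_count. */
bias_range multiply_and_floor_bias(unsigned bits, uint32_t m)
{
    assert(bits >= 1 && bits <= 24);       /* keep the model small */
    uint64_t n = (uint64_t)1 << bits;
    assert(m >= 1 && m <= n);

    bias_range r = {0, 0};
    uint32_t *counts = calloc((size_t)m, sizeof *counts);
    if (counts == NULL)
        return r;                          /* allocation failed: report zeros */

    for (uint64_t k = 0; k < n; k++) {
        uint64_t outcome = (k * m) >> bits;   /* floor(m * k / 2^bits) */
        counts[outcome]++;
    }

    r.min_count = r.max_count = counts[0];
    for (uint32_t j = 1; j < m; j++) {
        if (counts[j] < r.min_count) r.min_count = counts[j];
        if (counts[j] > r.max_count) r.max_count = counts[j];
    }

    free(counts);
    return r;
}
```

With bits = 16 and m = 26214 (about (2/5) * 2^16), some outcomes are reached by 2 generator values and others by 3. With m = 40000, just above 2^15, the counts are 1 and 2, a selection ratio of 2. Only when m divides 2^bits, as with m = 32768, are all counts equal.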
2018 Sep 19
2
Bias in R's random integers?
A quick point of order here: arguing with Duncan in this forum is helpful to expose ideas, but probably neither side will convince the other; eventually, if you want this adopted in core R, you'll need to convince an R-core member to pursue this fix. In the meantime, a good, well-tested implementation in a user-contributed package (presumably written in C for speed) would be enormously
2018 Sep 19
2
Bias in R's random integers?
It doesn't seem too hard to come up with plausible ways in which this could give bad results. Suppose I sample rows from a large dataset, maybe for bootstrapping. Suppose the rows are non-randomly ordered, e.g. odd rows are males, even rows are females. Oops! Very non-representative sample, bootstrap p values are garbage. David On Wed, 19 Sep 2018 at 21:20, Duncan Murdoch <murdoch.duncan
2018 Sep 19
4
Bias in R's random integers?
Hi Duncan-- Nice simulation! The absolute difference in probabilities is small, but the maximum relative difference grows from something negligible to almost 2 as m approaches 2**31. Because the L_1 distance between the uniform distribution on {1, ..., m} and what you actually get is large, there have to be test functions whose expectations are quite different under the two distributions.
2018 Sep 19
2
Bias in R's random integers?
On Wed, 19 Sept 2018 at 14:43, Duncan Murdoch (<murdoch.duncan at gmail.com>) wrote: > > On 18/09/2018 5:46 PM, Carl Boettiger wrote: > > Dear list, > > > > It looks to me that R samples random integers using an intuitive but biased > > algorithm by going from a random number on [0,1) from the PRNG to a random > > integer, e.g. > >
2018 Sep 19
0
Bias in R's random integers?
On 19/09/2018 9:09 AM, Iñaki Ucar wrote: > On Wed, 19 Sept 2018 at 14:43, Duncan Murdoch > (<murdoch.duncan at gmail.com>) wrote: >> >> On 18/09/2018 5:46 PM, Carl Boettiger wrote: >>> Dear list, >>> >>> It looks to me that R samples random integers using an intuitive but biased >>> algorithm by going from a random number on [0,1)
2013 Feb 21
2
[PATCH] xen: consolidate implementations of LOG() macro
arm64 is going to add another one shortly, so take control now. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: keir@xen.org Cc: jbeulich@suse.com Cc: tim@xen.org --- xen/arch/arm/arm32/asm-offsets.c | 8 +------- xen/arch/x86/x86_64/asm-offsets.c | 8 +------- xen/include/xen/bitops.h | 7 +++++++ 3 files changed, 9 insertions(+), 14 deletions(-) diff --git
2015 Sep 09
5
Building LLVM and Clang using Clang?
Try as I might I can't seem to get LLVM to build using clang/clang++. No matter what I do it insists on using /usr/bin/cc and /usr/bin/c++ which are gcc. Am I missing something obvious? I vaguely remember some document describing a stage1 compiler built by your old toolchain and a stage2 compiler but I can't find the steps to do that any more. $ CC=/usr/local/bin/clang