On 9/20/18 1:43 AM, Carl Boettiger wrote:
> For a well-tested C algorithm, based on my reading of Lemire, the unbiased
> "algorithm 3" in https://arxiv.org/abs/1805.10941 is already part of the C
> standard library in OpenBSD and macOS (as arc4random_uniform), and in the
> GNU standard library. Lemire also provides C++ code in the appendix of his
> piece for both this and the faster "nearly divisionless" algorithm.
>
> It would be excellent if any R core members were interested in considering
> bindings to these algorithms as a patch, or might express expectations for
> how that patch would have to operate (e.g. re Duncan's comment about
> non-integer arguments to sample size). Otherwise, an R package binding
> seems like a good starting point, but I'm not the right volunteer.

It is difficult to do this in a package, since R does not provide access
to the random bits generated by the RNG. Only a float in (0,1) is
available via unif_rand(). However, if one is willing to use an external
RNG, it is of course possible. After reading about Lemire's work [1], I
had planned to integrate such an unbiased sampling scheme into the dqrng
package, which I have now started. [2]

Using Duncan's example, the results look much better:

> library(dqrng)
> m <- (2/5)*2^32
> y <- dqsample(m, 1000000, replace = TRUE)
> table(y %% 2)

     0      1
500252 499748

Currently I am taking the other interpretation of "truncated":

> table(dqsample(2.5, 1000000, replace = TRUE))

     0      1
499894 500106

I will adjust this to whatever is decided for base R.

However, there is currently neither long-vector nor weighted-sampling
support, and the performance without replacement is quite bad compared
to R's algorithm with hashing.

cheerio
ralf

[1] via http://www.pcg-random.org/posts/bounded-rands.html
[2] https://github.com/daqana/dqrng/tree/feature/sample

--
Ralf Stubner
Senior Software Engineer / Trainer

daqana GmbH
Dortustraße 48
14467 Potsdam

T: +49 331 23 61 93 11
F: +49 331 23 61 93 90
M: +49 162 20 91 196
Mail: ralf.stubner at daqana.com

Sitz: Potsdam
Register: AG Potsdam HRB 27966 P
Ust.-IdNr.: DE300072622
Geschäftsführer: Prof. Dr. Dr. Karl-Kuno Kunze
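For context, the core of Lemire's "nearly divisionless" method is only a
few lines. A minimal C++ sketch (the function name and the use of
std::mt19937 as the 32-bit source are illustrative choices, not
necessarily what dqrng does):

    #include <cstdint>
    #include <random>

    // Unbiased draw from [0, s): the high 32 bits of the 64-bit product
    // x * s are uniform on [0, s) once we reject those x whose low 32
    // bits fall below 2^32 mod s (Lemire 2018, "nearly divisionless").
    uint32_t bounded_rand(std::mt19937 &rng, uint32_t s) {
        uint64_t m = uint64_t(rng()) * uint64_t(s);
        uint32_t l = uint32_t(m);      // low 32 bits of the product
        if (l < s) {                   // a rejection is only possible here
            uint32_t t = -s % s;       // 2^32 mod s, via unsigned wrap-around
            while (l < t) {            // redraw until out of the biased zone
                m = uint64_t(rng()) * uint64_t(s);
                l = uint32_t(m);
            }
        }
        return uint32_t(m >> 32);      // high 32 bits are the sample
    }

The expensive % is only reached with probability about s / 2^32, which is
where the speed-up over the classical arc4random_uniform-style rejection
comes from.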
On 20/09/2018 6:59 AM, Ralf Stubner wrote:
> It is difficult to do this in a package, since R does not provide access
> to the random bits generated by the RNG. Only a float in (0,1) is
> available via unif_rand().

I believe it is safe to multiply the unif_rand() value by 2^32, and take
the whole number part as an unsigned 32-bit integer. Depending on the
RNG in use, that will give at least 25 random bits. (The low-order bits
are the questionable ones. 25 is just a guess, not a guarantee.)

> However, if one is willing to use an external RNG, it is of course
> possible. After reading about Lemire's work [1], I had planned to
> integrate such an unbiased sampling scheme into the dqrng package,
> which I have now started. [2]
>
> Using Duncan's example, the results look much better:
>
>> library(dqrng)
>> m <- (2/5)*2^32
>> y <- dqsample(m, 1000000, replace = TRUE)
>> table(y %% 2)
>
>      0      1
> 500252 499748

Another useful diagnostic is

  plot(density(y[y %% 2 == 0]))

Obviously that should give a more or less uniform density, but for
values near m, the default sample() gives some nice pretty pictures of
quite non-uniform densities.

By the way, there are actually quite a few examples of very large m
besides m = (2/5)*2^32 where the performance of sample() is noticeably
bad. You'll see problems in y %% 2 for any integer a > 1 with
m = 2/(1 + 2a) * 2^32, problems in y %% 3 for m = 3/(1 + 3a)*2^32 or
m = 3/(2 + 3a)*2^32, etc.

So perhaps I'm starting to be convinced that the default sample() should
be fixed.

Duncan Murdoch
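Duncan's conversion is easy to express at the C level. A sketch (the
helper name is mine; unif_rand(), GetRNGstate() and PutRNGstate() are
R's documented C API from R_ext/Random.h):

    #include <stdint.h>
    #include <R_ext/Random.h>   /* declares double unif_rand(void) */

    /* Recover a 32-bit integer from R's unit-interval double.  Per the
       caveat above, only the high-order bits (roughly 25, depending on
       the RNG) should be treated as trustworthy.  Callers must bracket
       use of unif_rand() with GetRNGstate()/PutRNGstate(). */
    static uint32_t unif_bits32(void)
    {
        double u = unif_rand();               /* u in (0, 1) */
        return (uint32_t)(u * 4294967296.0);  /* floor(u * 2^32) */
    }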
Hi,

Note that it wouldn't be the first time that sample() changes behavior
in a non-backward compatible way:

  https://stat.ethz.ch/pipermail/r-devel/2012-October/065049.html

Cheers,
H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Hello,

On Thursday, September 20, 2018 11:15:04 AM EDT Duncan Murdoch wrote:
> So perhaps I'm starting to be convinced that the default sample() should
> be fixed.

I find this discussion fascinating. I normally test random numbers in
different languages every now and again using various methods. One
simple check that I do is to use Michal Zalewski's method from his study
of strange attractors and initial TCP/IP sequence numbers:

http://lcamtuf.coredump.cx/newtcp/
https://pdfs.semanticscholar.org/adb7/069984e3fa48505cd5081ec118ccb95529a3.pdf

The technique works by mapping the dynamics of the generated numbers
into a three-dimensional phase space, which is then plotted so that you
can see visually whether something odd is going on. I used
runif(10000, min = 0, max = 65535) to get a set of numbers.
This is the resulting plot that was generated from R's numbers using
this technique:

http://people.redhat.com/sgrubb/files/r-random.jpg

And for comparison, this was generated by collecting the same number of
samples from the bash shell:

http://people.redhat.com/sgrubb/files/bash-random.jpg

The net result is that it shows some banding in the R-generated random
numbers, whereas bash has uniform random numbers with no discernible
pattern.

Best Regards,
-Steve
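For reference, one way to read Zalewski's construction (a sketch of my
understanding, not his code): take delayed first differences of the
sequence and treat each triple as a point in 3-D. A well-behaved source
should fill the point cloud evenly; banding or sheets betray correlation
between successive outputs.

    #include <array>
    #include <vector>

    // Delay-difference phase-space embedding: point n is
    //   ( s[n-2]-s[n-3], s[n-1]-s[n-2], s[n]-s[n-1] ).
    std::vector<std::array<double, 3>>
    phase_space(const std::vector<double> &s) {
        std::vector<std::array<double, 3>> pts;
        for (std::size_t n = 3; n < s.size(); ++n)
            pts.push_back({s[n - 2] - s[n - 3],
                           s[n - 1] - s[n - 2],
                           s[n] - s[n - 1]});
        return pts;
    }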
On 9/20/18 5:15 PM, Duncan Murdoch wrote:
> I believe it is safe to multiply the unif_rand() value by 2^32, and take
> the whole number part as an unsigned 32-bit integer. Depending on the
> RNG in use, that will give at least 25 random bits. (The low-order bits
> are the questionable ones. 25 is just a guess, not a guarantee.)

Right, the RNGs in R produce no more than 32 bits, so the conversion to
a double can be reverted. If we ignore those RNGs that produce fewer
than 32 bits for the moment, then the attached file contains a sample
implementation (without long vectors, weighted sampling or hashing). It
uses Rcpp for convenience, but I have tried to keep the amount of C++
low.

Interesting results: the results for "simple" sampling are the same.

> set.seed(42)
> sample.int(6, 10, replace = TRUE)
 [1] 6 6 2 5 4 4 5 1 4 5
> sample.int(100, 10)
 [1] 46 72 92 25 45 90 98 11 44 51
> set.seed(42)
> sample_int(6, 10, replace = TRUE)
 [1] 6 6 2 5 4 4 5 1 4 5
> sample_int(100, 10)
 [1] 46 72 92 25 45 90 98 11 44 51

But there is no bias with the alternative method:

> m <- ceiling((2/5)*2^32)
> set.seed(42)
> x <- sample.int(m, 1000000, replace = TRUE)
> table(x %% 2)

     0      1
467768 532232

> set.seed(42)
> y <- sample_int(m, 1000000, replace = TRUE)
> table(y %% 2)

     0      1
500586 499414

The differences are also visible when sampling only a few values from
'm' possible values:

> set.seed(42)
> sample.int(m, 6, replace = TRUE)
[1] 1571624817 1609883303  491583978 1426698159 1102510407  891800051
> set.seed(42)
> sample_int(m, 6, replace = TRUE)
[1]  491583978 1426698159 1102510407  891800051 1265449090  231355453

When sampling from 'm' possible values, performance is not so good,
since we often have to get a second random number:

> bench::mark(orig = sample.int(m, 1000000, replace = TRUE),
+             new = sample_int(m, 1000000, replace = TRUE),
+             check = FALSE)
# A tibble: 2 x 14
  expression     min    mean  median   max `itr/sec` mem_alloc  n_gc n_itr
  <chr>      <bch:t> <bch:t> <bch:t> <bch>     <dbl> <bch:byt> <dbl> <int>
1 orig        8.15ms  8.67ms  8.43ms  10ms      115.    3.82MB     4    52
2 new        25.21ms 25.58ms 25.45ms  27ms      39.1    3.82MB     2    18
# ... with 5 more variables: total_time <bch:tm>, result <list>,
#   memory <list>, time <list>, gc <list>

When sampling from fewer values, the difference is much less pronounced:

> bench::mark(orig = sample.int(6, 1000000, replace = TRUE),
+             new = sample_int(6, 1000000, replace = TRUE),
+             check = FALSE)
# A tibble: 2 x 14
  expression     min    mean  median     max `itr/sec` mem_alloc  n_gc n_itr
  <chr>      <bch:t> <bch:t> <bch:t> <bch:t>     <dbl> <bch:byt> <dbl> <int>
1 orig        8.14ms  8.44ms  8.29ms  9.58ms      118.    3.82MB     4    54
2 new        11.13ms 11.66ms 11.23ms 12.98ms      85.8    3.82MB     3    39
# ... with 5 more variables: total_time <bch:tm>, result <list>,
#   memory <list>, time <list>, gc <list>

> Another useful diagnostic is
>
>   plot(density(y[y %% 2 == 0]))
>
> Obviously that should give a more or less uniform density, but for
> values near m, the default sample() gives some nice pretty pictures of
> quite non-uniform densities.

Indeed. Adding/subtracting numbers < 10 to/from 'm' gives "interesting"
curves.
> So perhaps I'm starting to be convinced that the default sample() should
> be fixed.

I have the impression that Lemire's method gives the same results unless
it is correcting for the bias that exists in the current method. If that
is really the case, then the disruption should be rather minor. The
ability to fall back to the old behavior would still be useful, though.

cheerio
ralf

--
Ralf Stubner
Senior Software Engineer / Trainer

daqana GmbH
Dortustraße 48
14467 Potsdam

T: +49 331 23 61 93 11
F: +49 331 23 61 93 90
M: +49 162 20 91 196
Mail: ralf.stubner at daqana.com

Sitz: Potsdam
Register: AG Potsdam HRB 27966 P
Ust.-IdNr.: DE300072622
Geschäftsführer: Prof. Dr. Dr. Karl-Kuno Kunze
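The attached file did not survive the list archive, so here is a minimal
Rcpp-flavored sketch of what a sample_int() along these lines could look
like. It is a reconstruction under stated assumptions, not Ralf's actual
file: replace = TRUE only, m below 2^31 so results fit in an R integer,
and Lemire's rejection applied to bits recovered from unif_rand().

    #include <Rcpp.h>
    #include <cstdint>

    // Recover 32 bits from R's unit-interval double (only the high
    // bits are trustworthy).  Rcpp attributes wrap exported functions
    // in an RNGScope, so unif_rand() sees properly initialised state.
    static uint32_t rbits32() {
        return (uint32_t)(unif_rand() * 4294967296.0);
    }

    // Lemire-style unbiased draw from [0, s).
    static uint32_t bounded(uint32_t s) {
        uint64_t m = (uint64_t)rbits32() * s;
        uint32_t l = (uint32_t)m;
        if (l < s) {
            uint32_t t = -s % s;              // 2^32 mod s
            while (l < t) {
                m = (uint64_t)rbits32() * s;
                l = (uint32_t)m;
            }
        }
        return (uint32_t)(m >> 32);
    }

    // [[Rcpp::export]]
    Rcpp::IntegerVector sample_int(double m, int n) {
        uint32_t s = (uint32_t)m;             // assumes 0 < m < 2^31
        Rcpp::IntegerVector out(n);
        for (int i = 0; i < n; ++i)
            out[i] = (int)bounded(s) + 1;     // 1-based, like sample.int()
        return out;
    }

Compiled via Rcpp::sourceCpp(), this reproduces the general behavior
shown above (no parity bias), though not necessarily the exact output of
the scrubbed implementation.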
An email today reminded me of this issue. The other bug (fractional
population sizes) was fixed quite a while ago, but this one still
exists. I've posted a bug report about it (PR#17494).

Duncan Murdoch
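As a footnote on the mechanism: the pre-fix sample() is based on
truncation, essentially floor(m * unif_rand()), and the family of bad m
values above follows from how that map spreads the RNG's grid of
possible values across residue classes. A small self-contained C++
simulation of that scheme over an idealized 32-bit grid (my construction
for illustration, not R's source) makes the parity bias visible:

    #include <cstdint>
    #include <cstdio>
    #include <random>

    int main() {
        std::mt19937 rng(42);                 // stand-in 32-bit source
        const double m = (2.0 / 5.0) * 4294967296.0;
        long counts[2] = {0, 0};
        for (int i = 0; i < 1000000; ++i) {
            double u = rng() / 4294967296.0;  // u on a ~2^32 grid
            uint32_t y = (uint32_t)(m * u);   // truncation, as in sample()
            ++counts[y % 2];                  // tally parity
        }
        // With m = (2/5)*2^32, floor(m*u) is floor(2x/5), which is even
        // for 3 of every 5 grid points x, so expect roughly a 3:2 split
        // rather than 1:1 (modulo floating-point rounding at the edges).
        std::printf("even %ld  odd %ld\n", counts[0], counts[1]);
    }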