Karl Forner
2012-Mar-02 09:38 UTC
[Rd] c/c++ Random Number Generators Benchmarks using OpenMP
Dear R gurus,
I am interested in permutations-based cpu-intensive methods so I had to pay
a little attention to Random Number Generators (RNG).
For my needs, RNGs have to:
1) be fast. I profiled my algorithms, and for some the bottleneck was
the RNG.
2) be scalable. Meaning that I want the RNG to remain fast as I add
threads.
3) offer a long cycle length. Some basic generators have a cycle length
so low that in a few seconds you can finish it, making further computations
useless and redundant
4) be able to give reproducible results independent of the number of
threads used, i.e. I want my program to give the very same exact results
using one or 10 threads
( 4) "be good" of course )
I found an implementation that seems to meet my criterion and made a
preliminary package to test it.
In the meantime Petr Savicky contacted saying he was about to release a
similar package called rngOpenMP.
So I decided to perform some quick benchmarks. The benchmark code is
available as a R package "rngBenchmarks" here:
https://r-forge.r-project.org/scm/viewvc.php/pkg/?root=gwas-bin-tests
but it depends on some unpublished package, like rngOpenMP, and my
preliminary package, yet available from the same URL.
As a benchmark I implemented a Monte-Carlo computation of PI.
I tried to use the exact same computation method, using a template argument
for the RNG, and providing wrappers for the
different available RNGs, except for the rngOpenMP that is not
instantiable, so I adapted specifically the code.
I included in the benchmark:
- the c implementation used by the R package Rlecuyer
- the (GNU) random_r RNG available on GNU/linux systems and that is
reentrant
- my RcppRandomSFMT,wrapping a modified version of the SIMD-oriented Fast
Mersenne Twister (SFMT) Random Number Generator
provided by http://www.agner.org/random Randomc
- rngOpenMP
I tried to include the rsprng RNG, but could not manage to use it in my
code.
My conclusions:
- all the implementations work, meaning that the computed values converge
towards PI with the number of iterations
- all the implementations are scalable.
- RcppRandomSFMT and random_r are an order of magnitude faster than
rlecuyer and rngOpenMP
- actually RcppRandomSFMT and random_r have very similar performance.
The problem with random_r is that its cycle length according to my manpage
is ~ 3E10, enabling for instance only 3 millions permutations of a vector
of 10,000 elements,
to be compared with
Leaving the RcppRandomSFMT as best candidate. This implementation also
allows multiple seeds, solving my requisite number 4, reproducible results
independent of the number of threads, if I use as second seed the task
identifier.
Of course I am probably biased, so please tell me if you have some better
ideas of benchmarks, tests of correctness, if you'd like some other
implementations to be included.
People interested in this topic couldcontact me in order that we
collaboratively propose an implementation suiting all needs.
Thanks,
Karl Forner
Annex:
I ran the benchmarks on a linux Intel(R) Xeon(R) with 2 cpus of 4 cores
each ( CPU E5520 @ 2.27GHz).
type threads n error time time_per_chunk
1 lecuyer 1 1e+07 2.105472e-04 1.538 0.00153800
2 lecuyer 1 1e+08 4.441492e-05 15.265 0.00152650
3 lecuyer 1 1e+09 2.026819e-05 153.209 0.00153209
4 lecuyer 2 1e+07 3.182633e-04 0.821 0.00082100
5 lecuyer 2 1e+08 7.375036e-05 7.751 0.00077510
6 lecuyer 2 1e+09 9.290323e-06 76.476 0.00076476
7 lecuyer 4 1e+07 9.630351e-05 0.401 0.00040100
8 lecuyer 4 1e+08 1.263486e-05 3.887 0.00038870
9 lecuyer 4 1e+09 1.151515e-06 38.618 0.00038618
10 lecuyer 8 1e+07 1.239703e-05 0.241 0.00024100
11 lecuyer 8 1e+08 7.894518e-05 2.133 0.00021330
12 lecuyer 8 1e+09 6.782041e-06 20.420 0.00020420
13 random_r 1 1e+07 7.898746e-05 0.137 0.00013700
14 random_r 1 1e+08 4.748343e-05 1.290 0.00012900
15 random_r 1 1e+09 1.685692e-05 12.844 0.00012844
16 random_r 2 1e+07 4.757590e-06 0.095 0.00009500
17 random_r 2 1e+08 7.389450e-05 0.663 0.00006630
18 random_r 2 1e+09 2.913732e-05 6.469 0.00006469
19 random_r 4 1e+07 1.664590e-04 0.037 0.00003700
20 random_r 4 1e+08 1.138106e-04 0.330 0.00003300
21 random_r 4 1e+09 3.734717e-05 3.209 0.00003209
22 random_r 8 1e+07 1.034678e-04 0.051 0.00005100
23 random_r 8 1e+08 4.733472e-05 0.167 0.00001670
24 random_r 8 1e+09 1.985413e-05 1.694 0.00001694
25 rng_openmp 1 1e+07 2.097492e-04 1.231 0.00123100
26 rng_openmp 1 1e+08 7.580436e-05 12.155 0.00121550
27 rng_openmp 1 1e+09 2.772810e-05 120.712 0.00120712
28 rng_openmp 2 1e+07 6.870833e-05 0.609 0.00060900
29 rng_openmp 2 1e+08 1.071006e-04 6.095 0.00060950
30 rng_openmp 2 1e+09 4.296624e-05 61.426 0.00061426
31 rng_openmp 4 1e+07 3.000218e-04 0.312 0.00031200
32 rng_openmp 4 1e+08 6.313562e-05 3.066 0.00030660
33 rng_openmp 4 1e+09 1.436418e-05 30.441 0.00030441
34 rng_openmp 8 1e+07 2.767216e-04 0.191 0.00019100
35 rng_openmp 8 1e+08 9.341252e-06 1.614 0.00016140
36 rng_openmp 8 1e+09 6.874726e-06 16.121 0.00016121
37 sfmt 1 1e+07 2.313942e-04 0.114 0.00011400
38 sfmt 1 1e+08 1.485365e-06 1.186 0.00011860
39 sfmt 1 1e+09 1.083604e-05 11.417 0.00011417
40 sfmt 2 1e+07 1.898866e-04 0.062 0.00006200
41 sfmt 2 1e+08 2.627199e-06 0.573 0.00005730
42 sfmt 2 1e+09 1.295063e-05 5.779 0.00005779
43 sfmt 4 1e+07 1.691328e-04 0.033 0.00003300
44 sfmt 4 1e+08 4.942283e-05 0.315 0.00003150
45 sfmt 4 1e+09 1.232928e-05 2.871 0.00002871
46 sfmt 8 1e+07 1.001232e-04 0.022 0.00002200
47 sfmt 8 1e+08 3.991173e-05 0.149 0.00001490
48 sfmt 8 1e+09 5.775921e-06 1.550 0.00001550
type threads n error time time_div_threads
1 lecuyer 8 1e+07 6.628918e-05 0.269 0.033625
2 lecuyer 8 1e+08 8.265030e-05 2.033 0.254125
3 lecuyer 8 1e+09 6.422987e-06 20.684 2.585500
4 lecuyer 8 1e+10 1.785588e-06 203.427 25.428375
5 random_r 8 1e+07 2.349934e-04 0.017 0.002125
6 random_r 8 1e+08 6.640785e-05 0.172 0.021500
7 random_r 8 1e+09 2.074285e-05 1.701 0.212625
8 random_r 8 1e+10 2.084929e-05 17.145 2.143125
9 rng_openmp 8 1e+07 2.737931e-04 0.159 0.019875
10 rng_openmp 8 1e+08 7.173399e-07 1.624 0.203000
11 rng_openmp 8 1e+09 7.219774e-06 16.097 2.012125
12 rng_openmp 8 1e+10 4.310556e-06 161.093 20.136625
13 sfmt 8 1e+07 1.332275e-04 0.015 0.001875
14 sfmt 8 1e+08 4.277652e-05 0.149 0.018625
15 sfmt 8 1e+09 3.728551e-06 1.501 0.187625
16 sfmt 8 1e+10 6.365431e-06 15.041 1.880125
[[alternative HTML version deleted]]
