Karl Forner
2012-Mar-02 09:38 UTC
[Rd] c/c++ Random Number Generators Benchmarks using OpenMP
Dear R gurus, I am interested in permutations-based cpu-intensive methods so I had to pay a little attention to Random Number Generators (RNG). For my needs, RNGs have to: 1) be fast. I profiled my algorithms, and for some the bottleneck was the RNG. 2) be scalable. Meaning that I want the RNG to remain fast as I add threads. 3) offer a long cycle length. Some basic generators have a cycle length so low that in a few seconds you can finish it, making further computations useless and redundant 4) be able to give reproducible results independent of the number of threads used, i.e. I want my program to give the very same exact results using one or 10 threads ( 4) "be good" of course ) I found an implementation that seems to meet my criterion and made a preliminary package to test it. In the meantime Petr Savicky contacted saying he was about to release a similar package called rngOpenMP. So I decided to perform some quick benchmarks. The benchmark code is available as a R package "rngBenchmarks" here: r-forge.r-project.org/scm/viewvc.php/pkg/?root=gwas-bin-tests but it depends on some unpublished package, like rngOpenMP, and my preliminary package, yet available from the same URL. As a benchmark I implemented a Monte-Carlo computation of PI. I tried to use the exact same computation method, using a template argument for the RNG, and providing wrappers for the different available RNGs, except for the rngOpenMP that is not instantiable, so I adapted specifically the code. I included in the benchmark: - the c implementation used by the R package Rlecuyer - the (GNU) random_r RNG available on GNU/linux systems and that is reentrant - my RcppRandomSFMT,wrapping a modified version of the SIMD-oriented Fast Mersenne Twister (SFMT) Random Number Generator provided by agner.org/random Randomc - rngOpenMP I tried to include the rsprng RNG, but could not manage to use it in my code. My conclusions: - all the implementations work, meaning that the computed values converge towards PI with the number of iterations - all the implementations are scalable. - RcppRandomSFMT and random_r are an order of magnitude faster than rlecuyer and rngOpenMP - actually RcppRandomSFMT and random_r have very similar performance. The problem with random_r is that its cycle length according to my manpage is ~ 3E10, enabling for instance only 3 millions permutations of a vector of 10,000 elements, to be compared with Leaving the RcppRandomSFMT as best candidate. This implementation also allows multiple seeds, solving my requisite number 4, reproducible results independent of the number of threads, if I use as second seed the task identifier. Of course I am probably biased, so please tell me if you have some better ideas of benchmarks, tests of correctness, if you'd like some other implementations to be included. People interested in this topic couldcontact me in order that we collaboratively propose an implementation suiting all needs. Thanks, Karl Forner Annex: I ran the benchmarks on a linux Intel(R) Xeon(R) with 2 cpus of 4 cores each ( CPU E5520 @ 2.27GHz). type threads n error time time_per_chunk 1 lecuyer 1 1e+07 2.105472e-04 1.538 0.00153800 2 lecuyer 1 1e+08 4.441492e-05 15.265 0.00152650 3 lecuyer 1 1e+09 2.026819e-05 153.209 0.00153209 4 lecuyer 2 1e+07 3.182633e-04 0.821 0.00082100 5 lecuyer 2 1e+08 7.375036e-05 7.751 0.00077510 6 lecuyer 2 1e+09 9.290323e-06 76.476 0.00076476 7 lecuyer 4 1e+07 9.630351e-05 0.401 0.00040100 8 lecuyer 4 1e+08 1.263486e-05 3.887 0.00038870 9 lecuyer 4 1e+09 1.151515e-06 38.618 0.00038618 10 lecuyer 8 1e+07 1.239703e-05 0.241 0.00024100 11 lecuyer 8 1e+08 7.894518e-05 2.133 0.00021330 12 lecuyer 8 1e+09 6.782041e-06 20.420 0.00020420 13 random_r 1 1e+07 7.898746e-05 0.137 0.00013700 14 random_r 1 1e+08 4.748343e-05 1.290 0.00012900 15 random_r 1 1e+09 1.685692e-05 12.844 0.00012844 16 random_r 2 1e+07 4.757590e-06 0.095 0.00009500 17 random_r 2 1e+08 7.389450e-05 0.663 0.00006630 18 random_r 2 1e+09 2.913732e-05 6.469 0.00006469 19 random_r 4 1e+07 1.664590e-04 0.037 0.00003700 20 random_r 4 1e+08 1.138106e-04 0.330 0.00003300 21 random_r 4 1e+09 3.734717e-05 3.209 0.00003209 22 random_r 8 1e+07 1.034678e-04 0.051 0.00005100 23 random_r 8 1e+08 4.733472e-05 0.167 0.00001670 24 random_r 8 1e+09 1.985413e-05 1.694 0.00001694 25 rng_openmp 1 1e+07 2.097492e-04 1.231 0.00123100 26 rng_openmp 1 1e+08 7.580436e-05 12.155 0.00121550 27 rng_openmp 1 1e+09 2.772810e-05 120.712 0.00120712 28 rng_openmp 2 1e+07 6.870833e-05 0.609 0.00060900 29 rng_openmp 2 1e+08 1.071006e-04 6.095 0.00060950 30 rng_openmp 2 1e+09 4.296624e-05 61.426 0.00061426 31 rng_openmp 4 1e+07 3.000218e-04 0.312 0.00031200 32 rng_openmp 4 1e+08 6.313562e-05 3.066 0.00030660 33 rng_openmp 4 1e+09 1.436418e-05 30.441 0.00030441 34 rng_openmp 8 1e+07 2.767216e-04 0.191 0.00019100 35 rng_openmp 8 1e+08 9.341252e-06 1.614 0.00016140 36 rng_openmp 8 1e+09 6.874726e-06 16.121 0.00016121 37 sfmt 1 1e+07 2.313942e-04 0.114 0.00011400 38 sfmt 1 1e+08 1.485365e-06 1.186 0.00011860 39 sfmt 1 1e+09 1.083604e-05 11.417 0.00011417 40 sfmt 2 1e+07 1.898866e-04 0.062 0.00006200 41 sfmt 2 1e+08 2.627199e-06 0.573 0.00005730 42 sfmt 2 1e+09 1.295063e-05 5.779 0.00005779 43 sfmt 4 1e+07 1.691328e-04 0.033 0.00003300 44 sfmt 4 1e+08 4.942283e-05 0.315 0.00003150 45 sfmt 4 1e+09 1.232928e-05 2.871 0.00002871 46 sfmt 8 1e+07 1.001232e-04 0.022 0.00002200 47 sfmt 8 1e+08 3.991173e-05 0.149 0.00001490 48 sfmt 8 1e+09 5.775921e-06 1.550 0.00001550 type threads n error time time_div_threads 1 lecuyer 8 1e+07 6.628918e-05 0.269 0.033625 2 lecuyer 8 1e+08 8.265030e-05 2.033 0.254125 3 lecuyer 8 1e+09 6.422987e-06 20.684 2.585500 4 lecuyer 8 1e+10 1.785588e-06 203.427 25.428375 5 random_r 8 1e+07 2.349934e-04 0.017 0.002125 6 random_r 8 1e+08 6.640785e-05 0.172 0.021500 7 random_r 8 1e+09 2.074285e-05 1.701 0.212625 8 random_r 8 1e+10 2.084929e-05 17.145 2.143125 9 rng_openmp 8 1e+07 2.737931e-04 0.159 0.019875 10 rng_openmp 8 1e+08 7.173399e-07 1.624 0.203000 11 rng_openmp 8 1e+09 7.219774e-06 16.097 2.012125 12 rng_openmp 8 1e+10 4.310556e-06 161.093 20.136625 13 sfmt 8 1e+07 1.332275e-04 0.015 0.001875 14 sfmt 8 1e+08 4.277652e-05 0.149 0.018625 15 sfmt 8 1e+09 3.728551e-06 1.501 0.187625 16 sfmt 8 1e+10 6.365431e-06 15.041 1.880125 [[alternative HTML version deleted]]