GILLIBERT, Andre
2021-Aug-07 10:40 UTC
[Rd] Redundant source code for random number generation
Dear R developers,

While trying to fix the poor performance of the runif() function (I can easily make it three times faster if you are interested in performance patches, and perhaps six times faster with some refactoring of the R source code), I noticed some redundant code in the R sources (R-devel of 2021-08-05).

The family of random number generation functions (runif, rnorm, rchisq, rbeta, rbinom, etc.) is implemented via .Internal functions listed in src/main/names.c and implemented as do_random1, do_random2 and do_random3 in src/main/random.c. They are also reimplemented in src/library/stats/src/random.c as three main functions (random1, random2, random3) that end up in a dynamic library (stats.so or stats.dll).

For instance, the stats::runif R function is implemented as:

    function (n, min = 0, max = 1) .Call(C_runif, n, min, max)

but could equivalently be implemented as:

    function(n, min = 0, max = 1) .Internal(runif(n, min, max))

The former calls the src/library/stats/src/random.c implementation (in stats.so or stats.dll), while the latter would call the src/main/random.c implementation (in the main R binary).

The two implementations (src/main/random.c and src/library/stats/src/random.c) are similar but differ in small details. For instance, rbinom always returns a vector of doubles (REAL) in src/main/random.c, while in src/library/stats/src/random.c it tries to return a vector of integers, unless the values are too large to fit in an int.

I see no obvious reason for maintaining both versions of the code. In fact, the src/main/random.c version seems to be unused in normal R programs. There could be some unusual programs that use the .Internal call, but I do not think there are many.

There are several strategies for merging the two, but I would like some feedback from people who know the R source code well before proposing patches.

-- 
Sincerely
André GILLIBERT
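
P.S. To illustrate the rbinom behaviour described above, here is a minimal sketch in R. It only exercises the stats package version (the one reached through .Call) and assumes R >= 4.0.0, where, as far as I know, rbinom falls back to doubles when the generated values do not fit in an integer. The variable names (small, big) are just for illustration, and the .Internal variant is deliberately not called, since its availability depends on the R version.

```r
# Sketch only: observe the return type of stats::rbinom (assumes R >= 4.0.0).
set.seed(1)  # for reproducibility only; the types do not depend on the seed

# Small 'size': every draw fits in a C int, so an integer vector is expected.
small <- rbinom(5, size = 10, prob = 0.5)
typeof(small)

# Huge 'size': draws are around 5e9, beyond .Machine$integer.max,
# so a double vector is expected instead.
big <- rbinom(5, size = 1e10, prob = 0.5)
typeof(big)

# Inspect which implementation stats::runif dispatches to; per the message
# above, the body should be a .Call() into the stats shared library.
body(stats::runif)
```

If the src/main/random.c version were used instead, both typeof() calls would be expected to report "double", which is the kind of small discrepancy a merge would have to settle.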