Henrik Bengtsson
2019-Apr-11 20:06 UTC
[Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()
ISSUE:
Using *forks* for parallel processing in R is not always safe. The `parallel::mclapply()` function uses forked processes to parallelize. One example where it has been confirmed that forked processing causes problems is when running R via RStudio. It is recommended to use PSOCK clusters (`parallel::makeCluster()`) rather than *forked* processes when running R from RStudio (https://github.com/rstudio/rstudio/issues/2597#issuecomment-482187011).

AFAIK, it is not straightforward to disable forked processing in R. One could set environment variable `MC_CORES=1`, which will set R option `mc.cores=1` when the parallel package is loaded. Since `mc.cores = getOption("mc.cores", 2L)` is the default for `parallel::mclapply()`, this will cause `mclapply()` to fall back to `lapply()`, avoiding *forked* processing. However, this does not work when the code specifies argument `mc.cores` explicitly, e.g. `mclapply(..., mc.cores = detectCores())`.

SUGGESTION:
Introduce environment variable `R_ENABLE_FORKS` and corresponding R option `enable.forks` that both take logical scalars. By setting `R_ENABLE_FORKS=false` or equivalently `enable.forks=FALSE`, `parallel::mclapply()` will fall back to `lapply()`. For `parallel::mcparallel()`, we could produce an error if forks are disabled.

Comments?

/Henrik
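To make the proposed behaviour concrete, here is a minimal user-level sketch of what such a switch might look like. Note the assumptions: neither the option `enable.forks` nor the environment variable `R_ENABLE_FORKS` exists in R; they are only the names proposed above. The helpers `forks_enabled()` and `safe_mclapply()` are hypothetical names made up for illustration, and treating unset or unparsable values as "forks allowed" is likewise an assumption.

```r
library(parallel)

# Hypothetical helper: consult the *proposed* option `enable.forks` first,
# then the *proposed* environment variable `R_ENABLE_FORKS`.
# Unset or unparsable values are treated as "forks allowed" (an assumption).
forks_enabled <- function() {
  val <- getOption("enable.forks", NULL)
  if (is.null(val)) {
    env <- Sys.getenv("R_ENABLE_FORKS", unset = NA_character_)
    if (is.na(env) || !nzchar(env)) return(TRUE)
    val <- env
  }
  val <- as.logical(val)[1L]  # as.logical("false") is FALSE
  if (is.na(val)) TRUE else val
}

# Hypothetical wrapper: behaves like mclapply() unless forks are disabled,
# in which case it runs sequentially via lapply(), even when the caller
# passes mc.cores explicitly.
safe_mclapply <- function(X, FUN, ..., mc.cores = getOption("mc.cores", 2L)) {
  if (!forks_enabled()) return(lapply(X, FUN, ...))
  mclapply(X, FUN, ..., mc.cores = mc.cores)
}

# Usage: with forks disabled, every task runs in the calling R process.
options(enable.forks = FALSE)
pids <- unlist(safe_mclapply(1:3, function(i) Sys.getpid(), mc.cores = 2L))
```

Unlike `MC_CORES=1`, a check of this kind is not bypassed by an explicit `mc.cores` argument, which is the gap the suggestion is meant to close. A wrapper like this only helps code that calls it, though; covering third-party code would require the check to live inside the parallel package itself.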
Iñaki Ucar
2019-Apr-12 09:32 UTC
[Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()
On Thu, 11 Apr 2019 at 22:07, Henrik Bengtsson <henrik.bengtsson at gmail.com> wrote:
>
> ISSUE:
> Using *forks* for parallel processing in R is not always safe.
> [...]
> Comments?

Using fork() is never safe. The reference provided by Kevin [1] is pretty compelling (I kindly encourage anyone who ever forked a process to read it). Therefore, I'd go beyond Henrik's suggestion, and I'd advocate for deprecating fork clusters and eventually removing them from parallel.

[1] https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf

--
Iñaki Úcar
Travers Ching
2019-Apr-12 19:31 UTC
[Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()
Just throwing my two cents in: I think removing/deprecating fork would be a bad idea for two reasons:

1) There are no performant alternatives
2) Removing fork would break existing workflows

Even if replaced with something using the same interface (e.g., a function that automatically detects variables to export, as in the amazing `future` package), the lack of copy-on-write functionality would cause scripts everywhere to break.

A simple example illustrating these two points: `x <- 5e8; mclapply(1:24, sum, x, 8)`

Using fork, `mclapply` takes 5 seconds. Using "psock", `clusterApply` does not complete.

Travers
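The copy-on-write point can be sketched as follows. This is not the benchmark quoted above, only an illustration of the mechanism under stated assumptions: the object size, the number of workers, and the worker function are arbitrary choices made here, and the forked variant runs only on Unix-alikes.

```r
library(parallel)

x <- runif(1e5)  # stand-in for a large object; size chosen arbitrarily

# PSOCK workers are separate, freshly started R processes, so `x` has to be
# serialized and sent to them (here as an extra argument with each task).
cl <- makeCluster(2L)  # type = "PSOCK" is the default
res_psock <- clusterApply(cl, 1:4, function(i, x) sum(x) + i, x)
stopCluster(cl)

# Forked children inherit the parent's memory and can read `x` directly;
# pages are copied only if a child writes to them.
if (.Platform$OS.type == "unix") {
  res_fork <- mclapply(1:4, function(i) sum(x) + i, mc.cores = 2L)
}
```

With a PSOCK cluster, the cost of shipping `x` grows with its size and with the number of tasks; exporting it once per worker with `clusterExport()` reduces this cost, but each worker still holds its own full copy.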