Iñaki Ucar
2019-Apr-15 09:02 UTC
[Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()
On Mon, 15 Apr 2019 at 08:44, Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
>
> On 4/13/19 12:05 PM, Iñaki Ucar wrote:
> > On Sat, 13 Apr 2019 at 03:51, Kevin Ushey <kevinushey at gmail.com> wrote:
> >> I think it's worth saying that mclapply() works as documented
> > Mostly, yes. But it says nothing about fork's copy-on-write and memory
> > overcommitment, and that this means that it may work nicely or fail
> > spectacularly depending on whether, e.g., you operate on a long
> > vector.
>
> R cannot possibly replicate documentation of the underlying operating
> systems. It clearly says that fork() is used and readers who may not
> know what fork() is need to learn it from external sources.
> Copy-on-write is an elementary property of fork().

Just to be precise, copy-on-write is an optimization widely deployed
in most modern *nixes, particularly for the architectures in which R
usually runs. But it is not an elementary property; it is not even
possible without an MMU.

--
Iñaki Úcar
Tomas Kalibera
2019-Apr-15 10:12 UTC
[Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()
On 4/15/19 11:02 AM, Iñaki Ucar wrote:
> On Mon, 15 Apr 2019 at 08:44, Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
>> On 4/13/19 12:05 PM, Iñaki Ucar wrote:
>>> On Sat, 13 Apr 2019 at 03:51, Kevin Ushey <kevinushey at gmail.com> wrote:
>>>> I think it's worth saying that mclapply() works as documented
>>> Mostly, yes. But it says nothing about fork's copy-on-write and memory
>>> overcommitment, and that this means that it may work nicely or fail
>>> spectacularly depending on whether, e.g., you operate on a long
>>> vector.
>> R cannot possibly replicate documentation of the underlying operating
>> systems. It clearly says that fork() is used and readers who may not
>> know what fork() is need to learn it from external sources.
>> Copy-on-write is an elementary property of fork().
> Just to be precise, copy-on-write is an optimization widely deployed
> in most modern *nixes, particularly for the architectures in which R
> usually runs. But it is not an elementary property; it is not even
> possible without an MMU.

Yes, old Unix systems without virtual memory had fork eagerly copying.
Not relevant today, and certainly not for systems that run R, but indeed
people interested in OS internals can look elsewhere for more precise
information.

Tomas
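For readers following the copy-on-write exchange, here is a minimal sketch (not from the thread) of what fork-based workers imply at the R level: each child starts from a logical copy of the parent's memory, so a write made in a child never reaches the parent. It assumes a Unix-alike where parallel::mclapply() can fork. Whether the OS copies pages lazily or eagerly is not observable from this code.

```r
library(parallel)

x <- c(1, 2, 3)
parent_pid <- Sys.getpid()

if (.Platform$OS.type == "unix") {
  # Each worker overwrites the global 'x', but only inside its own process.
  res <- mclapply(1:2, function(i) {
    x[1] <<- 100 * i
    c(pid = Sys.getpid(), first = x[1])
  }, mc.cores = 2)
} else {
  res <- list()  # forking is not available on Windows
}

# The parent's vector is unchanged, whatever the children wrote.
print(x)
```

On a system with copy-on-write, the children share the parent's pages until the assignment to x touches them. With a long vector that first write is where the extra memory is actually claimed, which is the failure mode raised earlier in the thread.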
Henrik Bengtsson
2020-Jan-10 06:33 UTC
[Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()
I'd like to pick up this thread started on 2019-04-11
(https://hypatia.math.ethz.ch/pipermail/r-devel/2019-April/077632.html).
Modulo all the other suggestions in this thread, would my proposal of
being able to disable forked processing via an option or an
environment variable make sense? I've prototyped a working patch that
works like:

> options(fork.allowed = FALSE)
> unlist(parallel::mclapply(1:2, FUN = function(x) Sys.getpid()))
[1] 14058 14058
> parallel::mcmapply(1:2, FUN = function(x) Sys.getpid())
[1] 14058 14058
> parallel::pvec(1:2, FUN = function(x) Sys.getpid() + x/10)
[1] 14058.1 14058.2
> f <- parallel::mcparallel(Sys.getpid())
Error in allowFork(assert = TRUE) :
  Forked processing is not allowed per option ‘fork.allowed’ or
environment variable ‘R_FORK_ALLOWED’
> cl <- parallel::makeForkCluster(1L)
Error in allowFork(assert = TRUE) :
  Forked processing is not allowed per option ‘fork.allowed’ or
environment variable ‘R_FORK_ALLOWED’
>

The patch is:

Index: src/library/parallel/R/unix/forkCluster.R
===================================================================
--- src/library/parallel/R/unix/forkCluster.R (revision 77648)
+++ src/library/parallel/R/unix/forkCluster.R (working copy)
@@ -30,6 +30,7 @@
 newForkNode <- function(..., options = defaultClusterOptions, rank)
 {
+    allowFork(assert = TRUE)
     options <- addClusterOptions(options, list(...))
     outfile <- getClusterOption("outfile", options)
     port <- getClusterOption("port", options)
Index: src/library/parallel/R/unix/mclapply.R
===================================================================
--- src/library/parallel/R/unix/mclapply.R (revision 77648)
+++ src/library/parallel/R/unix/mclapply.R (working copy)
@@ -28,7 +28,7 @@
         stop("'mc.cores' must be >= 1")
     .check_ncores(cores)
-    if (isChild() && !isTRUE(mc.allow.recursive))
+    if (!allowFork() || (isChild() && !isTRUE(mc.allow.recursive)))
         return(lapply(X = X, FUN = FUN, ...))
     ## Follow lapply
Index: src/library/parallel/R/unix/mcparallel.R
===================================================================
--- src/library/parallel/R/unix/mcparallel.R (revision 77648)
+++ src/library/parallel/R/unix/mcparallel.R (working copy)
@@ -20,6 +20,7 @@
 mcparallel <- function(expr, name, mc.set.seed = TRUE, silent = FALSE,
                        mc.affinity = NULL, mc.interactive = FALSE,
                        detached = FALSE)
 {
+    allowFork(assert = TRUE)
     f <- mcfork(detached)
     env <- parent.frame()
     if (isTRUE(mc.set.seed)) mc.advance.stream()
Index: src/library/parallel/R/unix/pvec.R
===================================================================
--- src/library/parallel/R/unix/pvec.R (revision 77648)
+++ src/library/parallel/R/unix/pvec.R (working copy)
@@ -25,7 +25,7 @@
     cores <- as.integer(mc.cores)
     if(cores < 1L) stop("'mc.cores' must be >= 1")
-    if(cores == 1L) return(FUN(v, ...))
+    if(cores == 1L || !allowFork()) return(FUN(v, ...))
     .check_ncores(cores)
     if(mc.set.seed) mc.reset.stream()

with a new file src/library/parallel/R/unix/allowFork.R:

allowFork <- function(assert = FALSE) {
    value <- Sys.getenv("R_FORK_ALLOWED")
    if (nzchar(value)) {
        value <- switch(value,
            "1"=, "TRUE"=, "true"=, "True"=, "yes"=, "Yes"= TRUE,
            "0"=, "FALSE"=,"false"=,"False"=, "no"=, "No" = FALSE,
            stop(gettextf("invalid environment variable value: %s==%s",
                          "R_FORK_ALLOWED", value)))
        value <- as.logical(value)
    } else {
        value <- TRUE
    }
    value <- getOption("fork.allowed", value)
    if (is.na(value)) {
        stop(gettextf("invalid option value: %s==%s", "fork.allowed", value))
    }
    if (assert && !value) {
        stop(gettextf("Forked processing is not allowed per option %s or environment variable %s",
                      sQuote("fork.allowed"), sQuote("R_FORK_ALLOWED")))
    }
    value
}

/Henrik

On Mon, Apr 15, 2019 at 3:12 AM Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
>
> On 4/15/19 11:02 AM, Iñaki Ucar wrote:
> > On Mon, 15 Apr 2019 at 08:44, Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
> >> On 4/13/19 12:05 PM, Iñaki Ucar wrote:
> >>> On Sat, 13 Apr 2019 at 03:51, Kevin Ushey <kevinushey at gmail.com> wrote:
> >>>> I think it's worth saying that mclapply() works as documented
> >>> Mostly, yes. But it says nothing about fork's copy-on-write and memory
> >>> overcommitment, and that this means that it may work nicely or fail
> >>> spectacularly depending on whether, e.g., you operate on a long
> >>> vector.
> >> R cannot possibly replicate documentation of the underlying operating
> >> systems. It clearly says that fork() is used and readers who may not
> >> know what fork() is need to learn it from external sources.
> >> Copy-on-write is an elementary property of fork().
> > Just to be precise, copy-on-write is an optimization widely deployed
> > in most modern *nixes, particularly for the architectures in which R
> > usually runs. But it is not an elementary property; it is not even
> > possible without an MMU.
>
> Yes, old Unix systems without virtual memory had fork eagerly copying.
> Not relevant today, and certainly not for systems that run R, but indeed
> people interested in OS internals can look elsewhere for more precise
> information.
>
> Tomas
>
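The gating logic in the patch can also be approximated in user code, without modifying the parallel package. The sketch below is not part of the patch. parse_flag(), fork_allowed() and safe_mclapply() are hypothetical names chosen for illustration, and only the option name fork.allowed and the environment variable R_FORK_ALLOWED come from the proposal. It is also slightly more lenient than allowFork(): it matches values case-insensitively and treats a non-TRUE option as "not allowed" instead of raising an error.

```r
# Hypothetical helpers, not part of R or of the patch above.
parse_flag <- function(value, name) {
  switch(tolower(value),
         "1" = , "true" = , "yes" = TRUE,
         "0" = , "false" = , "no" = FALSE,
         stop(sprintf("invalid value for %s: %s", name, value)))
}

# The option takes precedence over the environment variable, as in the patch.
fork_allowed <- function() {
  env <- Sys.getenv("R_FORK_ALLOWED")
  default <- if (nzchar(env)) parse_flag(env, "R_FORK_ALLOWED") else TRUE
  isTRUE(getOption("fork.allowed", default))
}

# Fall back to a sequential lapply() when forking is disabled or unavailable.
safe_mclapply <- function(X, FUN, mc.cores = 2L) {
  if (!fork_allowed() || .Platform$OS.type != "unix")
    return(lapply(X, FUN))
  parallel::mclapply(X, FUN, mc.cores = mc.cores)
}

options(fork.allowed = FALSE)
pids <- unlist(safe_mclapply(1:2, function(x) Sys.getpid()))
```

With the option set to FALSE, both elements are processed in the calling process, so pids holds the current process ID twice. A wrapper like this only covers code that calls it, whereas the patch also covers forking done inside third-party packages.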