Henrik Bengtsson
2015-Jun-20 21:21 UTC
[Rd] Listing all spawned jobs/processed after parallel::mcparallel()?
QUESTION:
Is it possible to query the number of active jobs running after launching
them with parallel::mcparallel()?

For example, if I launch 3 jobs using:

> library(parallel)
> f <- lapply(1:3, FUN=mcparallel)

then I can inspect them as:

> str(f)
List of 3
 $ :List of 2
  ..$ pid: int 142225
  ..$ fd : int [1:2] 8 13
  ..- attr(*, "class")= chr [1:3] "parallelJob" "childProcess" "process"
 $ :List of 2
  ..$ pid: int 142226
  ..$ fd : int [1:2] 10 15
  ..- attr(*, "class")= chr [1:3] "parallelJob" "childProcess" "process"
 $ :List of 2
  ..$ pid: int 142227
  ..$ fd : int [1:2] 12 17
  ..- attr(*, "class")= chr [1:3] "parallelJob" "childProcess" "process"

However, if I launch them without "recording" them, or equivalently if I do:

> f <- lapply(1:3, FUN=mcparallel)
> rm(list="f")

is there a function/mechanism in R/the parallel package allowing me to
find the currently active/running processes? ... or at least query
how many they are? I'd like to use this to prevent spawning of more
than a maximum number of parallel processes. (Yes, I'm aware of
mclapply() and friends, but I'm looking at using the more low-level
mcparallel()/mccollect()). I'm trying to decide whether I should
implement my own mechanism for keeping track of "jobs" or not.

Thanks,

Henrik
Prof Brian Ripley
2015-Jun-21 16:59 UTC
On 20/06/2015 22:21, Henrik Bengtsson wrote:
> QUESTION:
> Is it possible to query the number of active jobs running after launching
> them with parallel::mcparallel()?
>
> For example, if I launch 3 jobs using:
>
>> library(parallel)
>> f <- lapply(1:3, FUN=mcparallel)
>
> then I can inspect them as:
>
>> str(f)
> List of 3
>  $ :List of 2
>   ..$ pid: int 142225
>   ..$ fd : int [1:2] 8 13
>   ..- attr(*, "class")= chr [1:3] "parallelJob" "childProcess" "process"
>  $ :List of 2
>   ..$ pid: int 142226
>   ..$ fd : int [1:2] 10 15
>   ..- attr(*, "class")= chr [1:3] "parallelJob" "childProcess" "process"
>  $ :List of 2
>   ..$ pid: int 142227
>   ..$ fd : int [1:2] 12 17
>   ..- attr(*, "class")= chr [1:3] "parallelJob" "childProcess" "process"
>
> However, if I launch them without "recording" them, or equivalently if I do:
>
>> f <- lapply(1:3, FUN=mcparallel)
>> rm(list="f")
>
> is there a function/mechanism in R/the parallel package allowing me to
> find the currently active/running processes? ... or at least query
> how many they are? I'd like to use this to prevent spawning of more
> than a maximum number of parallel processes. (Yes, I'm aware of
> mclapply() and friends, but I'm looking at using the more low-level
> mcparallel()/mccollect()). I'm trying to decide whether I should
> implement my own mechanism for keeping track of "jobs" or not.

Note that 'currently active/running' is a slippery concept and is not what
the results above show. But see ?children, which seems to be what you are
looking for. It is not exported and there is no more detailed explanation
save the source code. Also note that it tells you about children and not
grandchildren ....

You can find out about child processes (and their children) at OS level, for
example via the 'ps' command, but doing so portably is not easy.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK
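A minimal sketch of the suggestion above: counting forked children through the unexported parallel:::children(). The helper name count_children() is hypothetical and only used for illustration. Because children() is internal, its behaviour may change between R versions, and forking is not available on Windows, so treat this as an illustration under those assumptions rather than a supported API.

```r
library(parallel)

# Hypothetical helper: number of forked children that this R session
# currently has registered. Relies on the unexported parallel:::children(),
# which reports direct children only, not grandchildren.
count_children <- function() {
  length(parallel:::children())
}

# Launch three short-lived jobs without keeping their handles,
# mirroring the "not recorded" case from the question.
invisible(lapply(1:3, function(i) mcparallel(Sys.sleep(1))))

# Query how many children are registered right after launching.
n_before <- count_children()

# Calling mccollect() without 'jobs' waits for all existing children
# and collects their results, which also cleans them up.
invisible(mccollect())

# Query again after everything has been collected.
n_after <- count_children()

cat("children before collect:", n_before, "\n")
cat("children after collect: ", n_after, "\n")
```

The count is a snapshot of what the parent session has registered, so it is best read as a bound on direct children, not as a statement about what is running on the machine.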
Henrik Bengtsson
2015-Jun-24 03:47 UTC
On Sun, Jun 21, 2015 at 9:59 AM, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> On 20/06/2015 22:21, Henrik Bengtsson wrote:
>>
>> QUESTION:
>> Is it possible to query the number of active jobs running after launching
>> them with parallel::mcparallel()?
>>
>> For example, if I launch 3 jobs using:
>>
>>> library(parallel)
>>> f <- lapply(1:3, FUN=mcparallel)
>>
>> then I can inspect them as:
>>
>>> str(f)
>> List of 3
>>  $ :List of 2
>>   ..$ pid: int 142225
>>   ..$ fd : int [1:2] 8 13
>>   ..- attr(*, "class")= chr [1:3] "parallelJob" "childProcess" "process"
>>  $ :List of 2
>>   ..$ pid: int 142226
>>   ..$ fd : int [1:2] 10 15
>>   ..- attr(*, "class")= chr [1:3] "parallelJob" "childProcess" "process"
>>  $ :List of 2
>>   ..$ pid: int 142227
>>   ..$ fd : int [1:2] 12 17
>>   ..- attr(*, "class")= chr [1:3] "parallelJob" "childProcess" "process"
>>
>> However, if I launch them without "recording" them, or equivalently if I
>> do:
>>
>>> f <- lapply(1:3, FUN=mcparallel)
>>> rm(list="f")
>>
>> is there a function/mechanism in R/the parallel package allowing me to
>> find the currently active/running processes? ... or at least query
>> how many they are? I'd like to use this to prevent spawning of more
>> than a maximum number of parallel processes. (Yes, I'm aware of
>> mclapply() and friends, but I'm looking at using the more low-level
>> mcparallel()/mccollect()). I'm trying to decide whether I should
>> implement my own mechanism for keeping track of "jobs" or not.
>
> Note that 'currently active/running' is a slippery concept and is not what
> the results above show. But see ?children, which seems to be what you are
> looking for. It is not exported and there is no more detailed explanation
> save the source code. Also note that it tells you about children and not
> grandchildren ....
>
> You can find out about child processes (and their children) at OS level, for
> example via the 'ps' command, but doing so portably is not easy.

Thank you very much.
This was exactly what I was looking for. I appreciate the problem of
identifying grandchildren, but with children() I now at least have a
chance to get a lower bound on the number of "active children"
(?children).

After some initial testing on Linux and OS X, I'm glad to see that
parallel:::children() seems to reflect which processes are actually
active, e.g. if I SIGTERM one of them externally, it is immediately
dropped from parallel:::children(). I also noticed that a process
remains active until it has been collected with parallel::mccollect().

/Henrik

>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Emeritus Professor of Applied Statistics, University of Oxford
> 1 South Parks Road, Oxford OX1 3TG, UK
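The goal stated in the original question, capping the number of concurrently forked jobs, could be sketched as below. This is one possible approach and not something proposed in the thread: run_throttled() and its arguments are hypothetical names, the count comes from the unexported parallel:::children(), and the sketch assumes (as observed above) that a child stays registered until it is collected, so a slot is freed by collecting the oldest pending job.

```r
library(parallel)

# Hypothetical helper: apply 'fun' to each element of 'xs' in forked
# children, keeping at most 'max_children' registered at any time.
run_throttled <- function(xs, fun, max_children = 2L) {
  pending <- list()  # handles of launched jobs not yet collected, oldest first
  results <- list()  # collected results, in launch order

  for (x in xs) {
    # While the pool is full, block on the oldest job to free a slot.
    # The 'length(pending) > 0L' guard ensures the loop always terminates,
    # even if children() also reports children launched elsewhere.
    while (length(parallel:::children()) >= max_children &&
           length(pending) > 0L) {
      results <- c(results, mccollect(pending[[1L]]))
      pending <- pending[-1L]
    }
    # Fork a child that evaluates fun(x); the handle is kept for collection.
    pending <- c(pending, list(mcparallel(fun(x))))
  }

  # Collect whatever is still outstanding, in launch order.
  if (length(pending) > 0L) {
    results <- c(results, mccollect(pending))
  }

  # Drop the names that mccollect() attaches to its results.
  unname(results)
}

# Usage: square five numbers with at most two children at a time.
res <- run_throttled(1:5, function(x) x^2, max_children = 2L)
print(unlist(res))
```

Keeping the job handles alongside the children() count is a deliberate choice here: the count alone says how many slots are taken, but the handles are what make it possible to collect results and free a slot.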