Henrik Bengtsson
2012-Dec-04 20:24 UTC
[Rd] SUGGESTION: Add get/setCores() to 'parallel' (and command line option --max-cores)
In the 'parallel' package there is detectCores(), which tries its best to infer the number of cores on the current machine. This is useful if you wish to utilize the *maximum* number of cores on the machine. Many use it to set the number of cores when parallelizing, sometimes hardcoded within 3rd-party scripts/package code, but there are several settings where you wish to use fewer, e.g. in a compute cluster where your R session is given only a portion of the available cores.

Because of this, I'd like to propose adding getCores(), which by default returns what detectCores() gives, but can also be set to return what is assigned via setCores(). The idea is that getCores() could replace the most common usage of detectCores() and provide more control.

An additional feature would be that 'parallel', when loaded, would check for the command line argument --max-cores=<int>, which would update the number of cores via setCores(). This would make it possible for, say, a Torque/PBS compute cluster to launch an R batch script as

    Rscript --max-cores=$PBS_NP script.R

and the only thing script.R needs to know about is parallel::getCores().

I understand that I can do all this already in my own scripts, but I'd like to propose a standard for R.

Comments?

/Henrik
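For concreteness, a minimal sketch of what the proposed API could look like. getCores()/setCores() do not exist in 'parallel'; all names below are hypothetical. The cap is kept in a local environment, and --max-cores=<int> is picked up from commandArgs() (R warns about options it does not recognize, but continues):

    .cores <- local({
      env <- new.env()

      setCores <- function(n) {
        n <- as.integer(n)
        stopifnot(length(n) == 1L, !is.na(n), n >= 1L)
        assign("n", n, envir = env)
      }

      getCores <- function() {
        if (exists("n", envir = env)) get("n", envir = env)
        else parallel::detectCores()
      }

      ## Honor e.g. 'Rscript --max-cores=$PBS_NP script.R'
      args <- commandArgs(trailingOnly = FALSE)
      hit <- grep("^--max-cores=", args, value = TRUE)
      if (length(hit) > 0L) setCores(sub("^--max-cores=", "", hit[1L]))

      list(get = getCores, set = setCores)
    })

A script would then call .cores$get() wherever it would otherwise call detectCores(), receiving the capped value whenever one was set via .cores$set() or --max-cores.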
Simon Urbanek
2012-Dec-05 01:25 UTC
[Rd] SUGGESTION: Add get/setCores() to 'parallel' (and command line option --max-cores)
A somewhat simplistic answer is that we already have that with the "mc.cores" option. In multicore the default was to use all cores (without the need to use detectCores), and yet you could reduce the number as you want with mc.cores. This is similar to what you are talking about, but it's not a sufficient solution.

There are some plans for a somewhat more general approach. You may have noticed that mcaffinity() was added to query/control/limit the mapping of cores to tasks. It allows much more fine-grained control and better decisions on whether to recursively split jobs or not, as the state is global for the entire R session. The (vague) plan is to generalize this for all platforms - if not binding to a particular core, then at least monitoring the assigned number of cores.

Cheers,
Simon

On Dec 4, 2012, at 3:24 PM, Henrik Bengtsson wrote: [quoted text elided]
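For reference, the "mc.cores" mechanism Simon describes needs no new API: mclapply() and friends already consult getOption("mc.cores") for their default worker count, so a wrapper or scheduler can cap parallelism without the script's cooperation. A small example (forking is Unix-only; on Windows, mclapply() requires mc.cores = 1):

    library(parallel)

    ## Cap the default worker count for mclapply() and friends.
    options(mc.cores = 2L)

    ## Forks at most getOption("mc.cores") workers.
    res <- mclapply(1:8, function(i) i^2)

    ## On Linux, mcaffinity() reports (and can set) which CPUs the
    ## current R process may run on; it returns NULL where unsupported.
    mcaffinity()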
Norm Matloff
2012-Dec-16 00:38 UTC
[Rd] SUGGESTION: Add get/setCores() to 'parallel' (and command line option --max-cores)
Henrik Bengtsson <hb at biostat.ucsf.edu> wrote:

^ In the 'parallel' package there is detectCores(), which tries its best
^ to infer the number of cores on the current machine. [...]

Even if one has the entire machine to oneself, there is often another very good reason not to use the maximum number of cores: using the maximum number of cores may reduce performance. This is true in general, and it is sometimes especially true when the inferred number of cores includes hyperthreading.

Norm
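Related to Norm's point: detectCores() counts logical CPUs by default, so on a hyperthreaded machine it typically reports twice the physical core count. detectCores(logical = FALSE) asks for physical cores instead, although it returns NA on platforms where that distinction is unavailable. A conservative helper (a sketch; the name physicalCores is made up here) might look like:

    library(parallel)

    physicalCores <- function() {
      n <- detectCores(logical = FALSE)  # NA where not supported
      if (is.na(n)) n <- detectCores()   # fall back to logical count
      n
    }

    physicalCores()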