Henrik Bengtsson
2012-Dec-04 20:24 UTC
[Rd] SUGGESTION: Add get/setCores() to 'parallel' (and command line option --max-cores)
In the 'parallel' package there is detectCores(), which tries its best to infer the number of cores on the current machine. This is useful if you wish to utilize the *maximum* number of cores on the machine. Many use it to set the number of cores when parallelizing, sometimes hardcoded within 3rd-party scripts/package code, but there are several settings where you wish to use fewer, e.g. in a compute cluster where your R session is given only a portion of the available cores.

Because of this, I'd like to propose adding getCores(), which by default returns what detectCores() gives, but can also be set to return what is assigned via setCores(). The idea is that getCores() could replace the most common usage of detectCores() and provide more control.

An additional feature would be that 'parallel', when loaded, would check for the command line argument --max-cores=<int>, which would update the number of cores via setCores(). This would make it possible for, say, a Torque/PBS compute cluster to launch an R batch script as

    Rscript --max-cores=$PBS_NP script.R

and the only thing script.R needs to know about is parallel::getCores().

I understand that I can do all this already in my own scripts, but I'd like to propose a standard for R.

Comments?

/Henrik
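For concreteness, a minimal sketch of what the proposed API could look like. getCores()/setCores() do not exist in 'parallel'; all names below are hypothetical. The cap is kept in a local environment, and --max-cores=<int> is picked up from commandArgs() (R warns about options it does not recognize, but continues):

    .cores <- local({
      env <- new.env()

      setCores <- function(n) {
        n <- as.integer(n)
        stopifnot(length(n) == 1L, !is.na(n), n >= 1L)
        assign("n", n, envir = env)
      }

      getCores <- function() {
        if (exists("n", envir = env)) get("n", envir = env)
        else parallel::detectCores()
      }

      ## Honor e.g. 'Rscript --max-cores=$PBS_NP script.R'
      args <- commandArgs(trailingOnly = FALSE)
      hit <- grep("^--max-cores=", args, value = TRUE)
      if (length(hit) > 0L) setCores(sub("^--max-cores=", "", hit[1L]))

      list(get = getCores, set = setCores)
    })

A script would then call .cores$get() wherever it would otherwise call detectCores(), receiving the capped value whenever one was set via .cores$set() or --max-cores.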
Simon Urbanek
2012-Dec-05 01:25 UTC
[Rd] SUGGESTION: Add get/setCores() to 'parallel' (and command line option --max-cores)
A somewhat simplistic answer is that we already have that with the "mc.cores" option. In multicore the default was to use all cores (without the need to use detectCores), and yet you could reduce the number as you want with mc.cores. This is similar to what you are talking about, but it's not a sufficient solution.

There are some plans for a somewhat more general approach. You may have noticed that mcaffinity() was added to query/control/limit the mapping of cores to tasks. It allows much more fine-grained control and better decisions on whether to recursively split jobs or not, as the state is global for the entire R session. The (vague) plan is to generalize this for all platforms - if not binding to a particular core, then at least monitoring the assigned number of cores.

Cheers,
Simon

On Dec 4, 2012, at 3:24 PM, Henrik Bengtsson wrote: [quoted text elided]
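For reference, the "mc.cores" mechanism Simon describes needs no new API: mclapply() and friends already consult getOption("mc.cores") for their default worker count, so a wrapper or scheduler can cap parallelism without the script's cooperation. A small example (forking is Unix-only; on Windows, mclapply() requires mc.cores = 1):

    library(parallel)

    ## Cap the default worker count for mclapply() and friends.
    options(mc.cores = 2L)

    ## Forks at most getOption("mc.cores") workers.
    res <- mclapply(1:8, function(i) i^2)

    ## On Linux, mcaffinity() reports (and can set) which CPUs the
    ## current R process may run on; it returns NULL where unsupported.
    mcaffinity()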
Norm Matloff
2012-Dec-16 00:38 UTC
[Rd] SUGGESTION: Add get/setCores() to 'parallel' (and command line option --max-cores)
Henrik Bengtsson <hb at biostat.ucsf.edu> wrote:

^ In the 'parallel' package there is detectCores(), which tries its best
^ to infer the number of cores on the current machine. [...]

Even if one has the entire machine to oneself, there is often another very good reason not to use the maximum number of cores: using the maximum number of cores may reduce performance. This is true in general, and it is sometimes especially true when the inferred number of cores includes hyperthreading.

Norm
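Related to Norm's point: detectCores() counts logical CPUs by default, so on a hyperthreaded machine it typically reports twice the physical core count. detectCores(logical = FALSE) asks for physical cores instead, although it returns NA on platforms where that distinction is unavailable. A conservative helper (a sketch; the name physicalCores is made up here) might look like:

    library(parallel)

    physicalCores <- function() {
      n <- detectCores(logical = FALSE)  # NA where not supported
      if (is.na(n)) n <- detectCores()   # fall back to logical count
      n
    }

    physicalCores()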