Dirk Eddelbuettel
2023-Aug-08 00:07 UTC
[Rd] Detecting physical CPUs in detectCores() on Linux platforms
On 8 August 2023 at 11:21, Simon Urbanek wrote:
| First, detecting HT vs cores is not necessarily possible in general, Linux may
| assign core id to each HT depending on circumstances:
|
| $ grep 'cpu cores' /proc/cpuinfo | uniq
| cpu cores : 32
| $ grep 'model name' /proc/cpuinfo | uniq
| model name : Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
|
| and you can look up that the Xeon Gold 6142 has 16 cores.
|
| Second, instead of "awk"ward contortions it's easily done in R
| with something like
|
| d=read.dcf("/proc/cpuinfo")
| sum(as.integer(tapply(
| d[,grep("cpu cores",colnames(d))],
| d[,grep("physical id",colnames(d))], `[`, 1)))
|
| which avoids subprocesses, quoting hell and all such issues...
Love the use of read.dcf("/proc/cpuinfo") !!
On my box a simpler
> d <- read.dcf("/proc/cpuinfo")
> as.integer(unique(d[, grep("cpu cores",colnames(d))]))
[1] 6
>
does the right thing.
Dirk
--
dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
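
The two expressions quoted in this message agree on a single-socket box but need not agree when several sockets are present. The following is a minimal sketch under stated assumptions, not something posted in the thread: it writes a synthetic, hypothetical two-socket excerpt in the "key<TAB>: value" layout of /proc/cpuinfo to a temporary file and runs both forms on it. The field values and the variable names (records, cpuinfo, simple, per_socket) are made up for illustration.

```r
# Synthetic /proc/cpuinfo excerpt (hypothetical values): two sockets,
# each reporting 16 cores. Records are separated by blank lines.
records <- c(
  "processor\t: 0", "physical id\t: 0", "cpu cores\t: 16", "",
  "processor\t: 1", "physical id\t: 0", "cpu cores\t: 16", "",
  "processor\t: 2", "physical id\t: 1", "cpu cores\t: 16", "",
  "processor\t: 3", "physical id\t: 1", "cpu cores\t: 16"
)
cpuinfo <- tempfile()
writeLines(records, cpuinfo)

d <- read.dcf(cpuinfo)
# grep() is used because the parsed field names may carry trailing whitespace.
cores  <- d[, grep("cpu cores", colnames(d))]
socket <- d[, grep("physical id", colnames(d))]

# Simpler form: distinct "cpu cores" values, regardless of socket count.
simple <- as.integer(unique(cores))

# Per-socket form: first "cpu cores" value for each physical id, then summed.
per_socket <- sum(as.integer(tapply(cores, socket, `[`, 1)))
```

On this synthetic input, simple reports the per-socket figure only, while per_socket adds the figure once per distinct physical id.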
Simon Urbanek
2023-Aug-08 01:17 UTC
[Rd] Detecting physical CPUs in detectCores() on Linux platforms
> On 8/08/2023, at 12:07 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
>
> Love the use of read.dcf("/proc/cpuinfo") !!
>
> On my box a simpler
>
> > d <- read.dcf("/proc/cpuinfo")
> > as.integer(unique(d[, grep("cpu cores",colnames(d))]))
> [1] 6

I don't think that works on NUMA/SMP machines: you need to add up the cores for each CPU (that's why the above splits by physical id, which is unique per CPU). On a dual-CPU machine:

> as.integer(unique(d[, grep("cpu cores",colnames(d))]))
[1] 32
> sum(as.integer(tapply(d[,grep("cpu cores",colnames(d))], d[,grep("physical id",colnames(d))], `[`, 1)))
[1] 64

Also, things get quite fun on VMs, as they can cobble together quite a few virtual CPUs regardless of the underlying hardware.

To be honest, I think the motivation of this thread is dubious at best: it is a bad idea to use detectCores() blindly to specify parallelization, and we explicitly say it's a bad idea. Any sensible person will set it according to the demands, the hardware and the task. The number of cores is only partially relevant: e.g., if any I/O is involved you want to oversubscribe the CPU; if you have other users you want to use only a fraction, etc.

That doesn't mean that we couldn't do a better job, but if you have to use detectCores() then you are already in trouble to start with.

Cheers,
Simon
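
The point about sizing parallelism to the demands, the hardware and the task can be sketched as a small helper. This is only an illustration under stated assumptions, not anything proposed in the thread: choose_workers() and its arguments are hypothetical names, the caller supplies the available count explicitly rather than taking it from detectCores(), and the factor of 2 for I/O-bound work is an arbitrary placeholder.

```r
# Hypothetical helper: pick a worker count from explicit inputs.
#   n_tasks   - number of independent tasks to run
#   available - CPUs the caller has decided are usable (supplied explicitly)
#   fraction  - share of those CPUs to claim, e.g. on a shared machine
#   io_bound  - if TRUE, oversubscribe (the factor 2 is an arbitrary assumption)
choose_workers <- function(n_tasks, available, fraction = 1, io_bound = FALSE) {
  stopifnot(n_tasks >= 1, available >= 1, fraction > 0, fraction <= 1)
  budget <- max(1, floor(available * fraction))
  if (io_bound) budget <- budget * 2
  # Never start more workers than there are tasks.
  as.integer(min(n_tasks, budget))
}
```

For example, choose_workers(100, 16, fraction = 0.25) claims a quarter of a 16-CPU machine, and choose_workers(4, 16) caps the count at the four tasks that actually exist.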