Dirk Eddelbuettel
2023-Aug-08 00:07 UTC
[Rd] Detecting physical CPUs in detectCores() on Linux platforms
On 8 August 2023 at 11:21, Simon Urbanek wrote:
| First, detecting HT vs cores is not necessarily possible in general, Linux may assign core id to each HT depending on circumstances:
|
| $ grep 'cpu cores' /proc/cpuinfo | uniq
| cpu cores : 32
| $ grep 'model name' /proc/cpuinfo | uniq
| model name : Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
|
| and you can look up that Xeon 6142 has 16 cores.
|
| Second, instead of "awk"ward contortions it's easily done in R with something like
|
| d=read.dcf("/proc/cpuinfo")
| sum(as.integer(tapply(
|   d[,grep("cpu cores",colnames(d))],
|   d[,grep("physical id",colnames(d))], `[`, 1)))
|
| which avoids subprocesses, quoting hell and all such issues...

Love the use of read.dcf("/proc/cpuinfo") !!

On my box a simpler

> d <- read.dcf("/proc/cpuinfo")
> as.integer(unique(d[, grep("cpu cores",colnames(d))]))
[1] 6
>

does the right thing.

Dirk

--
dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
Simon Urbanek
2023-Aug-08 01:17 UTC
[Rd] Detecting physical CPUs in detectCores() on Linux platforms
> On 8/08/2023, at 12:07 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
>
>
> On 8 August 2023 at 11:21, Simon Urbanek wrote:
> | First, detecting HT vs cores is not necessarily possible in general, Linux may assign core id to each HT depending on circumstances:
> |
> | $ grep 'cpu cores' /proc/cpuinfo | uniq
> | cpu cores : 32
> | $ grep 'model name' /proc/cpuinfo | uniq
> | model name : Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
> |
> | and you can look up that Xeon 6142 has 16 cores.
> |
> | Second, instead of "awk"ward contortions it's easily done in R with something like
> |
> | d=read.dcf("/proc/cpuinfo")
> | sum(as.integer(tapply(
> |   d[,grep("cpu cores",colnames(d))],
> |   d[,grep("physical id",colnames(d))], `[`, 1)))
> |
> | which avoids subprocesses, quoting hell and all such issues...
>
> Love the use of read.dcf("/proc/cpuinfo") !!
>
> On my box a simpler
>
>> d <- read.dcf("/proc/cpuinfo")
>> as.integer(unique(d[, grep("cpu cores",colnames(d))]))
> [1] 6
>>
>

I don't think that works on NUMA/SMP machines - you need to add up the cores of each CPU (that's why the above splits by physical id, which is unique per CPU). On a dual-CPU machine:

> as.integer(unique(d[, grep("cpu cores",colnames(d))]))
[1] 32
> sum(as.integer(tapply(d[,grep("cpu cores",colnames(d))], d[,grep("physical id",colnames(d))], `[`, 1)))
[1] 64

Also things get quite fun on VMs as they can cobble together quite a few virtual CPUs regardless of the underlying hardware.

To be honest I think the motivation of this thread is dubious at best: it is a bad idea to use detectCores() blindly to specify parallelization and we explicitly say it's a bad idea - any sensible person will set it according to the demands, the hardware and the task. The number of cores is only partially relevant - e.g. if any I/O is involved you want to oversubscribe the CPU. If you have other users you want to only use a fraction etc.

That doesn't mean that we couldn't do a better job, but if you have to use detectCores() then you are already in trouble to start with.

Cheers,
Simon
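
The difference between the two snippets in this thread is easier to see on a small, reproducible input. The following is only a sketch: the function name physical_cores() and the four-processor cpuinfo text are made up for illustration (two sockets with two cores each), and it assumes the "cpu cores" and "physical id" fields are present, which is not guaranteed on every architecture or VM.

```r
# Hypothetical helper: sum "cpu cores" once per distinct "physical id"
# (i.e. once per socket). `d` is a character matrix shaped like the
# result of read.dcf("/proc/cpuinfo"): one row per logical CPU.
physical_cores <- function(d) {
  # grep() instead of exact matching, because the field names in
  # /proc/cpuinfo may carry whitespace before the colon.
  core_col <- grep("cpu cores", colnames(d), fixed = TRUE)
  id_col   <- grep("physical id", colnames(d), fixed = TRUE)
  if (length(core_col) == 0L || length(id_col) == 0L) {
    return(NA_integer_)  # fields not reported on this platform
  }
  cores  <- d[, core_col[1]]
  socket <- d[, id_col[1]]
  # Take the first "cpu cores" value within each socket, then add them up.
  sum(as.integer(tapply(cores, socket, `[`, 1)))
}

# Synthetic stand-in for /proc/cpuinfo (assumed layout, not real output):
# four logical CPUs spread over two sockets, two cores per socket.
txt <- c(
  "processor: 0", "physical id: 0", "cpu cores: 2", "",
  "processor: 1", "physical id: 0", "cpu cores: 2", "",
  "processor: 2", "physical id: 1", "cpu cores: 2", "",
  "processor: 3", "physical id: 1", "cpu cores: 2"
)
con <- textConnection(txt)
d <- read.dcf(con)
close(con)

# The unique() approach reports the cores of a single socket only ...
per_socket <- as.integer(unique(d[, grep("cpu cores", colnames(d), fixed = TRUE)]))
# ... while the per-socket sum counts both sockets.
total <- physical_cores(d)
```

On a real Linux box the same function could be called as physical_cores(read.dcf("/proc/cpuinfo")). As noted above, the result can still be misleading when the kernel assigns a core id to each hyperthread, or inside a VM.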