Simon Urbanek
2023-Aug-07  23:21 UTC
[Rd] Detecting physical CPUs in detectCores() on Linux platforms
First, distinguishing hyperthreads (HT) from cores is not necessarily possible
in general: Linux may assign a core id to each HT depending on circumstances:
$ grep 'cpu cores' /proc/cpuinfo | uniq
cpu cores	: 32
$ grep 'model name' /proc/cpuinfo | uniq
model name	: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
and you can look up that the Xeon Gold 6142 has 16 cores.
Second, instead of "awk"ward contortions it's easily done in R
with something like
d=read.dcf("/proc/cpuinfo")
sum(as.integer(tapply(
  d[,grep("cpu cores",colnames(d))],
  d[,grep("physical id",colnames(d))], `[`, 1)))
which avoids subprocesses, quoting hell and all such issues...
Cheers,
Simon
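A minimal sketch of the read.dcf() approach above wrapped as a function, assuming that an unreadable file or missing "cpu cores"/"physical id" fields (which may happen on some non-x86 kernels or in some VMs) should yield NA rather than an error. The name physical_cores() is hypothetical and not part of the parallel package. As noted above, the result is only as reliable as what the kernel reports in /proc/cpuinfo.

```r
# Hypothetical helper: sum the per-socket "cpu cores" value over all sockets.
# `file` may be a path or an already open connection, as accepted by read.dcf().
physical_cores <- function(file = "/proc/cpuinfo") {
  d <- tryCatch(suppressWarnings(read.dcf(file)), error = function(e) NULL)
  if (is.null(d) || nrow(d) == 0L) return(NA_integer_)

  # Field names in /proc/cpuinfo are tab-padded before the colon, so trim them.
  colnames(d) <- trimws(colnames(d))
  if (!all(c("cpu cores", "physical id") %in% colnames(d))) return(NA_integer_)

  cores  <- as.integer(d[, "cpu cores"])
  socket <- d[, "physical id"]

  # Take the first "cpu cores" entry of each socket, then add them up.
  sum(tapply(cores, socket, function(x) x[1L]))
}

# On Linux: physical_cores()
```

Returning NA lets a caller fall back to the logical count instead of failing.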
> On 8/08/2023, at 12:47 AM, Julian Hniopek <julian.hniopek at uni-jena.de> wrote:
> 
> On Mon, 2023-08-07 at 07:12 -0500, Dirk Eddelbuettel wrote:
>> 
>> On 7 August 2023 at 08:48, Nils Kehrein wrote:
>>> I recently noticed that `detectCores()` ignores the `logical=FALSE`
>>> argument on Linux platforms. This means that the function will always
>>> return the number of logical CPUs, i.e. it will count the number of
>>> threads that theoretically can run in parallel due to e.g.
>>> hyper-threading. Unfortunately, this can result in issues in
>>> high-performance computing use cases where hyper-threading might
>>> degrade performance instead of improving it.
>>> 
>>> Currently, src/library/parallel/R/detectCores.R uses the following
>>> R/shell code fragment to identify the number of logical CPUs:
>>> linux = 'grep "^processor" /proc/cpuinfo 2>/dev/null | wc -l'
>>> 
>>> As far as I understand, one could derive the number of online
>>> physical CPUs by parsing the contents of /sys/devices/system/cpu/*
>>> but that seems rather cumbersome. Instead, could we amend the R code
>>> with the following line?
>>> linux = if(logical) 'grep "^processor" /proc/cpuinfo 2>/dev/null | wc -l'
>>>         else 'lscpu -b --parse="CORE" | tail -n +5 | sort -u | wc -l'
>> 
>> That's good, but you also need to protect this against `lscpu` not
>> being in the path.  Maybe `if (logical && nzchar(Sys.which("lscpu")))` ?
>> 
>> Dirk
>> 
> Alternatively, using only POSIX utils, which should be in the path of
> all Linux systems, and /proc/cpuinfo:
> 
> awk '/^physical id/{PHYS_ID=$NF; next} /^cpu cores/{print PHYS_ID" "$NF;}' /proc/cpuinfo 2>/dev/null | sort | uniq | awk '{sum+=$NF;} END {print sum}'
> 
> This parses /proc/cpuinfo for the physical id and the number of
> physical cores of each CPU, keeps only the unique combinations of
> physical id (i.e. socket) and core count, and then sums the core
> counts over the physical ids to get the total number of physical cores.
> 
> Something I had lying around. Someone with better awk skills could
> probably do sorting and filtering in awk as well to save on pipes.
> Works on single and multisocket AMD/Intel from my experience.
> 
> Julian
>>> 
>>> This solution uses `lscpu` from `util-linux`. The -b switch makes
>>> sure that only online CPUs/cores are listed and, due to
>>> --parse="CORE", the output will contain only a single column with
>>> logical core ids. It seems to do the job in my view, but there might
>>> be edge cases for exotic CPU topologies that I am not aware of.
>>> 
>>> Thank you, Nils
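A rough sketch of how the `lscpu` proposal and the `Sys.which()` guard could be combined in R without a shell pipeline. The function names are hypothetical. Filtering the `#` header lines instead of using `tail -n +5`, and adding the SOCKET column, are defensive assumptions and not part of the original proposal.

```r
# Hypothetical helper: count distinct cores in `lscpu --parse` output lines.
count_lscpu_cores <- function(lines) {
  # Drop empty lines and the '#' comment header that lscpu prints first.
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  if (length(lines) == 0L) return(NA_integer_)
  length(unique(lines))
}

# Hypothetical wrapper: returns NA when lscpu is unavailable or fails.
detect_physical_lscpu <- function() {
  if (!nzchar(Sys.which("lscpu"))) return(NA_integer_)
  out <- tryCatch(
    system2("lscpu", c("-b", "--parse=CORE,SOCKET"),
            stdout = TRUE, stderr = FALSE),
    error   = function(e) character(0),
    warning = function(w) character(0))
  count_lscpu_cores(out)
}
```

Keeping the parsing separate from the `system2()` call means the counting logic can be checked on captured output from machines with unusual topologies.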
>>> 
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
Dirk Eddelbuettel
2023-Aug-08  00:07 UTC
[Rd] Detecting physical CPUs in detectCores() on Linux platforms
On 8 August 2023 at 11:21, Simon Urbanek wrote:
| First, distinguishing hyperthreads (HT) from cores is not necessarily possible
| in general: Linux may assign a core id to each HT depending on circumstances:
| 
| $ grep 'cpu cores' /proc/cpuinfo | uniq
| cpu cores	: 32
| $ grep 'model name' /proc/cpuinfo | uniq
| model name	: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
| 
| and you can look up that the Xeon Gold 6142 has 16 cores.
| 
| Second, instead of "awk"ward contortions it's easily done in R
| with something like
| 
| d=read.dcf("/proc/cpuinfo")
| sum(as.integer(tapply(
|   d[,grep("cpu cores",colnames(d))],
|   d[,grep("physical id",colnames(d))], `[`, 1)))
| 
| which avoids subprocesses, quoting hell and all such issues...
Love the use of read.dcf("/proc/cpuinfo") !!
On my box a simpler
  > d <- read.dcf("/proc/cpuinfo") 
  > as.integer(unique(d[, grep("cpu cores",colnames(d))]))
  [1] 6
  > 
does the right thing.
Dirk
-- 
dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
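The unique() shortcut above returns the "cpu cores" value itself, which is a per-socket count, so it matches the machine total only when there is a single physical id. A hedged sketch of a multi-socket variant, assuming every socket reports the same "cpu cores" value; total_cores() is a hypothetical name and takes the matrix returned by read.dcf():

```r
# Hypothetical helper: per-socket core count times the number of sockets.
total_cores <- function(d) {
  # Field names from /proc/cpuinfo may carry trailing tabs, so trim them.
  colnames(d) <- trimws(colnames(d))
  per_socket <- unique(as.integer(d[, "cpu cores"]))
  sockets    <- unique(d[, "physical id"])

  # Give up when sockets disagree on their core count.
  if (length(per_socket) != 1L) return(NA_integer_)
  per_socket * length(sockets)
}

# On Linux: total_cores(read.dcf("/proc/cpuinfo"))
```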