Simon Urbanek
2023-Aug-07 23:21 UTC
[Rd] Detecting physical CPUs in detectCores() on Linux platforms
First, detecting HT vs cores is not necessarily possible in general; Linux may assign a core id to each HT depending on circumstances:

$ grep 'cpu cores' /proc/cpuinfo | uniq
cpu cores : 32
$ grep 'model name' /proc/cpuinfo | uniq
model name : Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz

and you can look up that the Xeon 6142 has 16 cores.

Second, instead of "awk"ward contortions it's easily done in R with something like

d=read.dcf("/proc/cpuinfo")
sum(as.integer(tapply(
  d[,grep("cpu cores",colnames(d))],
  d[,grep("physical id",colnames(d))], `[`, 1)))

which avoids subprocesses, quoting hell and all such issues...

Cheers,
Simon

> On 8/08/2023, at 12:47 AM, Julian Hniopek <julian.hniopek at uni-jena.de> wrote:
>
> On Mon, 2023-08-07 at 07:12 -0500, Dirk Eddelbuettel wrote:
>>
>> On 7 August 2023 at 08:48, Nils Kehrein wrote:
>>> I recently noticed that `detectCores()` ignores the `logical=FALSE`
>>> argument on Linux platforms. This means that the function will always
>>> return the number of logical CPUs, i.e. it will count the number of
>>> threads that theoretically can run in parallel due to e.g.
>>> hyper-threading. Unfortunately, this can result in issues in
>>> high-performance computing use cases where hyper-threading might
>>> degrade performance instead of improving it.
>>>
>>> Currently, src/library/parallel/R/detectCores.R uses the following
>>> R/shell code fragment to identify the number of logical CPUs:
>>> linux = 'grep "^processor" /proc/cpuinfo 2>/dev/null | wc -l'
>>>
>>> As far as I understand, one could derive the number of online physical
>>> CPUs by parsing the contents of /sys/devices/system/cpu/* but that
>>> seems rather cumbersome. Instead, could we amend the R code with the
>>> following line?
>>> linux = if(logical) 'grep "^processor" /proc/cpuinfo 2>/dev/null | wc -l'
>>>         else 'lscpu -b --parse="CORE" | tail -n +5 | sort -u | wc -l'
>>
>> That's good but you also need to protect this from `lscpu` not being
>> in the path. Maybe `if (logical && nzchar(Sys.which("lscpu")))` ?
>>
>> Dirk
>>
> Alternatively, using only POSIX utils, which should be in the path of
> all Linux systems, and /proc/cpuinfo:
>
> awk '/^physical id/{PHYS_ID=$NF; next} /^cpu cores/{print PHYS_ID" "$NF;}' /proc/cpuinfo 2>/dev/null | sort | uniq | awk '{sum+=$NF;} END {print sum}'
>
> This parses /proc/cpuinfo for the number of physical cores and the
> physical id of each CPU. It only returns unique combinations of physical
> id (i.e. socket) and core count, then sums up the number of cores for
> each physical id to get the total number of physical cores.
>
> Something I had lying around. Someone with better awk skills could
> probably do sorting and filtering in awk as well to save on pipes.
> Works on single and multisocket AMD/Intel from my experience.
>
> Julian
>>>
>>> This solution uses `lscpu` from `sys-utils`. The -b switch makes sure
>>> that only online CPUs/cores are listed and due to the --parse="CORE",
>>> the output will contain only a single column with logical core ids. It
>>> seems to do the job in my view, but there might be edge cases for
>>> exotic CPU topologies that I am not aware of.
>>>
>>> Thank you, Nils
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
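The read.dcf() approach from the message above could be wrapped into a small function. This is only a minimal sketch, not code from the parallel package: the name count_physical_cores and its path argument are hypothetical, and it assumes the file exposes "physical id" and "cpu cores" fields, returning NA when it does not. As the message points out, the result can still be too high when the kernel reports hyper-threads as cores.

```r
# Hypothetical helper (not part of R's parallel package).
count_physical_cores <- function(path = "/proc/cpuinfo") {
  d <- read.dcf(path)
  # Field names in /proc/cpuinfo carry trailing whitespace, so match by pattern.
  core_col <- grep("^cpu cores", colnames(d))
  sock_col <- grep("^physical id", colnames(d))
  # Assumption: without exactly one column of each kind we cannot tell; give up.
  if (length(core_col) != 1L || length(sock_col) != 1L) return(NA_integer_)
  # Take the first "cpu cores" value seen for each socket, then sum over sockets.
  per_socket <- tapply(d[, core_col], d[, sock_col], `[`, 1)
  sum(as.integer(per_socket))
}
```

Taking the path as an argument keeps the function usable on a saved copy of /proc/cpuinfo from another machine, which helps when checking unusual topologies.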
Dirk Eddelbuettel
2023-Aug-08 00:07 UTC
[Rd] Detecting physical CPUs in detectCores() on Linux platforms
On 8 August 2023 at 11:21, Simon Urbanek wrote:
| First, detecting HT vs cores is not necessarily possible in general; Linux may assign a core id to each HT depending on circumstances:
|
| $ grep 'cpu cores' /proc/cpuinfo | uniq
| cpu cores : 32
| $ grep 'model name' /proc/cpuinfo | uniq
| model name : Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
|
| and you can look up that the Xeon 6142 has 16 cores.
|
| Second, instead of "awk"ward contortions it's easily done in R with something like
|
| d=read.dcf("/proc/cpuinfo")
| sum(as.integer(tapply(
|   d[,grep("cpu cores",colnames(d))],
|   d[,grep("physical id",colnames(d))], `[`, 1)))
|
| which avoids subprocesses, quoting hell and all such issues...

Love the use of read.dcf("/proc/cpuinfo") !!

On my box a simpler

> d <- read.dcf("/proc/cpuinfo")
> as.integer(unique(d[, grep("cpu cores",colnames(d))]))
[1] 6
>

does the right thing.

Dirk

--
dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
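One caveat on the simpler variant: "cpu cores" is reported per physical package, so unique() yields the per-socket count. On a single-socket box that equals the total, but on a multi-socket machine it does not. The sketch below illustrates this with a made-up two-socket layout written to a temporary file; the data are synthetic and the variable names are chosen only for illustration.

```r
# Synthetic /proc/cpuinfo excerpt (made-up data): 2 sockets x 6 cores, no HT.
recs <- unlist(lapply(0:11, function(i) c(
  sprintf("processor\t: %d", i),
  sprintf("physical id\t: %d", i %/% 6),
  "cpu cores\t: 6",
  "")))
tmp <- tempfile()
writeLines(recs, tmp)
d <- read.dcf(tmp)

# Simpler variant: distinct "cpu cores" values, i.e. cores per socket.
per_socket <- as.integer(unique(d[, grep("cpu cores", colnames(d))]))

# Number of distinct sockets seen in the file.
n_sockets <- length(unique(d[, grep("physical id", colnames(d))]))

# Per-socket variant: one "cpu cores" value per physical id, summed.
total <- sum(as.integer(tapply(
  d[, grep("cpu cores", colnames(d))],
  d[, grep("physical id", colnames(d))], `[`, 1)))
```

For this made-up layout, per_socket should be 6 while total should be 12, so the two variants agree only when n_sockets is 1.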