Julian Hniopek
2023-Aug-07 12:47 UTC
[Rd] Detecting physical CPUs in detectCores() on Linux platforms
On Mon, 2023-08-07 at 07:12 -0500, Dirk Eddelbuettel wrote:
>
> On 7 August 2023 at 08:48, Nils Kehrein wrote:
> > I recently noticed that `detectCores()` ignores the `logical=FALSE`
> > argument on Linux platforms. This means that the function will always
> > return the number of logical CPUs, i.e. it will count the number of
> > threads that theoretically can run in parallel due to e.g.
> > hyper-threading. Unfortunately, this can result in issues in
> > high-performance computing use cases where hyper-threading might
> > degrade performance instead of improving it.
> >
> > Currently, src/library/parallel/R/detectCores.R uses the following
> > R/shell code fragment to identify the number of logical CPUs:
> > linux = 'grep "^processor" /proc/cpuinfo 2>/dev/null | wc -l'
> >
> > As far as I understand, one could derive the number of online
> > physical CPUs by parsing the contents of /sys/devices/system/cpu/*
> > but that seems rather cumbersome. Instead, could we amend the R code
> > with the following line?
> > linux = if(logical) 'grep "^processor" /proc/cpuinfo 2>/dev/null | wc -l'
> >         else 'lscpu -b --parse="CORE" | tail -n +5 | sort -u | wc -l'
>
> That's good but you also need to protect this from `lscpu` being in the
> path. Maybe `if (logical && nzchar(Sys.which("lscpu")))` ?
>
> Dirk
>

Alternatively, using only POSIX utilities, which should be in the path on all Linux systems, and /proc/cpuinfo:

awk '/^physical id/{PHYS_ID=$NF; next} /^cpu cores/{print PHYS_ID" "$NF;}' /proc/cpuinfo 2>/dev/null | sort | uniq | awk '{sum+=$NF;} END {print sum}'

This parses /proc/cpuinfo for the physical id and the number of cores of each CPU, keeps only the unique combinations of physical id (i.e. socket) and core count, and then sums the core counts over all physical ids to get the total number of physical cores.

Something I had lying around. Someone with better awk skills could probably do the sorting and filtering in awk as well to save on pipes. Works on single- and multi-socket AMD/Intel systems in my experience.

Julian

> > This solution uses `lscpu` from `sys-utils`. The -b switch makes sure
> > that only online CPUs/cores are listed and due to the --parse="CORE",
> > the output will contain only a single column with logical core ids. It
> > seems to do the job in my view, but there might be edge cases for
> > exotic CPU topologies that I am not aware of.
> >
> > Thank you, Nils
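For comparison, the awk pipeline above can be ported to plain R without any subprocess. The sketch below is only an illustration: the name `physical_cores_awk_port()` is hypothetical and not part of the parallel package, and it assumes the x86-style `physical id` and `cpu cores` fields are present in /proc/cpuinfo, returning NA when they (or the file) are missing.

```r
# Hypothetical helper (not part of R's parallel package): a pure-R port of
# the awk pipeline above. Each "cpu cores" value is paired with the most
# recent "physical id", duplicate (socket, cores) pairs are dropped, and the
# per-socket core counts are summed.
physical_cores_awk_port <- function(path = "/proc/cpuinfo") {
  # Treat an unreadable or missing file as "unknown" rather than failing
  lines <- tryCatch(readLines(path, warn = FALSE),
                    error = function(e) character(0),
                    warning = function(w) character(0))
  if (!length(lines)) return(NA_integer_)

  # Value after the first colon, with surrounding whitespace trimmed
  value_of <- function(x) trimws(sub("^[^:]*:", "", x))

  phys_id <- NA_character_
  pairs <- character(0)
  for (ln in lines) {
    if (grepl("^physical id", ln)) {
      phys_id <- value_of(ln)                          # awk: PHYS_ID=$NF
    } else if (grepl("^cpu cores", ln)) {
      pairs <- c(pairs, paste(phys_id, value_of(ln)))  # awk: print PHYS_ID" "$NF
    }
  }
  if (!length(pairs)) return(NA_integer_)

  pairs <- unique(pairs)                               # sort | uniq
  sum(as.integer(sub("^.* ", "", pairs)))              # awk '{sum+=$NF} END {print sum}'
}
```

The loop deliberately mirrors the awk program step by step rather than being vectorised, so the correspondence to the shell version stays obvious. Like the original one-liner, it trusts the `cpu cores` field.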
Simon Urbanek
2023-Aug-07 23:21 UTC
First, detecting HT vs. cores is not necessarily possible in general; Linux may assign a core id to each HT depending on circumstances:

$ grep 'cpu cores' /proc/cpuinfo | uniq
cpu cores : 32
$ grep 'model name' /proc/cpuinfo | uniq
model name : Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz

and you can look up that the Xeon 6142 has 16 cores.

Second, instead of "awk"ward contortions it's easily done in R with something like

d=read.dcf("/proc/cpuinfo")
sum(as.integer(tapply(
  d[,grep("cpu cores",colnames(d))],
  d[,grep("physical id",colnames(d))],
  `[`, 1)))

which avoids subprocesses, quoting hell and all such issues...

Cheers,
Simon
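Simon's snippet can be wrapped in a small function so it can be tried against a saved copy of /proc/cpuinfo. This is a sketch only: the name `physical_cores_dcf()` is hypothetical, and the NA fallbacks for a missing file or missing fields are an added assumption. Like the one-liner, it trusts the `cpu cores` field, which, as the Xeon 6142 example shows, may report hardware threads rather than cores, so the result is a heuristic.

```r
# Hypothetical wrapper around the read.dcf() approach above; not part of R.
# /proc/cpuinfo is a series of blank-line-separated "key : value" records,
# one per logical CPU, which read.dcf() parses into a character matrix.
physical_cores_dcf <- function(path = "/proc/cpuinfo") {
  if (!file.exists(path)) return(NA_integer_)
  d <- read.dcf(path)

  # Field names may keep the whitespace before the colon (e.g. "cpu cores\t"),
  # so columns are located by pattern rather than by exact name.
  cores_col <- grep("^cpu cores", colnames(d))
  phys_col  <- grep("^physical id", colnames(d))
  if (length(cores_col) != 1L || length(phys_col) != 1L) return(NA_integer_)

  # First "cpu cores" value reported for each physical id (socket),
  # then summed across sockets.
  per_socket <- tapply(d[, cores_col], d[, phys_col], `[`, 1)
  sum(as.integer(per_socket))
}
```

Called without arguments on a Linux machine it reads the live /proc/cpuinfo; comparing the result with `parallel::detectCores()` is a quick way to see whether the `cpu cores` field looks plausible on a given system.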