Ghislain Durif
2017-Aug-21 12:55 UTC
[Rd] Control multi-threading in standard matrix product
Dear R Core Team,

I wish to report what can be viewed as a bug, or at least a strange behavior, in R-3.4.1. I am asking here (as recommended on https://www.r-project.org/bugs.html) since I am not a member of R's Bugzilla.

When running 'R --vanilla' from the command line, the standard matrix product is by default based on BLAS and multi-threaded on all cores available on the machine, cf. the following example:

    n=10000
    p=1000
    q=5000
    A = matrix(runif(n*p),nrow=n, ncol=p)
    B = matrix(runif(p*q),nrow=p, ncol=q)
    C = A %*% B # multi-threaded matrix product

However, the default behavior of using all available cores can be an issue, especially on shared computing resources or when the matrix product is used in parallelized sections of code (for instance with 'mclapply' from the 'parallel' package). For instance, the default matrix product is single-threaded in R-3.3.2 (I ran a test on my machine), so this new feature will deeply affect the behavior of existing R packages that use other multi-threading solutions.

Thanks to this stackoverflow question (https://stackoverflow.com/questions/45794290/in-r-how-to-control-multi-threading-in-blas-parallel-matrix-product), I now know that it is possible to control the number of BLAS threads with the package 'RhpcBLASctl'. However, controlling the number of threads should perhaps not require an additional package.

In addition, the 'matmult' doc does not mention this point; it points to the 'options' doc page, and especially the 'matprod' section, in which multi-threading is not discussed.
Here are the results of the 'sessionInfo()' function on my machine for R-3.4.1:

    R version 3.4.1 (2017-06-30)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 16.04.3 LTS

    Matrix products: default
    BLAS: /usr/lib/openblas-base/libblas.so.3
    LAPACK: /usr/lib/libopenblasp-r0.2.18.so

    locale:
     [1] LC_CTYPE=fr_FR.utf8       LC_NUMERIC=C
     [3] LC_TIME=fr_FR.utf8        LC_COLLATE=fr_FR.utf8
     [5] LC_MONETARY=fr_FR.utf8    LC_MESSAGES=fr_FR.utf8
     [7] LC_PAPER=fr_FR.utf8       LC_NAME=C
     [9] LC_ADDRESS=C              LC_TELEPHONE=C
    [11] LC_MEASUREMENT=fr_FR.utf8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    loaded via a namespace (and not attached):
    [1] compiler_3.4.1

and for R-3.3.2:

    R version 3.3.2 (2016-10-31)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 16.04.3 LTS

    locale:
     [1] LC_CTYPE=fr_FR.utf8       LC_NUMERIC=C
     [3] LC_TIME=fr_FR.utf8        LC_COLLATE=fr_FR.utf8
     [5] LC_MONETARY=fr_FR.utf8    LC_MESSAGES=fr_FR.utf8
     [7] LC_PAPER=fr_FR.utf8       LC_NAME=C
     [9] LC_ADDRESS=C              LC_TELEPHONE=C
    [11] LC_MEASUREMENT=fr_FR.utf8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

Thanks in advance,
Best regards

--
Ghislain Durif
--------------------------
Research engineer
THOTH TEAM
INRIA Grenoble Alpes (France)
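For readers who want to try the 'RhpcBLASctl' route mentioned in the message above, here is a minimal sketch. It assumes the package is installed and that R is linked against a BLAS the package knows how to control; the helper name `limit_blas_threads` is hypothetical, and the matrices are smaller than in the report so the example runs quickly.

```r
# Hypothetical helper (not part of R or RhpcBLASctl): cap the number of
# BLAS threads when RhpcBLASctl is available, otherwise do nothing.
limit_blas_threads <- function(n_threads = 1L) {
  if (!requireNamespace("RhpcBLASctl", quietly = TRUE)) {
    message("RhpcBLASctl is not installed; BLAS thread count left unchanged")
    return(invisible(FALSE))
  }
  RhpcBLASctl::blas_set_num_threads(n_threads)
  invisible(TRUE)
}

limit_blas_threads(1L)

# Same shape of computation as in the report, with smaller dimensions.
n <- 1000; p <- 200; q <- 500
A <- matrix(runif(n * p), nrow = n, ncol = p)
B <- matrix(runif(p * q), nrow = p, ncol = q)
C <- A %*% B  # should now run on a single BLAS thread if the cap took effect
```

Whether the cap takes effect depends on the BLAS in use; with the reference BLAS shipped with R the call should simply make no difference, since that implementation is single-threaded.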
Tomas Kalibera
2017-Aug-21 13:53 UTC
[Rd] Control multi-threading in standard matrix product
Hi Ghislain,

I think you might be comparing two versions of R with different BLAS implementations, one that is single threaded (is your 3.3.2 used with reference blas?) and one that is multi threaded (3.4.1 with openblas). Could you check with "perf"? E.g. run your benchmark with "perf record" in both cases and you should see the names of the hot BLAS functions and this should reveal the BLAS implementation (look for dgemm).

In Ubuntu, if you install R from the package system, whenever you run it it will use the BLAS currently installed via the package system. However if you build R from source on Ubuntu, by default, it will use the reference BLAS which is distributed with R. Section "Linear algebra" of "R Installation and Administration" has details on how to build R with different BLAS/LAPACK implementations.

Sadly there is no standard way to specify the number of BLAS worker threads. RhpcBLASctl has specific code for several existing implementations, but R itself does not attempt to control BLAS multi threading in any way. It is expected the user/system administrator will configure their BLAS implementation of choice to use the number of threads they need. A similar problem exists in other internally multi-threaded third-party libraries, used by packages - R cannot control how many threads they run.

Best
Tomas
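As a lighter-weight complement to the "perf" check suggested above, the BLAS and LAPACK libraries in use can usually be read from within R itself. This is a sketch that assumes R 3.4.0 or later (where `extSoftVersion()` reports a "BLAS" entry and `La_library()` exists). The `uses_reference_blas` test is only a name-based heuristic introduced here for illustration, and the reported path may be empty on some platforms.

```r
# Paths of the BLAS and LAPACK shared libraries, as far as R knows them.
# Requires R >= 3.4.0; the BLAS entry can be "" when R cannot determine it.
blas_info <- list(
  blas   = unname(extSoftVersion()["BLAS"]),
  lapack = La_library()
)
print(blas_info)

# Heuristic (an assumption, not an official check): the reference BLAS
# built with R is normally named libRblas (Rblas.dll on Windows).
uses_reference_blas <- grepl("Rblas", blas_info$blas, fixed = TRUE)
print(uses_reference_blas)
```

On the two installations described in this thread, one would expect the source build to report a path ending in 'libRblas.so' and the Ubuntu package to report an OpenBLAS path.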
luke-tierney at uiowa.edu
2017-Aug-21 14:07 UTC
[Rd] Control multi-threading in standard matrix product
Many, though not all, threaded BLAS honor the OpenMP environment variables; you might see whether setting OMP_THREAD_LIMIT or OMP_NUM_THREADS does what you want.

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone: 319-335-3386
Department of Statistics and        Fax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email: luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:   http://www.stat.uiowa.edu
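Environment variables of this kind are normally read when the BLAS library is loaded, so they need to be in place before R starts. The sketch below launches a child R process with OMP_NUM_THREADS set and only confirms that the child sees the variable. Whether the BLAS then honors it depends on the implementation, as noted above. The variable `child_expr` is a name chosen here for illustration.

```r
# Rscript binary belonging to the running R installation.
rscript <- file.path(R.home("bin"), "Rscript")

# Code run by the child process: print the variable as the child sees it.
child_expr <- 'writeLines(Sys.getenv("OMP_NUM_THREADS"))'

# Start the child with OMP_NUM_THREADS=1 in its environment and capture
# what it prints. --vanilla avoids output from user profiles.
seen <- system2(
  rscript,
  args   = c("--vanilla", "-e", shQuote(child_expr)),
  env    = "OMP_NUM_THREADS=1",
  stdout = TRUE
)
print(seen)
```

From a shell, the equivalent would be to set the variable on the command line that starts R, for example `OMP_NUM_THREADS=1 R --vanilla`.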
Ghislain Durif
2017-Aug-21 14:13 UTC
[Rd] Control multi-threading in standard matrix product
Hi Tomas,

Thanks for your answer.

Indeed, I checked, and my R-3.4.1 installed from the Ubuntu repository uses 'libopenblasp-r0.2.18.so', while my R-3.3.2, which I compiled on my machine, uses 'libRblas.so', which explains the difference in behavior.

I will use RhpcBLASctl to avoid issues when combining matrix products with other multi-threading packages.

Maybe this point regarding multi-threading with BLAS could be added to the R documentation.

Thanks again,
Best,

Ghislain

--
Ghislain Durif
--------------------------
Research engineer
THOTH TEAM
INRIA Grenoble Alpes (France)
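The combination described above (matrix products inside 'mclapply' workers) could look roughly like the following sketch. It assumes RhpcBLASctl is installed and supports the BLAS in use. The worker function `gram_single_threaded` and the choice of two workers are hypothetical, and the thread cap is set inside each worker so that it applies in the forked process.

```r
library(parallel)

# Hypothetical worker: cap BLAS threads inside the forked process, then
# compute the Gram matrix t(m) %*% m with the standard matrix product.
gram_single_threaded <- function(m) {
  if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
    RhpcBLASctl::blas_set_num_threads(1L)
  }
  t(m) %*% m
}

# mclapply relies on forking, which is not available on Windows.
n_workers <- if (.Platform$OS.type == "windows") 1L else 2L

# Four small random matrices, processed in parallel at the R level.
mats <- replicate(4, matrix(runif(200 * 50), nrow = 200), simplify = FALSE)
gram <- mclapply(mats, gram_single_threaded, mc.cores = n_workers)
```

The aim is to have at most one BLAS thread per worker, so that the total number of busy cores is bounded by `mc.cores` rather than by `mc.cores` times the number of BLAS threads.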