Hi,

I am trying to use GotoBLAS2 with R 3.0 on Unix. I downloaded the GotoBLAS2 source code from the TACC web site, compiled it, and replaced libRblas.so with libgoto2.so, following the instructions at http://www.rochester.edu/college/gradstudents/jolmsted/files/computing/BLAS.pdf.

Simple matrix operations in R such as "determinant" are now 20 times faster than before (I am using huge matrices), which is good. However, I can no longer use multiple cores in parallel.

For example, the code below runs forever. But if I use the commented-out "for" instead of "foreach", it takes just a second. When I was using R's default BLAS library, I could run the code below on multiple cores (though it took longer, of course, since that BLAS was not optimized).

    library("foreach")
    library("doParallel")

    registerDoParallel(cores=2)
    set.seed(100)
    foreach (i = 1:2) %dopar% {
    # for (i in 1:2) {
      a = replicate(1000, rnorm(1000))
      d = determinant(a)
    }

So, is it possible to use multiple cores at the same time with GotoBLAS2? Do you have any ideas?

Thanks a lot in advance.

-- 
-safiye
This really is the wrong list (R-devel or R-SIG-HPC?): see the posting guide. But there are two issues.

- How your BLAS controls its core usage (which is in its documentation).
- How your third-party package interacts with a BLAS which uses multiple cores, and for that you need to ask the package maintainers.

On 24/11/2013 09:03, Safiye Celik wrote:
> Hi,
>
> I am trying to use GotoBLAS2 on R 3.0 on Unix. I downloaded GotoBLAS2
> source code from TACC web site, compiled it, and replaced libRblas.so with
> libgoto2.so, following the instructions at the link
> http://www.rochester.edu/college/gradstudents/jolmsted/files/computing/BLAS.pdf.

Much better to use the definitive R documentation at http://cran.r-project.org/doc/manuals/r-release/R-admin.html#Shared-BLAS .

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Your report sounds somewhat similar to problems I encountered with OpenBLAS on Ubuntu Linux (OpenBLAS is a maintained version of GotoBLAS; I couldn't get the latter to compile properly).

OpenBLAS uses OpenMP for parallelization. Once it was linked into R, other OpenMP-based code would use only a single core. I have never used foreach, but I can imagine that something similar might be going on.

The workaround I found was to compile OpenBLAS with the setting NO_AFFINITY=1.

In addition, if you're running parallel code, I'd strongly recommend limiting the number of threads within each parallel process by setting the OMP_NUM_THREADS environment variable (or in some other way).

Hope this helps,
Stefan

On 24 Nov 2013, at 10:03, Safiye Celik <safisce at gmail.com> wrote:
> So, is it possible to use many cores at the same time with GotoBLAS2, do
> you have any ideas?
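
For what it's worth, the OMP_NUM_THREADS suggestion could look roughly like the sketch below. It is only an illustration under some assumptions: that the BLAS in use reads OMP_NUM_THREADS when it is loaded, and that the workers are separate R processes (a PSOCK cluster from the base "parallel" package, used here instead of foreach/doParallel to avoid extra dependencies). With forked workers the BLAS is already initialized in the parent, so the variable would probably have to be set in the shell before starting R instead. The names worker_threads and dets are made up for the example.

```r
library(parallel)

# Assumption: the BLAS reads OMP_NUM_THREADS when it is loaded.
# Set it before the workers are started, because each worker
# inherits the master's environment at launch.
Sys.setenv(OMP_NUM_THREADS = "1")

# Two separate R processes (PSOCK workers), not forks.
cl <- makeCluster(2)

# Ask each worker which value it actually sees.
worker_threads <- unlist(clusterEvalQ(cl, Sys.getenv("OMP_NUM_THREADS")))

# Reproducible random numbers across the workers.
clusterSetRNGStream(cl, 100)

# Same computation as in the original post: one matrix per task.
# determinant() returns the log-modulus by default.
dets <- parLapply(cl, 1:2, function(i) {
  a <- replicate(1000, rnorm(1000))
  as.numeric(determinant(a)$modulus)
})

stopCluster(cl)
```

Each worker then runs a single-threaded BLAS, so two workers should occupy about two cores in total. Whether this cures the hang described in the original post is not something the sketch can confirm; the NO_AFFINITY=1 rebuild may still be needed.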