Paul Johnson
2015-Nov-23 17:27 UTC
[Rd] MKL Acceleration encouraging; need adjust package builds?
Dear R-devel:

The cluster administrators at KU got enthusiastic about testing R-3.2.2 with Intel MKL when I asked for some BLAS integration. Below I forward a performance report, which is encouraging, and I thought you would like to know the numbers. To my untrained eye, there appear to be some extraordinary speedups on Cholesky decomposition, determinants, and matrix inversion.

They had difficulty getting R to compile with R shared BLAS (I don't know what went wrong there), so they went the other direction.

In his message to me, the technician says that I should consider adjusting the compilation flags on the packages that use BLAS.

1. Do you think that is needed? R is compiled with non-shared BLAS libraries; won't packages know where to look for BLAS headers?

2. If I need to do that, I wonder how to do it and which packages need attention: the Eigen and Armadillo packages, and possibly the ones that depend on them, lme4, anything flowing through Rcpp.

Here's the build for some packages. Are they finding MKL BLAS? How would I know?

* installing *source* package 'RcppArmadillo' ...
** package 'RcppArmadillo' successfully unpacked and MD5 sums checked
* checking LAPACK_LIBS: divide-and-conquer complex SVD available via system LAPACK
** libs
g++ -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include -I/usr/local/include -I"/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/Rcpp/include" -I../inst/include -fpic -O3 -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -c RcppArmadillo.cpp -o RcppArmadillo.o
g++ -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include -I/usr/local/include -I"/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/Rcpp/include" -I../inst/include -fpic -O3 -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -c RcppExports.cpp -o RcppExports.o
g++ -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include -I/usr/local/include -I"/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/Rcpp/include" -I../inst/include -fpic -O3 -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -c fastLm.cpp -o fastLm.o
g++ -shared -L/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/lib -L/usr/local/lib64 -o RcppArmadillo.so RcppArmadillo.o RcppExports.o fastLm.o -L/panfs/pfs.acf.ku.edu/cluster/6.2/intel/2015/mkl/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -Wl,--start-group -lmkl_gnu_thread -lmkl_core -Wl,--end-group -fopenmp -ldl -lpthread -lm -lgfortran -lm -L/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/lib -lR
installing to /panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/RcppArmadillo/libs
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (RcppArmadillo)
* installing *source* package 'RcppEigen' ...
** package 'RcppEigen' successfully unpacked and MD5 sums checked
** libs
g++ -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include -I/usr/local/include -I"/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/Rcpp/include" -I../inst/include -fpic -O3 -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -c RcppEigen.cpp -o RcppEigen.o
g++ -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include -I/usr/local/include -I"/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/Rcpp/include" -I../inst/include -fpic -O3 -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -c RcppExports.cpp -o RcppExports.o
g++ -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include -I/usr/local/include -I"/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/Rcpp/include" -I../inst/include -fpic -O3 -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -c fastLm.cpp -o fastLm.o
g++ -shared -L/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/lib -L/usr/local/lib64 -o RcppEigen.so RcppEigen.o RcppExports.o fastLm.o -L/panfs/pfs.acf.ku.edu/cluster/6.2/intel/2015/mkl/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -Wl,--start-group -lmkl_gnu_thread -lmkl_core -Wl,--end-group -fopenmp -ldl -lpthread -lm -lgfortran -lm -L/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/lib -lR
installing to /panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/RcppEigen/libs
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (RcppEigen)
* installing *source* package 'MatrixModels' ...
** package 'MatrixModels' successfully unpacked and MD5 sums checked
** R
** preparing package for lazy loading
Creating a generic function for 'resid' from package 'stats' in package 'MatrixModels'
Creating a generic function for 'fitted.values' from package 'stats' in package 'MatrixModels'
Creating a generic function for 'coefficients' from package 'stats' in package 'MatrixModels'
Creating a generic function for 'formula' from package 'stats' in package 'MatrixModels'
Creating a generic function for 'coef' from package 'stats' in package 'MatrixModels'
Creating a generic function for 'fitted' from package 'stats' in package 'MatrixModels'
Creating a generic function for 'residuals' from package 'stats' in package 'MatrixModels'
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (MatrixModels)
* installing *source* package 'quantreg' ...
** package 'quantreg' successfully unpacked and MD5 sums checked
** libs
gfortran -fpic -g -O2 -c akj.f -o akj.o
gfortran -fpic -g -O2 -c boot.f -o boot.o
gfortran -fpic -g -O2 -c brute.f -o brute.o
gcc -std=gnu99 -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include -I/usr/local/include -fpic -I/panfs/pfs.acf.ku.edu/cluster/system/pkg/R/curl7.45_install/include -L/panfs/pfs.acf.ku.edu/cluster/6.2/R/3.2.2_mkl/lib64 -c chlfct.c -o chlfct.o
gfortran -fpic -g -O2 -c cholesky.f -o cholesky.o
gfortran -fpic -g -O2 -c combos.f -o combos.o
gfortran -fpic -g -O2 -c crq.f -o crq.o
gfortran -fpic -g -O2 -c crqfnb.f -o crqfnb.o
gfortran -fpic -g -O2 -c dsel05.f -o dsel05.o
gfortran -fpic -g -O2 -c etime.f -o etime.o
gfortran -fpic -g -O2 -c extract.f -o extract.o
gfortran -fpic -g -O2 -c idmin.f -o idmin.o
gfortran -fpic -g -O2 -c iswap.f -o iswap.o
gfortran -fpic -g -O2 -c kuantile.f -o kuantile.o
gcc -std=gnu99 -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include -I/usr/local/include -fpic -I/panfs/pfs.acf.ku.edu/cluster/system/pkg/R/curl7.45_install/include -L/panfs/pfs.acf.ku.edu/cluster/6.2/R/3.2.2_mkl/lib64 -c mcmb.c -o mcmb.o
gfortran -fpic -g -O2 -c penalty.f -o penalty.o
gfortran -fpic -g -O2 -c powell.f -o powell.o
gfortran -fpic -g -O2 -c rls.f -o rls.o
gfortran -fpic -g -O2 -c rq0.f -o rq0.o
gfortran -fpic -g -O2 -c rq1.f -o rq1.o
gfortran -fpic -g -O2 -c rqbr.f -o rqbr.o
gfortran -fpic -g -O2 -c rqfn.f -o rqfn.o
gfortran -fpic -g -O2 -c rqfnb.f -o rqfnb.o
gfortran -fpic -g -O2 -c rqfnc.f -o rqfnc.o
gfortran -fpic -g -O2 -c rqs.f -o rqs.o
gfortran -fpic -g -O2 -c sparskit2.f -o sparskit2.o
gcc -std=gnu99 -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include -I/usr/local/include -fpic -I/panfs/pfs.acf.ku.edu/cluster/system/pkg/R/curl7.45_install/include -L/panfs/pfs.acf.ku.edu/cluster/6.2/R/3.2.2_mkl/lib64 -c srqfn.c -o srqfn.o
gcc -std=gnu99 -I/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/include -I/usr/local/include -fpic -I/panfs/pfs.acf.ku.edu/cluster/system/pkg/R/curl7.45_install/include -L/panfs/pfs.acf.ku.edu/cluster/6.2/R/3.2.2_mkl/lib64 -c srqfnc.c -o srqfnc.o
gfortran -fpic -g -O2 -c srtpai.f -o srtpai.o
gcc -std=gnu99 -shared -L/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/lib -L/usr/local/lib64 -o quantreg.so akj.o boot.o brute.o chlfct.o cholesky.o combos.o crq.o crqfnb.o dsel05.o etime.o extract.o idmin.o iswap.o kuantile.o mcmb.o penalty.o powell.o rls.o rq0.o rq1.o rqbr.o rqfn.o rqfnb.o rqfnc.o rqs.o sparskit2.o srqfn.o srqfnc.o srtpai.o -L/panfs/pfs.acf.ku.edu/cluster/6.2/intel/2015/mkl/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -Wl,--start-group -lmkl_gnu_thread -lmkl_core -Wl,--end-group -fopenmp -ldl -lpthread -lm -lgfortran -lm -lgfortran -lm -L/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/lib -lR
installing to /panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.2/site-library/quantreg/libs
** R
** data
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (quantreg)

pj

Hi PJ,

We're
still running the benchmarks to quantify the performance increase. The R benchmarks for the MKL version are promising. The performance increase varies from test to test, but there isn't any degradation in performance from using the MKL version. You can expect a 2x to 10x performance increase depending on the matrix calculations you are performing.

Here are the compilation arguments we used for compiling R with MKL:

--disable-BLAS-shlib --with-blas="-L/panfs/pfs.acf.ku.edu/cluster/6.2/intel/2015/mkl/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -Wl,--start-group -lmkl_gnu_thread -lmkl_core -Wl,--end-group -fopenmp -ldl -lpthread -lm" --with-lapack

You may want to include these while recompiling R packages which use BLAS.

Here are the results of the benchmark for the standard R 3.2.2:

   R Benchmark 2.5
   ===============
Number of times each test is run__________________________: 3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec): 2.69466666666667
2400x2400 normal distributed random matrix ^1000____ (sec): 1.42433333333333
Sorting of 7,000,000 random values__________________ (sec): 2.34466666666667
2800x2800 cross-product matrix (b = a' * a)_________ (sec): 33.187
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 14.52
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated): 4.51008013606039

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec): 1.203
Eigenvalues of a 640x640 random matrix______________ (sec): 1.60599999999999
Determinant of a 2500x2500 random matrix____________ (sec): 7.64266666666667
Cholesky decomposition of a 3000x3000 matrix________ (sec): 8.05900000000001
Inverse of a 1600x1600 random matrix________________ (sec): 8.64166666666667
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated): 4.62477425061321

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec): 1.25633333333335
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.894999999999982
Grand common divisors of 400,000 pairs (recursion)__ (sec): 1.714
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 1.4013333333333
Escoufier's method on a 45x45 matrix (mixed)________ (sec): 2.041
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated): 1.44505946077978

Total time for all 15 tests_________________________ (sec): 88.6306666666667
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 3.11209972260597
                      --- End of test ---

Here are the results for the MKL version:

   R Benchmark 2.5
   ===============
Number of times each test is run__________________________: 3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec): 2.88466666666667
2400x2400 normal distributed random matrix ^1000____ (sec): 1.45933333333333
Sorting of 7,000,000 random values__________________ (sec): 2.35166666666667
2800x2800 cross-product matrix (b = a' * a)_________ (sec): 3.37233333333333
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 1.68666666666666
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated): 2.25337542617509

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec): 1.232
Eigenvalues of a 640x640 random matrix______________ (sec): 0.823333333333333
Determinant of a 2500x2500 random matrix____________ (sec): 1.752
Cholesky decomposition of a 3000x3000 matrix________ (sec): 1.417
Inverse of a 1600x1600 random matrix________________ (sec): 1.33833333333334
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated): 1.32693082905282

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec): 1.28600000000001
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 1.00833333333334
Grand common divisors of 400,000 pairs (recursion)__ (sec): 1.82266666666666
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 1.40533333333334
Escoufier's method on a 45x45 matrix (mixed)________ (sec): 1.91199999999998
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated): 1.48790723568791

Total time for all 15 tests_________________________ (sec): 25.7516666666667
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 1.64469699141649
                      --- End of test ---

--
Paul E. Johnson
Professor, Political Science        Director
1541 Lilac Lane, Room 504           Center for Research Methods
University of Kansas                University of Kansas
http://pj.freefaculty.org           http://crmda.ku.edu
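For readers who want the speedups implied by the two reports as explicit ratios, here is a small Python sketch. The timings are copied from the benchmark output above; the dictionary layout and the names `baseline`, `mkl`, and `speedups` are purely illustrative and are not part of R Benchmark 2.5.

```python
# Timings in seconds, copied from the two R Benchmark 2.5 reports above.
# Only a subset of the tests is listed; the keys are illustrative labels.
baseline = {
    "cross-product 2800x2800": 33.187,
    "linear regression 3000x3000": 14.52,
    "eigenvalues 640x640": 1.60599999999999,
    "determinant 2500x2500": 7.64266666666667,
    "cholesky 3000x3000": 8.05900000000001,
    "inverse 1600x1600": 8.64166666666667,
    "total, all 15 tests": 88.6306666666667,
}
mkl = {
    "cross-product 2800x2800": 3.37233333333333,
    "linear regression 3000x3000": 1.68666666666666,
    "eigenvalues 640x640": 0.823333333333333,
    "determinant 2500x2500": 1.752,
    "cholesky 3000x3000": 1.417,
    "inverse 1600x1600": 1.33833333333334,
    "total, all 15 tests": 25.7516666666667,
}


def speedups(before, after):
    """Return {test: before / after} for tests present in both reports."""
    return {name: before[name] / after[name] for name in before if name in after}


if __name__ == "__main__":
    # Print the largest speedups first.
    ranked = sorted(speedups(baseline, mkl).items(), key=lambda kv: -kv[1])
    for name, ratio in ranked:
        print(f"{name:30s} {ratio:5.2f}x")
```

On these numbers the BLAS-bound tests land roughly in the 2x to 10x range the technician mentions, while the sorting, FFT, and "Programmation" timings are essentially unchanged between the two builds.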
David Smith
2015-Nov-23 17:39 UTC
[Rd] MKL Acceleration encouraging; need adjust package builds?
Hi Paul,

We've been through this process ourselves for the Revolution R Open project. There are a number of pitfalls to avoid, but you can take a look at how we achieved it in the build scripts at:

https://github.com/RevolutionAnalytics/RRO

There are also some very useful notes in the R Installation guide:
https://cran.r-project.org/doc/manuals/r-release/R-admin.html#BLAS

Most packages do benefit from MKL (or any multi-threaded BLAS) to some degree, although the actual benefit depends on the R functions they call. Some packages (and some built-in R functions) don't call into BLAS endpoints, so you won't see benefits in all cases.

# David Smith

--
David M Smith <davidsmi at microsoft.com>
R Community Lead, Revolution Analytics (a Microsoft company)
Tel: +1 (312) 9205766 (Chicago IL, USA)
Twitter: @revodavid | Blog: http://blog.revolutionanalytics.com
We are hiring engineers for Revolution R and Azure Machine Learning.
Dirk Eddelbuettel
2015-Nov-23 17:58 UTC
[Rd] MKL Acceleration encouraging; need adjust package builds?
We said it before, but it bears repeating: BLAS is an interface.

So unless you use a static library build, the library can be switched after compilation, at essentially any point in time.

My (unfinished) package gcbd shows how in its vignette, by comparing a number of BLAS implementations. See the (now dated) chart on page 9 of

https://cran.rstudio.com/web/packages/gcbd/vignettes/gcbd.pdf

or this (old) blog post

http://dirk.eddelbuettel.com/blog/2010/10/03/

While the charts could do with an update, they do show how, e.g., reference BLAS is clearly outperformed by Atlas or GotoBLAS (the predecessor to OpenBLAS).

Hope this helps, Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
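One way to see whether a given shared object resolves its BLAS dynamically (and so could, per the point above, have the implementation switched without recompiling) is to look at the output of `ldd` for it. The Python sketch below only filters such output for BLAS-looking names. The sample text is made up for illustration and is not from this thread or any real system, and the name patterns in `BLAS_HINT` are an assumption about how such libraries tend to be named.

```python
import re

# Hypothetical `ldd`-style output, invented for illustration only.
SAMPLE_LDD = (
    "\tlibR.so => /opt/R/lib/libR.so (0x00007f0000000000)\n"
    "\tlibblas.so.3 => /usr/lib/libblas.so.3 (0x00007f0000100000)\n"
    "\tliblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007f0000200000)\n"
    "\tlibm.so.6 => /lib64/libm.so.6 (0x00007f0000300000)\n"
)

# Assumed naming hints; real installations may use other names.
BLAS_HINT = re.compile(r"blas|lapack|mkl|atlas", re.IGNORECASE)


def dynamic_blas_libs(ldd_output):
    """Return (soname, resolved_path) pairs whose soname looks BLAS-related."""
    found = []
    for line in ldd_output.splitlines():
        parts = line.split("=>")
        if len(parts) != 2:
            continue  # lines without "=>" (e.g. the dynamic loader) are skipped
        soname = parts[0].strip()
        fields = parts[1].split()
        path = fields[0] if fields else ""
        if BLAS_HINT.search(soname):
            found.append((soname, path))
    return found


if __name__ == "__main__":
    for soname, path in dynamic_blas_libs(SAMPLE_LDD):
        print(soname, "->", path)
```

In practice the text would come from running `ldd` on the package's `.so` file or on R's own shared library; an empty result suggests the BLAS routines were linked in statically or live under a name the pattern does not cover.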
Paul Johnson
2015-Nov-25 16:51 UTC
[Rd] MKL Acceleration encouraging; need adjust package builds?
On Mon, Nov 23, 2015 at 11:39 AM, David Smith <davidsmi at microsoft.com> wrote:
> Hi Paul,
>
> We've been through this process ourselves for the Revolution R Open project. There are a number of pitfalls to avoid, but you can take a look at how we achieved it in the build scripts at:
>
> https://github.com/RevolutionAnalytics/RRO
>
> There are also some very useful notes in the R Installation guide:
> https://cran.r-project.org/doc/manuals/r-release/R-admin.html#BLAS
>
> Most packages do benefit from MKL (or any multi-threaded BLAS) to some degree, although the actual benefit depends on the R functions they call. Some packages (and some built-in R functions) don't call into BLAS endpoints, so you won't see benefits in all cases.
>
> # David Smith
>

Dear David,

I'm in the situation mentioned here in the docs, since BLAS is not shared.

"Note that under Unix (but not under Windows) if R is compiled against a non-default BLAS and --enable-BLAS-shlib is not used, then all BLAS-using packages must also be. So if R is re-built to use an enhanced BLAS then packages such as quantreg will need to be re-installed."

I am building all of the modules from scratch, so if the default build is sufficient, then I'll be done.

When I asked the other day, I was worried that packages would find the wrong shared library. As far as I can tell now, I should not have been so worried. Today, while browsing the R installation, I found the Makeconf file, and that has all the information a package should need. I've verified that the quantreg package detects this information, and we'll just hope the others do too :)

In case anybody else comes along later and wonders how R can be configured to make this go, here's the top of our Makeconf from the installed R, which has the configure line as well as BLAS_LIBS, which, so far as I can tell, is making all of this go.

Makeconf content

# etc/Makeconf. Generated from Makeconf.in by configure.
#
# ${R_HOME}/etc/Makeconf

# R was configured using the following call
# (not including env. vars and site configuration)
# configure '--prefix=/tools/cluster/6.2/R/3.2.2_mkl' '--with-tcltk' '--enable-R-shlib' '--enable-shared' '--with-pic' '--disable-BLAS-shlib' '--with-blas=-L/panfs/pfs.acf.ku.edu/cluster/6.2/intel/2015/mkl/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -Wl,--start-group -lmkl_gnu_thread -lmkl_core -Wl,--end-group -fopenmp -ldl -lpthread -lm' '--with-lapack' 'CFLAGS=-I/panfs/pfs.acf.ku.edu/cluster/system/pkg/R/curl7.45_install/include -L/panfs/pfs.acf.ku.edu/cluster/6.2/R/3.2.2_mkl/lib64' 'JAVA_HOME=/tools/cluster/6.2/java/jdk1.8.0_66'

## This fails if it contains spaces, or if it is quoted
include $(R_SHARE_DIR)/make/vars.mk

AR = ar
## Used by packages 'maps' and 'mapdata'
AWK = gawk
BLAS_LIBS = -L/panfs/pfs.acf.ku.edu/cluster/6.2/intel/2015/mkl/lib/intel64 -Wl,--no-as-needed -lmkl_gf_lp64 -Wl,--start-group -lmkl_gnu_thread -lmkl_core -Wl,--end-group -fopenmp -ldl -lpthread -lm
C_VISIBILITY = -fvisibility=hidden
...

pj

--
Paul E. Johnson
Professor, Political Science        Director
1541 Lilac Lane, Room 504           Center for Research Methods
University of Kansas                University of Kansas
http://pj.freefaculty.org           http://crmda.ku.edu
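The check described in this message (does a package's link step pick up what Makeconf records in BLAS_LIBS?) can be sketched in a few lines of Python. The Makeconf lines and the quantreg link command are copied from this thread; the helper names `makeconf_value` and `missing_flags` are hypothetical, and matching flags token by token is a simplifying assumption, not how R itself consumes Makeconf.

```python
import shlex

# A few lines copied from the Makeconf excerpt quoted in this thread.
MAKECONF_TEXT = (
    "AR = ar\n"
    "## Used by packages 'maps' and 'mapdata'\n"
    "AWK = gawk\n"
    "BLAS_LIBS = -L/panfs/pfs.acf.ku.edu/cluster/6.2/intel/2015/mkl/lib/intel64 "
    "-Wl,--no-as-needed -lmkl_gf_lp64 -Wl,--start-group -lmkl_gnu_thread "
    "-lmkl_core -Wl,--end-group -fopenmp -ldl -lpthread -lm\n"
    "C_VISIBILITY = -fvisibility=hidden\n"
)

# The final link command from the quantreg build log earlier in the thread.
LINK_CMD = (
    "gcc -std=gnu99 -shared -L/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/lib "
    "-L/usr/local/lib64 -o quantreg.so akj.o boot.o brute.o chlfct.o "
    "cholesky.o combos.o crq.o crqfnb.o dsel05.o etime.o extract.o idmin.o "
    "iswap.o kuantile.o mcmb.o penalty.o powell.o rls.o rq0.o rq1.o rqbr.o "
    "rqfn.o rqfnb.o rqfnc.o rqs.o sparskit2.o srqfn.o srqfnc.o srtpai.o "
    "-L/panfs/pfs.acf.ku.edu/cluster/6.2/intel/2015/mkl/lib/intel64 "
    "-Wl,--no-as-needed -lmkl_gf_lp64 -Wl,--start-group -lmkl_gnu_thread "
    "-lmkl_core -Wl,--end-group -fopenmp -ldl -lpthread -lm -lgfortran -lm "
    "-lgfortran -lm -L/tools/cluster/6.2/R/3.2.2_mkl/lib64/R/lib -lR"
)


def makeconf_value(text, key):
    """Return the right-hand side of the first 'KEY = value' line, or None."""
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


def missing_flags(blas_libs, link_cmd):
    """List BLAS_LIBS tokens that do not appear in the link command."""
    link_tokens = set(shlex.split(link_cmd))
    return [tok for tok in shlex.split(blas_libs) if tok not in link_tokens]


if __name__ == "__main__":
    blas_libs = makeconf_value(MAKECONF_TEXT, "BLAS_LIBS")
    print("BLAS_LIBS:", blas_libs)
    print("flags missing from link line:", missing_flags(blas_libs, LINK_CMD))
```

On a real installation the text would be read from etc/Makeconf under R's home directory rather than pasted in, and `R CMD config BLAS_LIBS` reports the same variable from the command line. An empty list of missing flags only shows that the link line carries the MKL flags; it does not by itself prove that the package's code calls BLAS routines.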