Stéphane Tufféry
2016-Aug-03 19:37 UTC
[R] Effect of an optimized BLAS library on lm() function
Dear All,

My question is simple (but the answer perhaps less so): how does R's lm() function benefit from an optimized BLAS library (such as ATLAS, OpenBLAS, or the MKL shipped with R Open)? I would have supposed that these BLAS optimizations concentrate on the level-3 BLAS, while lm() relies on a level-1 BLAS LINPACK routine. Nevertheless, the following test on my laptop seems to indicate slightly faster calculations with lm() when using the MKL library.

library(microbenchmark)
library(biglm)          # biglm(), bigglm()
library(bigmemory)      # as.big.matrix()
library(biganalytics)   # biglm.big.matrix()
library(speedglm)       # speedglm()
library(RcppArmadillo)  # fastLm() (also provided by RcppEigen)

set.seed(123)
m <- 100000
n <- 100
matest <- matrix(rnorm(m * n, 0, 2), m, n)
y <- rnorm(m)
x <- cbind(1, matest)
test <- data.frame(matest, y = y)
n <- names(test)
f <- as.formula(paste("y ~", paste(n[!n %in% "y"], collapse = " + ")))
bigtest <- as.big.matrix(cbind(matest, y))
options(bigmemory.allow.dimnames = TRUE)
colnames(bigtest) <- names(test)
compare <- microbenchmark(
  lm(y ~ ., data = test),
  lm.fit(x, y),
  biglm(f, data = test),
  bigglm(f, data = test, family = gaussian()),
  speedglm(y ~ ., data = test, family = gaussian()),
  fastLm(y ~ ., data = test, family = gaussian()),
  biglm.big.matrix(f, data = bigtest, chunksize = 100000),
  solve(crossprod(x), crossprod(x, y)),
  solve(t(x) %*% x, t(x) %*% y),
  qr.coef(qr(x, LAPACK = TRUE), y),
  qr.coef(qr(x, LAPACK = FALSE), y),
  times = 30, unit = "ms")

# On base R:
> compare
Unit: milliseconds
                                                  expr       min        lq      mean    median        uq       max neval
                                lm(y ~ ., data = test) 2423.4618 2497.0986 2587.6717 2528.4108 2605.8664 3245.0846    30
                                          lm.fit(x, y) 1785.2536 1816.0466 1864.9683 1841.6295 1868.1580 2166.6797    30
                                 biglm(f, data = test) 1947.5808 2001.2834 2044.3119 2021.8473 2084.5147 2304.7108    30
           bigglm(f, data = test, family = gaussian()) 4461.9307 4544.2631 4726.1232 4629.1442 4705.2304 6240.9467    30
     speedglm(y ~ ., data = test, family = gaussian()) 1651.4826 1670.5076 1727.1039 1696.9502 1771.4395 1946.8591    30
       fastLm(y ~ ., data = test, family = gaussian()) 3939.3955 4051.2525 4214.4684 4149.4943 4301.5698 5037.1337    30
biglm.big.matrix(f, data = bigtest, chunksize = 1e+05) 2176.9082 2283.2670 2336.6303 2345.2608 2383.1442 2478.3213    30
                  solve(crossprod(x), crossprod(x, y))  839.3229  844.4637  867.8926  856.9103  882.7036  949.5098    30
                         solve(t(x) %*% x, t(x) %*% y) 1436.7063 1464.4559 1531.5375 1494.3093 1532.6056 1916.9786    30
                         qr.coef(qr(x, LAPACK = T), y) 1989.8013 2031.9686 2091.0366 2063.0931 2128.2705 2317.7407    30
                         qr.coef(qr(x, LAPACK = F), y) 1752.9953 1809.3616 1856.6803 1826.0688 1887.2954 2111.4479    30

# On R Open (2 threads):
> getMKLthreads()
[1] 2
> compare
Unit: milliseconds
                                                  expr        min         lq       mean     median        uq       max neval
                                lm(y ~ ., data = test) 1788.61291 1875.63929 1976.74912 1928.96796 2082.2125 2371.3407    30
                                          lm.fit(x, y) 1097.40423 1194.82479 1256.65020 1263.27286 1299.7869 1451.5524    30
                                 biglm(f, data = test) 1936.05859 1983.08455 2041.44009 2019.54653 2103.5658 2211.2873    30
           bigglm(f, data = test, family = gaussian()) 4707.12674 4795.69165 5005.35978 5023.22328 5193.1482 5340.9577    30
     speedglm(y ~ ., data = test, family = gaussian())  881.47945  934.77104  996.59551  975.93888 1038.1470 1212.8197    30
       fastLm(y ~ ., data = test, family = gaussian()) 1671.48208 1810.42245 1883.39515 1880.31249 1936.2377 2324.2279    30
biglm.big.matrix(f, data = bigtest, chunksize = 1e+05) 2196.43003 2266.86705 2339.93722 2355.17538 2401.8879 2585.1069    30
                  solve(crossprod(x), crossprod(x, y))   63.50803   70.25601   92.00605   74.73306  101.0346  363.4992    30
                         solve(t(x) %*% x, t(x) %*% y)  269.83961  298.29031  340.11531  320.85419  350.4334  541.5970    30
                         qr.coef(qr(x, LAPACK = T), y)  864.66990  941.90109 1000.73358  985.70631 1030.5733 1341.8595    30
                         qr.coef(qr(x, LAPACK = F), y) 1155.82456 1253.15555 1309.30060 1299.67349 1354.4523 1556.7010    30

# On R Open (1 thread):
> getMKLthreads()
[1] 1
> compare
Unit: milliseconds
                                                  expr       min        lq      mean    median        uq       max neval
                                lm(y ~ ., data = test) 1991.4313 2034.4098 2111.0498 2104.9173 2182.6686 2379.2731    30
                                          lm.fit(x, y) 1329.0017 1347.4155 1437.7724 1409.5693 1519.5183 1690.3965    30
                                 biglm(f, data = test) 1922.7935 1979.4146 2030.7355 2008.3577 2066.5151 2265.3957    30
           bigglm(f, data = test, family = gaussian()) 4474.7666 4534.6946 4589.0903 4594.6761 4637.1702 4744.8987    30
     speedglm(y ~ ., data = test, family = gaussian())  918.9388  959.5755 1015.1099 1000.7063 1066.6677 1178.5648    30
       fastLm(y ~ ., data = test, family = gaussian()) 1737.4001 1785.1347 1908.2960 1905.7776 1992.8247 2188.0665    30
biglm.big.matrix(f, data = bigtest, chunksize = 1e+05) 2172.3621 2242.7618 2309.5848 2317.5166 2362.6432 2434.8259    30
                  solve(crossprod(x), crossprod(x, y))  122.5138  125.0064  132.2125  128.8345  137.1886  157.8529    30
                         solve(t(x) %*% x, t(x) %*% y)  330.3578  336.0520  390.2391  379.9801  413.4324  530.6566    30
                         qr.coef(qr(x, LAPACK = T), y)  917.1596  943.9504  987.5824  971.1059 1001.2241 1231.8204    30
                         qr.coef(qr(x, LAPACK = F), y) 1313.7414 1348.3933 1439.4807 1480.5955 1518.4831 1569.0909    30

We notice that solve(crossprod(x), crossprod(x, y)) is faster than solve(t(x) %*% x, t(x) %*% y), because crossprod(x) exploits the symmetry of t(x) %*% x and computes only half of the matrix. solve(t(x) %*% x, t(x) %*% y) is in turn faster than qr.coef(qr(x), y), because solving the normal equations factorizes only the small n x n matrix t(x) %*% x (strictly speaking, solve() applies an LU factorization; a Cholesky factorization would exploit the symmetry further), which takes approximately half the floating-point operations of a QR decomposition of the full m x n matrix, at the price of lower numerical stability. We also notice that qr.coef(qr(x, LAPACK = T), y) is faster than qr.coef(qr(x, LAPACK = F), y), at least with an optimized BLAS library, which speeds up the LAPACK routines but not the LINPACK ones. The run time of qr.coef(qr(x, LAPACK = F), y) is of the same order as that of lm.fit(), which also uses a LINPACK-based QR decomposition. Of course, lm() takes longer than lm.fit(), because lm() has to evaluate the formula, build the model frame, and perform additional computations. Finally, the functions biglm() and bigglm() of package biglm do not benefit from the optimized MKL BLAS, whereas all the other functions do, including lm() and lm.fit(), and even with only one thread.

Sincerely,
Stéphane
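As a sanity check on the expressions being benchmarked, the following sketch (with deliberately small dimensions, so it runs in a fraction of a second; the variable names are mine) confirms that they all solve the same least-squares problem:

```r
# Check that the benchmarked expressions return the same coefficients
# (sizes shrunk from the benchmark above for speed).
set.seed(1)
m <- 200
n <- 5
x <- cbind(1, matrix(rnorm(m * n), m, n))  # design matrix with intercept
y <- rnorm(m)

b_sym   <- solve(crossprod(x), crossprod(x, y))  # normal equations, symmetric crossprod
b_naive <- solve(t(x) %*% x, t(x) %*% y)         # normal equations, explicit products
b_qr    <- qr.coef(qr(x), y)                     # QR decomposition
b_fit   <- lm.fit(x, y)$coefficients             # what lm() calls internally

# All four agree to numerical precision (each difference is ~1e-12 or smaller):
max(abs(drop(b_sym) - drop(b_naive)))
max(abs(drop(b_sym) - unname(b_qr)))
max(abs(drop(b_sym) - unname(b_fit)))
```

On well-conditioned data like this the normal-equations and QR estimates coincide; the numerical-stability caveat above only bites when x is close to rank-deficient.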
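For anyone repeating this comparison: a quick way to confirm which BLAS/LAPACK a given session is actually linked against, assuming a reasonably recent R (sessionInfo() has reported the BLAS and LAPACK shared-library paths since R 3.4.0), is:

```r
# Report the linear-algebra libraries the current R session uses.
si <- sessionInfo()
si$BLAS        # path of the BLAS shared library (e.g. OpenBLAS or MKL)
si$LAPACK      # path of the LAPACK shared library
La_version()   # version string of the LAPACK in use
```

On a stock R build these point at the bundled reference BLAS/LAPACK; under R Open they should point at the MKL libraries.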