I cleaned up my old benchmarking code, which compares various ways of obtaining OLS regression coefficients, and added checks for missing data. I thought I would share this for others. The long and short of it is that I would recommend

ols.crossprod = function (y, x) {
    x <- as.matrix(x)
    ## drop observations with missing y or missing x
    ok <- (!is.na(y)) & (!is.na(rowSums(x)))
    y <- y[ok]; x <- subset(x, ok)
    x <- cbind(1, x)         ## prepend the intercept column
    XtX <- crossprod(x)      ## X'X
    Xty <- crossprod(x, y)   ## X'y
    solve(XtX, Xty)          ## solve the normal equations X'X b = X'y
}

for fast and stable coefficients. (Yes, stable in double precision, even though not as stable as lm(). It works just fine with X variables that have >99.99% correlation with as few as 100 observations. If your situation is worse than this, you probably have an error in your data, or you are looking for the Higgs boson.)

I added the code below. Feel free to ignore.

/iaw

################################################################
###
### Code to test alternative ways of obtaining OLS coefficients and how
### fast they are, including checks to exclude missing observations
### where necessary.
###
### For a more informed article (and person), see Bates, 'Least Squares
### Calculations in R', R News 2004-1.  The code here does not test his
### sparse matrix examples, or his geMatrix/poMatrix examples.
###
### Basic results: for the examples that I tried, typical relative time
### factors of the algorithms were about
###
###      lm    lmfit   solve   crossprod   cholesky   (special-case 2 vars)
###     1.0    0.5     0.3     0.15        0.17        0.1
###
### There was no advantage to cholesky, so you may as well use the
### simpler crossprod.
###
### I was also interested in how the algorithms scale in N and K.  Yes,
### there were some changes in the factors across algorithms, but the
### general pattern wasn't too different.  For the cholesky
### decomposition, relative times were about
###
###            N=1000   N=10000   N=100000   N=200000
###   K=1        1.0       7         80        160
###   K=10       2.5      26
###   K=50      16
###   K=100     43        70
###   K=200    140
###
### Some of this may well be swap/memory access, etc.  Roughly speaking,
### the time scales about linearly in N (twice the N takes twice as
### long), while ten times the K takes about 25 times as long.
###
### Of course, ols.crossprod and ols.cholesky are not as stable as
### ols.lm, but they are still amazingly stable, given the default
### double precision.  Even with a correlation of 0.999999(!) between
### the two final columns, they still produce exactly the same result as
### ols.lm with 1000 observations.  Frankly, the ill-conditioning worry
### is overblown with most real-world data.  If you really have data
### THIS bad, you should already know it; you probably just have some
### measurement errors in your observations, and your regression is
### giving you garbage either way.
###
### If I made the R core decisions, I would switch lm() away from its
### default method and make that method a special option.  My guess is
### that it is status-quo bias that keeps the current default.  Or, at
### the least, I would say loudly in the R docs that for common use,
### here is a much faster method...
###
################################################################
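
### ----------------------------------------------------------------
### Quick sanity check -- a minimal usage sketch, not part of the
### timings below.  It assumes the ols.crossprod function quoted above
### has been defined; the toy data and the names yy/xx are made up
### purely for illustration.  It simulates a small data set with one
### missing value and compares ols.crossprod against coef(lm).

set.seed(1)
xx <- matrix(rnorm(200*3), 200, 3)
yy <- 1 + 0.5*xx[,1] - 0.2*xx[,2] + 0.1*xx[,3] + rnorm(200)
yy[5] <- NA    ## ols.crossprod drops this observation itself
print(cbind(ols.crossprod(yy, xx), coef(lm(yy ~ xx))))
### ----------------------------------------------------------------
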
MC <- 100     ## number of Monte-Carlo replications
N  <- 1000    ## observations per replication
K  <- 10      ## number of X variables
SD <- 1e-3    ## noise sd that makes the last two X columns nearly collinear

ols <- list(

    ols.lm = function (y, x) { coef(lm(y ~ x)) },

    ols.lmfit = function (y, x) {
        x <- as.matrix(x)
        ok <- (!is.na(y)) & (!is.na(rowSums(x)))
        y <- y[ok]; x <- subset(x, ok)
        x <- as.matrix(cbind(1, x))
        lm.fit(x, y)$coefficients
    },

    ols.solve = function (y, x) {
        x <- as.matrix(x)
        ok <- (!is.na(y)) & (!is.na(rowSums(x)))
        y <- y[ok]; x <- subset(x, ok)
        x <- cbind(1, x)
        xy <- t(x) %*% y
        xxi <- solve(t(x) %*% x)
        b <- as.vector(xxi %*% xy)
        b
    },

    ols.crossprod = function (y, x) {
        x <- as.matrix(x)
        ok <- (!is.na(y)) & (!is.na(rowSums(x)))
        y <- y[ok]; x <- subset(x, ok)
        x <- cbind(1, x)
        XtX <- crossprod(x)
        Xty <- crossprod(x, y)
        solve(XtX, Xty)
    },

    ols.cholesky = function (y, x) {
        x <- as.matrix(x)
        ok <- (!is.na(y)) & (!is.na(rowSums(x)))
        y <- y[ok]; x <- subset(x, ok)
        x <- cbind(1, x)
        ch <- chol(crossprod(x))
        backsolve(ch, forwardsolve(ch, crossprod(x, y), upper=TRUE, trans=TRUE))
    }

)

set.seed(0)
y <- matrix(rnorm(N*MC), N, MC)
x <- array(rnorm(MC*K*N), c(N, K, MC))

cat("N=", N, "K=", K, " (MC=", MC, ")")

if (K > 1) {
    ## make the last two X columns nearly collinear
    sum.cor <- 0
    for (mc in 1:MC) {
        x[,K,mc] <- x[,K-1,mc] + rnorm(N, sd=SD)
        sum.cor <- sum.cor + cor(x[,K,mc], x[,K-1,mc], use="pair")
    }
    options(digits=10)
    cat(" sd=", SD, "The bad corr= 1+", sum.cor/MC - 1)
} else {
    ## special-case closed form for a single X variable
    ols$ols.xy.Kis1 <- function(y, x) {
        ok <- (!is.na(y)) & (!is.na(x))
        y <- y[ok]; x <- x[ok]
        b <- cov(y, x)/var(x)
        c(mean(y) - b*mean(x), b)
    }
}
cat("\n")

## make sure we exercise the NA checks
x[1,1,1] <- NA
y[2,2]   <- NA

options(digits=2)

time.factor.first <- first.checksum <- NA
for (ctr in 1:length(ols)) {
    invisible({gc(); gc(); gc()})
    coefsum <- 0.0
    v <- system.time(for (mc in 1:MC) coefsum <- coefsum + ols[[ctr]](y[,mc], x[,,mc]))
    thiscoefsum <- sum(abs(coefsum))
    thistime <- sum(as.numeric(v)[1:2])/MC   ## user+system seconds per call
    if (is.na(time.factor.first)) time.factor.first <- thistime
    if (is.na(first.checksum))    first.checksum    <- thiscoefsum
    cat(sprintf("%13.13s", names(ols[ctr])),
        " Checksum = ", thiscoefsum,
        "(", sprintf("%.8f", abs(thiscoefsum - first.checksum)/MC), ")",
        "Time =", sprintf("%.3f", 1000*thistime),
        "msecs. Factor=", thistime/time.factor.first, "\n")
}
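
### ----------------------------------------------------------------
### Collinearity stress test -- a small sketch of the stability claim in
### the comments above (a correlation of roughly 0.999999 between the
### last two columns).  The simulated data and the names x1/x2/xb/yb are
### made up for illustration; ols$ols.crossprod is the function defined
### in the list above, and per the claim above its coefficients should
### agree with coef(lm) to many digits.

set.seed(2)
nn <- 1000
x1 <- rnorm(nn)
x2 <- x1 + rnorm(nn, sd=1e-3)   ## correlation with x1 is roughly 0.9999995
xb <- cbind(x1, x2)
yb <- 1 + x1 - x2 + rnorm(nn)
cat("correlation between the two bad columns:",
    sprintf("%.7f", cor(x1, x2)), "\n")
print(cbind(ols$ols.crossprod(yb, xb), coef(lm(yb ~ xb))), digits=8)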