The solve function in r-patched has been changed so that it applies a
tolerance when using the Lapack routines to calculate the inverse of a
matrix or to solve a system of linear equations.  A tolerance has always
been used with the Linpack routines but not with the Lapack routines in
versions 1.7.x and 1.8.0.  (You can use the optional argument tol = 0 to
override this check for computational singularity, but you probably want
to ask yourself why before doing so.  In a perfect world you would be
required to pass a qualifier exam before being allowed to do that. :-)

We found that making such a change with a not-unreasonable default value
for the tolerance caused several packages to fail R CMD check.  As a
result we have changed to a very generous default tolerance and, by
default, declare the matrix to be computationally singular if the
estimated reciprocal condition number is less than .Machine$double.eps.
In future versions of R we may make that tolerance tighter.

A quick glance at the R CMD check failures resulting from the earlier
change shows that in most cases the calculation involved was something
like

    solve(t(X) %*% X)

This is not good numerical linear algebra.  Admittedly we all learn that,
for example, least squares estimates can be written as (X'X)^{-1} X'y,
and it seems reasonable to code this calculation as

    betahat <- solve(t(X) %*% X) %*% t(X) %*% y

but you shouldn't do that.  In terms of numerical stability, and also in
terms of efficiency, that is a remarkably bad way of determining least
squares estimates.

To begin with, you don't invert a matrix just to solve a system of
equations.  If A is an n by n matrix and y is an n vector, then

    solve(A)

is the equivalent of performing

    solve(A, y)

n times.  Thus solve(A) %*% y is more than n times as expensive as
solve(A, y).  If you want A^{-1}y then write it as solve(A, y).  (In a
perfect world you would be required to fill out a form, in triplicate,
explaining in detail exactly why there is no suitable alternative to
calculating the inverse before being allowed to use solve(A).)

So, how should you calculate least squares estimates?  We recommend

    betahat <- qr.coef(qr(X), y)

If you want other results from the least squares calculation, such as
fitted values or residuals, you may want to save qr(X) so you can reuse
it:

    qrX <- qr(X)
    betahat <- qr.coef(qrX, y)
    res <- qr.resid(qrX, y)
    ...

There are alternatives, but solve(t(X) %*% X) %*% t(X) %*% y is never a
good one.  Seber and Lee discuss such calculations at length in chapter
11 of their "Linear Regression Analysis (2nd ed)" (Wiley, 2003).

Some other comments:

 - the condition number of X'X is the square of the condition number of
   X, which is why it is a good idea to avoid working with X'X whenever
   possible

 - on those rare occasions when you do want (X'X)^{-1}, say to form a
   variance-covariance matrix, you should evaluate it as
   chol2inv(qr.R(qr(X)))

 - if, despite all of the above, you feel you must calculate X'X, it is
   best done as crossprod(X) and not as t(X) %*% X.  Similarly, X'y is
   best calculated as crossprod(X, y).  In general you should use
   crossprod in place of "t(anything) %*% anythingelse"
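[A minimal sketch, with made-up data and not from the original post,
illustrating two of the points above: kappa(crossprod(X)) is roughly the
square of kappa(X), and qr.coef(qr(X), y) reproduces the textbook formula
without ever forming X'X.]

    set.seed(1)
    X <- cbind(1, matrix(rnorm(100 * 3), ncol = 3))   # arbitrary 100 x 4 design
    y <- X %*% c(2, -1, 0.5, 3) + rnorm(100)

    kappa(X, exact = TRUE)              # condition number of X
    kappa(crossprod(X), exact = TRUE)   # roughly the square of the above

    b1 <- solve(t(X) %*% X) %*% t(X) %*% y   # the discouraged formula
    b2 <- qr.coef(qr(X), y)                  # the recommended approach
    all.equal(as.vector(b1), as.vector(b2))  # agree here; they can differ when X is ill-conditioned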
Douglas Bates <bates at cs.wisc.edu> writes:

...

> To begin with, you don't invert a matrix just to solve a system of
> equations.  If A is an n by n matrix and y is an n vector, then
>
>     solve(A)
>
> is the equivalent of performing
>
>     solve(A, y)
>
> n times.  Thus solve(A) %*% y is more than n times as expensive as
> solve(A, y).  If you want A^{-1}y then write it as solve(A, y).

I overstated the comparison between solve(A, y) and solve(A) %*% y.  For
both solve(A) and solve(A, y), the first step is to calculate an LU
decomposition of A, and the second step is to solve the linear system(s)
of equations using this decomposition.  It is only the second step that
takes n times as long for solve(A) as it does for solve(A, y) (when y is
a vector).  The first step, which will account for the majority of the
time spent executing solve(A, y), is the same as in solve(A).
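[A rough timing sketch, with made-up sizes and not from the original
message, of the correction above: the O(n^3) factorization dominates, so
the two calls are closer in cost than "n times" suggests, though
solve(A, y) remains the better choice.]

    set.seed(2)
    n <- 500
    A <- matrix(rnorm(n * n), n, n)
    y <- rnorm(n)

    system.time(for (i in 1:50) solve(A, y))     # factor A, solve for one right-hand side
    system.time(for (i in 1:50) solve(A) %*% y)  # factor A, form the full inverse, then multiply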
Dear R help,

Thanks to Professor Bates for his information about how to calculate
least squares estimates [not using solve] in ``the right way''.  This is
very useful indeed; I am clearly one of the package maintainers who is
not using solve in a proper way at the moment.

However, the calculations in my code look more like GLS than LS.

    ## GLS could in principle be implemented like this:
    betahat <- solve(t(X) %*% solve(Omega) %*% X) %*% t(X) %*% solve(Omega) %*% y
    ## where Omega is a strictly p.d. symmetric matrix

Does someone have a recommendation on how to do this in ``the right
way''?  My first attempt (trying to imitate the LS solution recommended
by Prof. Bates) is:

    temp <- backsolve(chol(Omega), cbind(X, y))
    betahat <- qr.coef(qr(temp[, 1:ncol(X)]), temp[, ncol(X) + 1])

Thank you in advance for any help.

Cheers

Ole

-- 
Ole F. Christensen
Center for Bioinformatik
Datalogisk Institut
Aarhus Universitet
Ny Munkegade, Bygning 540
8000 Aarhus C
Denmark
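[A minimal sketch, not from the thread, of the whitening idea in the
question above, assuming Omega = t(R) %*% R with R <- chol(Omega).  The
helper name gls_qr is made up for illustration.  One detail to watch is
that the whitening step should use the transpose of the Cholesky factor,
so that crossprod of the whitened design reproduces X' Omega^{-1} X.]

    gls_qr <- function(X, y, Omega) {
        R  <- chol(Omega)                          # Omega = t(R) %*% R, R upper triangular
        Xw <- backsolve(R, X, transpose = TRUE)    # R^{-T} X  (whitened design)
        yw <- backsolve(R, y, transpose = TRUE)    # R^{-T} y  (whitened response)
        qr.coef(qr(Xw), yw)                        # ordinary LS on the whitened data
    }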
Hi all,

Thanks Douglas for bringing up numerical stability and OLS regression
coefficients.  I'm often worried about the speed of regression runs, as I
sometimes run large simulations or resamplings.  Inspired, and prone to
run many regressions, I did a little simulation as follows.  XP results
are on a 2.4GHz Pentium 4 in Windows from the binaries (without BLAS?)
and Linux results are on a 1.5GHz Pentium M laptop with a Pentium Atlas
BLAS.

Setting up the data: x is rnormals with a vector of ones prepended...

> x <- cbind(rep(1,100), matrix(rnorm(1000), nc=10))
> beta0 <- runif(11)
> y <- x %*% beta0 + rnorm(100)

Estimating Slope Coefficients:

> system.time(for (i in 1:1000) lm(y ~ x - 1))
XP:    [1] 5.91 0.00 5.90   NA   NA
Linux: [1] 5.27 0.01 5.28 0.00 0.00

> system.time(for (i in 1:1000) solve(t(x) %*% x) %*% t(x) %*% y)
XP:    [1] 0.64 0.01 0.65   NA   NA
Linux: [1] 0.51 0.01 0.57 0.00 0.00

> system.time(for (i in 1:1000) qr.coef(qr(x), y))
XP:    [1] 0.75 0.00 0.75   NA   NA
Linux: [1] 0.76 0.00 0.77 0.00 0.00

> system.time(for (i in 1:1000) solve(crossprod(x), crossprod(x, y)))
XP:    [1] 0.45 0.00 0.53   NA   NA
Linux: [1] 0.35 0.00 0.36 0.00 0.00

On both platforms, BLAS or not, the solve(crossprod()) method works the
fastest, with the naive solve(t(x) %*% x) second-fastest.

Calculating (t(x) %*% x)^-1:

> system.time(for (i in 1:1000) chol2inv(qr.R(qr(x))))
XP:    [1] 0.44 0.00 0.44   NA   NA
Linux: [1] 0.40 0.00 0.40 0.00 0.00

> system.time(for (i in 1:1000) solve(crossprod(x)))
XP:    [1] 0.58 0.01 0.59   NA   NA
Linux: [1] 0.34 0.00 0.34 0.00 0.00

Here the chol2inv() method was faster in Windows but slower in Linux,
which I'm guessing is due to differential uses of BLAS functions.

None of these address the problem of numerical accuracy, which was the
thrust of Doug's comments.  Has anyone done a quick simulation to
investigate the stability of the solutions?  Is it small coefficients or
near-collinearity (or both) that one has to worry about?  Are the
solve(crossprod()) methods obviously unstable?  They surely do work
quickly!

Thanks again, R-universe.  I've needed no other statistical software for
the last 2 years.  I hope to get around to contributing a package or two
soon.

Elliot.
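[A rough follow-up sketch, my own rather than from the post, probing the
stability question with a deliberately near-collinear design: the third
column is the second plus tiny noise, so kappa(x) is large and
kappa(crossprod(x)) is roughly its square.  This is a quick probe, not a
careful study.]

    set.seed(3)
    x1 <- rnorm(100)
    x  <- cbind(1, x1, x1 + 1e-7 * rnorm(100))   # third column nearly equal to the second
    beta0 <- c(1, 2, 3)
    y  <- drop(x %*% beta0 + rnorm(100))

    kappa(x, exact = TRUE)             # large: x is nearly rank-deficient
    kappa(crossprod(x), exact = TRUE)  # roughly the square of the above

    b_qr     <- qr.coef(qr(x), y)                           # QR on x itself
    b_normal <- drop(solve(crossprod(x), crossprod(x, y)))  # normal equations
    cbind(b_qr, b_normal)   # the two routes can disagree noticeably on this ill-conditioned problem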