Let's assume that the columns of the model matrix, apart perhaps
from an initial column that corresponds to the overall mean, have
been centred. Then:
1) Normal equation methods give an accurate fit to the matrix
of centred sums of squares and products.
2) QR methods give an accurate fit to the predicted values.
QR will give better precision than normal equation methods
(e.g., Cholesky) if there are substantial correlations between
the columns of the model matrix. This is because sequential
normal equation methods successively modify the centred sums of
squares and products (CSSP) matrix so that it represents the sums
of squares and products of the partial residuals, as the columns
of the model matrix are partialled out in turn. QR, by contrast,
directly modifies a representation of the partial residuals
themselves.
If the columns of the model matrix are almost uncorrelated, however,
normal equation methods may give the better precision, essentially
because the CSSP matrix changes little and normal equation methods
require fewer arithmetic operations.
In the situations where QR gives substantially better precision,
the correlations between columns of the model matrix will mean
that some regression coefficients have large standard errors,
i.e. that the variance inflation factors for some coefficients
will be large. Will the additional precision be meaningful?
The question is especially pertinent now that double precision
is the standard.
I think it useful to expose students to both classes of method.
In contexts where QR gives results that are numerically
more precise, I'd encourage them to examine the variance
inflation factors (they should examine them anyway). It is
often a good idea, if the VIFs are large, to consider whether
there is a simple re-parameterization [perhaps as simple
as replacing x1 and x2 by (x1+x2) and (x1-x2)] where
correlations create less havoc. This may be an important
issue for interpretation, even if the increased numerical
accuracy serves no useful purpose.
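A small sketch of the sort of re-parameterization intended (simulated
data; the VIFs are computed directly as the diagonal of the inverse of
the predictors' correlation matrix, so no add-on package is needed):

## Sketch only: correlated predictors, before and after replacing
## (x1, x2) by (x1 + x2, x1 - x2).
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.95 * x1 + sqrt(1 - 0.95^2) * rnorm(n)  # correlation around 0.95
y  <- 2 + x1 + x2 + rnorm(n)

vif <- function(X) diag(solve(cor(X)))   # VIFs via inverse correlation matrix

vif(cbind(x1, x2))        # large, because x1 and x2 are highly correlated
s <- x1 + x2              # 'sum' contrast
d <- x1 - x2              # 'difference' contrast
vif(cbind(s, d))          # typically close to 1

## The fitted values are unchanged; only the parameterization differs.
all.equal(fitted(lm(y ~ x1 + x2)), fitted(lm(y ~ s + d)))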
------------------------------------------------------------------------
> Date: Mon, 24 Feb 2003 13:50:31 -0500
> From: Chong Gu <chong at stat.purdue.edu>
>
>
> Not only is it unfair criticism, it is probably also imprecise
> information.
>
> For a detailed discussion of the precision of regression estimates
> obtained via the QR decomposition and via the normal equations, one
> may consult Golub and Van Loan's book Matrix Computations (1989,
> Section 5.3.9, page 230). QR takes twice as much computation and
> requires more memory, but does NOT necessarily provide better
> precision.
>
> The above said, I am not questioning the adequacy of the QR approach
> to regression calculation as implemented in R.
>
>>
>> That's an unfair criticism. That discussion was never intended as a
>> recommendation for how to compute a regression. Of course, SVD or QR
>> decompositions are the preferred methods, but many newbies don't want
>> to digest all that right from the start. These are just obscure
>> details to the beginner.
>>
>> One of the strengths of R in teaching is that students can directly
>> implement the formulae from the theory. This reinforces the connection
>> between theory and practice. Implementing the normal equations directly
>> is a quick early illustration of this connection. Explaining the precise
>> details of how to fit a regression model is something that can be
>> deferred.
>>
>> Julian Faraway
>>
>>>> I am just working through Faraway's excellent tutorial
>>>> "Practical Regression and ANOVA using R"
>>>
>>> I assume this is a reference to the PDF version available via CRAN.
>>> I am afraid that is *not* a good discussion of how to do regression,
>>> especially not using R. That page is seriously misleading: good ways
>>> to compute regressions are QR decompositions with pivoting (which R
>>> uses) or an SVD. Solving the normal equations is well known to square
>>> the condition number, and is close to the worst possible way. (If you
>>> must use normal equations, do at least centre the columns, and
>>> preferably do some scaling.)
>>>
>>>> on page 24 he makes the x matrix:
>>>> x <- cbind(1,gala[,-c(1,2)])
>>>>
>>>> how can I understand this gala[,-c(1,2)]... I couldn't find an
>>>> explanation of such "c-like" abbreviations anywhere.
>>>
>>> Well, it is in all good books (as they say) including `An
>>> Introduction to R'. (It's even on page 210 of that book!)
>>>
>>> -c(1,2) is (try it)
>>>
>>> > -c(1,2)
>>> [1] -1 -2
>>>
>>> so this drops columns 1 and 2. It then adds in front a column made
>>> up of ones, which is usually a sign of someone not really
>>> understanding how R's linear models work.
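On the indexing point just above: negative indices drop the listed
columns, and lm() supplies the intercept column itself, so the explicit
column of ones is not needed. A minimal sketch, with an invented data
frame d standing in for gala (response in column 1, a column that is
not a predictor in column 2):

## Sketch only: d is invented; its layout mimics the quoted example.
d <- data.frame(y = rnorm(10), id = 1:10, a = rnorm(10), b = rnorm(10))

d[, -c(1, 2)]                       # drops columns 1 and 2 (y and id)

## Explicit model matrix with a column of ones:
X <- cbind(1, as.matrix(d[, -c(1, 2)]))

## lm() constructs the same matrix internally, intercept included:
fit <- lm(y ~ ., data = d[, -2])    # keep y, drop only column 2
all.equal(X, model.matrix(fit), check.attributes = FALSE)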
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> http://www.stat.math.ethz.ch/mailman/listinfo/r-help
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.