thr3ads.net - R help - [R] lm handling of ill-conditioned systems [Mar 2000]

If this information is useful, please help other people find it:
Share via:

Tomislav Goles

2000-Mar-20 14:45 UTC

[R] lm handling of ill-conditioned systems

The lm() function in R seems to handle the inversion of singular X'X
matrices
(where there is collinearity between regression inputs) in a way
where one of the inputs is dropped and this also seems to be the
default behavior in SAS (please let me know if i'm wrong about this).
In some other packages (i.e. octave ols() function) the pseudo 
inverse is computed where singular values less then some small threshold are 
not included in computation of inverse but the data columns themselves are not 
dropped).
I am wandering if few experts could comment on what is a better (best?) way to 
handle singular matrices in this context or to perhaps point me to some 
literature on this.
Thanks,
Tomislav Goles
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Thomas Lumley

2000-Mar-20 17:15 UTC

head link

[R] lm handling of ill-conditioned systems

On 20 Mar 2000, Tomislav Goles wrote:
> 
> The lm() function in R seems to handle the inversion of singular X'X
matrices
> (where there is collinearity between regression inputs) in a way
> where one of the inputs is dropped and this also seems to be the
> default behavior in SAS (please let me know if i'm wrong about this).
> In some other packages (i.e. octave ols() function) the pseudo 
> inverse is computed where singular values less then some small threshold
are
> not included in computation of inverse but the data columns themselves are
not
> dropped).
> I am wandering if few experts could comment on what is a better (best?) way
to
> handle singular matrices in this context or to perhaps point me to some 
> literature on this.

When the design matrix is not of full rank there is no information in the
data about certain linear combinations of the variables.   This means that
there isn't unique least squares or maximum likelihood estimate for the
parameters. This isn't a numerical analysis issue, it's a statistical
one.

While it is possible to pick out a single estimate using some arbitrary
criterion, it is not possible to construct a useful covariance matrix for
these parameters. For example, if a regression has two identical
predictors the variance of each parameter estimators and their covariance
will be (+/-)infinite, even though the sum of the estimators really has a
sensible finite variance.

For this reason you don't gain much, if anything, by allowing least
squares estimation of all the parameters.  It's also more work -- I
believe the algorithm involves the singular value decomposition. People
who want to estimate non-identifiable parameters can work it out
themselves.

In S/R and in Stata (and possibly other statistics packages) variables are
dropped to create a submodel that can be estimated properly.  In S/R the
remaining coefficients are set to NA to emphasize that they are completely
unknown.  I don't know if this was orginally a design intention or
whether it was the simplest way to implement the computations, but it
makes as much sense as anything else and forces the user to notice that
the model was unidentifiable.

As a technical note, R does not in fact invert X'X to fit a linear model.
We find the QR decomposition of X and solve Rbeta=Q'Y.  This results in
roughly half as many digits of rounding error as if we had inverted X'X at
a relatively modest additional computing cost.  The tolerance we currently
use in lm means that a column is discarded if it's diagonal element of R
would be 10^-7 times smaller than the largest one.  I think this
translates into a variance inflation factor of about 10^14, so when we
say 'singular' we mean it.

	-thomas

Thomas Lumley
Assistant Professor, Biostatistics
University of Washington, Seattle

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

john.maindonald@anu.edu.au

2000-Mar-20 23:08 UTC

head link

[R] lm handling of ill-conditioned systems

At 09:15 AM 20/03/00 -0800, Thomas Lumley wrote:>On 20 Mar 2000, Tomislav Goles wrote:
>> The lm() function in R seems to handle the inversion of singular
X'X
>> matrices (where there is collinearity between regression inputs) in a
>> way where one of the inputs is dropped and this also seems to be the
>> default behavior in SAS (please let me know if i'm wrong about
this).
>> In some other packages (i.e. octave ols() function) the pseudo 
>> inverse is computed where singular values less then some small 
>> threshold are not included in computation of inverse but the data 
>> columns themselves are not dropped).
>> I am wandering if few experts could comment on what is a better 
>> (best?) way to handle singular matrices in this context or to perhaps 
>> point me to some literature on this.
>
>
>When the design matrix is not of full rank there is no information in the
>data about certain linear combinations of the variables.   This means that
>there isn't unique least squares or maximum likelihood estimate for the
>parameters. This isn't a numerical analysis issue, it's a
statistical one.
>
>While it is possible to pick out a single estimate using some arbitrary
>criterion, it is not possible to construct a useful covariance matrix for
>these parameters. 
It is worth re-iterating that the generalised inverse form of solution
is an arbitrary resolution of the lack of uniqueness.
If a Moore-Penrose generalised inverse is used.  then the constraint
that is applied is that the sum of squares of the coefficients should
be as small as possible.  It would be much preferable to call this the
minimum length solution.  There are other ways to get this solution
than by forming a Moore-Penrose inverse.

Note also that there are important components of the calculation that
are uniquely defined.  There is a unique set of predicted values,
residuals, and residual sum of squares.


and residuals are uniquely 
John Maindonald               email : john.maindonald at anu.edu.au        
Statistical Consulting Unit,  phone : (6249)3998        
c/o CMA, SMS,                 fax   : (6249)5549  
John Dedman Mathematical Sciences Building
Australian National University
Canberra ACT 0200
Australia

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

gb@stat.umu.se

2000-Mar-21 09:34 UTC

head link

[R] dyn.load and rw1000

How do I compile a Fortran subroutine or a C function so that it can be
dyn.loaded in the windows version of R-1.0.0? I use Cygnus C and Fortran 
compilers (and mingw-something). 

G?ran
----------------------------------------------------------
 G?ran Brostr?m                      tel: +46 90 786-5223
 Department of Statistics            fax: +46 90 786-6614
 Ume? University                                         
 SE-90187 Ume?, Sweden              email: gb at stat.umu.se
                                                        
 http://www.stat.umu.se/egna/gb    ftp://capa.stat.umu.se
----------------------------------------------------------

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Apparently Analagous Threads

Search for more maybe matching threads

R help - Mar 2000 - lm handling of ill-conditioned systems

[R] lm handling of ill-conditioned systems

[R] lm handling of ill-conditioned systems

[R] lm handling of ill-conditioned systems

[R] dyn.load and rw1000

Apparently Analagous Threads