The lm() function in R seems to handle the inversion of singular X'X matrices (where there is collinearity between regression inputs) in a way where one of the inputs is dropped and this also seems to be the default behavior in SAS (please let me know if i'm wrong about this). In some other packages (i.e. octave ols() function) the pseudo inverse is computed where singular values less then some small threshold are not included in computation of inverse but the data columns themselves are not dropped). I am wandering if few experts could comment on what is a better (best?) way to handle singular matrices in this context or to perhaps point me to some literature on this. Thanks, Tomislav Goles -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On 20 Mar 2000, Tomislav Goles wrote:> > The lm() function in R seems to handle the inversion of singular X'X matrices > (where there is collinearity between regression inputs) in a way > where one of the inputs is dropped and this also seems to be the > default behavior in SAS (please let me know if i'm wrong about this). > In some other packages (i.e. octave ols() function) the pseudo > inverse is computed where singular values less then some small threshold are > not included in computation of inverse but the data columns themselves are not > dropped). > I am wandering if few experts could comment on what is a better (best?) way to > handle singular matrices in this context or to perhaps point me to some > literature on this.When the design matrix is not of full rank there is no information in the data about certain linear combinations of the variables. This means that there isn't unique least squares or maximum likelihood estimate for the parameters. This isn't a numerical analysis issue, it's a statistical one. While it is possible to pick out a single estimate using some arbitrary criterion, it is not possible to construct a useful covariance matrix for these parameters. For example, if a regression has two identical predictors the variance of each parameter estimators and their covariance will be (+/-)infinite, even though the sum of the estimators really has a sensible finite variance. For this reason you don't gain much, if anything, by allowing least squares estimation of all the parameters. It's also more work -- I believe the algorithm involves the singular value decomposition. People who want to estimate non-identifiable parameters can work it out themselves. In S/R and in Stata (and possibly other statistics packages) variables are dropped to create a submodel that can be estimated properly. In S/R the remaining coefficients are set to NA to emphasize that they are completely unknown. I don't know if this was orginally a design intention or whether it was the simplest way to implement the computations, but it makes as much sense as anything else and forces the user to notice that the model was unidentifiable. As a technical note, R does not in fact invert X'X to fit a linear model. We find the QR decomposition of X and solve Rbeta=Q'Y. This results in roughly half as many digits of rounding error as if we had inverted X'X at a relatively modest additional computing cost. The tolerance we currently use in lm means that a column is discarded if it's diagonal element of R would be 10^-7 times smaller than the largest one. I think this translates into a variance inflation factor of about 10^14, so when we say 'singular' we mean it. -thomas Thomas Lumley Assistant Professor, Biostatistics University of Washington, Seattle -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
john.maindonald@anu.edu.au
2000-Mar-20 23:08 UTC
[R] lm handling of ill-conditioned systems
At 09:15 AM 20/03/00 -0800, Thomas Lumley wrote:>On 20 Mar 2000, Tomislav Goles wrote:>> The lm() function in R seems to handle the inversion of singular X'X >> matrices (where there is collinearity between regression inputs) in a >> way where one of the inputs is dropped and this also seems to be the >> default behavior in SAS (please let me know if i'm wrong about this). >> In some other packages (i.e. octave ols() function) the pseudo >> inverse is computed where singular values less then some small >> threshold are not included in computation of inverse but the data >> columns themselves are not dropped). >> I am wandering if few experts could comment on what is a better >> (best?) way to handle singular matrices in this context or to perhaps >> point me to some literature on this. > > >When the design matrix is not of full rank there is no information in the >data about certain linear combinations of the variables. This means that >there isn't unique least squares or maximum likelihood estimate for the >parameters. This isn't a numerical analysis issue, it's a statistical one. > >While it is possible to pick out a single estimate using some arbitrary >criterion, it is not possible to construct a useful covariance matrix for >these parameters.It is worth re-iterating that the generalised inverse form of solution is an arbitrary resolution of the lack of uniqueness. If a Moore-Penrose generalised inverse is used. then the constraint that is applied is that the sum of squares of the coefficients should be as small as possible. It would be much preferable to call this the minimum length solution. There are other ways to get this solution than by forming a Moore-Penrose inverse. Note also that there are important components of the calculation that are uniquely defined. There is a unique set of predicted values, residuals, and residual sum of squares. and residuals are uniquely John Maindonald email : john.maindonald at anu.edu.au Statistical Consulting Unit, phone : (6249)3998 c/o CMA, SMS, fax : (6249)5549 John Dedman Mathematical Sciences Building Australian National University Canberra ACT 0200 Australia -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
How do I compile a Fortran subroutine or a C function so that it can be dyn.loaded in the windows version of R-1.0.0? I use Cygnus C and Fortran compilers (and mingw-something). G?ran ---------------------------------------------------------- G?ran Brostr?m tel: +46 90 786-5223 Department of Statistics fax: +46 90 786-6614 Ume? University SE-90187 Ume?, Sweden email: gb at stat.umu.se http://www.stat.umu.se/egna/gb ftp://capa.stat.umu.se ---------------------------------------------------------- -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._