I have been struggling to find some information on what lm exactly does. I know it uses the QR decomposition. However, I was recently faced with a somewhat badly scaled matrix and summary(lm) said

    Coefficients: (4 not defined because of singularities)

Does anyone know how lm chooses these 4 coefficients? Is it forward building of the model, dropping the last column when qr reports a design matrix that is not of full rank?

My other question is on the regression diagnostics, particularly plotting Cook's distance. What is the rule to decide on outliers? If I read the plot correctly, the labeled distances (vertical lines) are outliers. But I have computed the Cook's distances and compared them to qf(0, p, n-p) (the median of the F distribution with parameters p = number of variables in the design and n-p, where n is the number of observations), and this does not give the same answer.

Lastly, the qr function is supposed to use the LAPACK package by default, but it seems to default to LINPACK. The following appears only when LAPACK=T is given:

    qr(x, LAPACK=T)
    attr(,"useLAPACK")
    [1] TRUE

Thank you for all your help,
Jean
Dear Jean,

On Thu, 26 Jun 2003, Jean Eid wrote:

. . .

> My other question is on the regression diagnostics particularly plotting
> Cook's distance. what is the rule to decide on outliers. If I read the
> plot correctly, the labeled distances (vertical lines) are outliers. But I
> have gotten cook's distance and compared them to qf(0, p, n-p) ( the
> median of the F distribution with parameters p=# of variables in design,
> number of obs.-p) but does not give same answer.

I presume you mean qf(0.5, p, n-p)?

> . . .

Except for some sense of scale, it's not sensible to treat Cook's distances as F-values. The use of an F statistic in this context is really just a kind of trick to obtain a scale-invariant measure of distance between the coefficient vector for all of the data and the coefficient vector deleting an observation.

There is a rule-of-thumb cutoff for noteworthy Cook's distances -- 4/(n - p) -- but I wouldn't place too much stock in it. It's better simply to look for values of Cook's D that stand out from the others.

Finally, Cook's D isn't really an outlier diagnostic, but an influence diagnostic. A low-leverage regression outlier, for example, can have a small Cook's D.

I hope that this helps,
John
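To make the distinction between outliers and influence concrete, here is a minimal sketch in R. The data are made up for illustration (they are not Jean's data), and the observation numbers 11 and 22 are simply where the two unusual points were planted. It computes Cook's D with cooks.distance(), rebuilds it from the standardized residuals and hat values, and compares both points with the 4/(n - p) rule of thumb.

```r
# Made-up data for illustration only (not the data from the original question).
x <- c(-10:10, 30)                        # observation 22 lies far from the other x values
noise <- rep(c(0.5, -0.5), length.out = length(x))
y <- 1 + 2 * x + noise
y[11] <- y[11] + 4                        # low-leverage regression outlier (at x = 0)
y[22] <- y[22] - 10                       # high-leverage point pulled off the line

fit <- lm(y ~ x)
n <- length(y)
p <- fit$rank                             # number of estimated coefficients (intercept included)

d <- cooks.distance(fit)                  # Cook's distances
h <- hatvalues(fit)                       # leverages
r <- rstandard(fit)                       # standardized residuals

# Cook's D combines residual size and leverage:
#   D_i = (r_i^2 / p) * h_i / (1 - h_i)
d_by_hand <- (r^2 / p) * h / (1 - h)

cutoff <- 4 / (n - p)                     # the rule of thumb, to be taken loosely

# Compare the two planted points side by side.
print(round(cbind(leverage = h, std.resid = r, cooks.d = d)[c(11, 22), ], 3))
print(which(d > cutoff))                  # observations above the rule-of-thumb cutoff
```

In this sketch observation 11 has a large standardized residual but little leverage, so its Cook's D stays small, while observation 22 dominates. Note also that plot(fit, which = 4) labels the id.n largest distances (three by default), so a label on that plot does not by itself mark an observation as an outlier.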
On Thu, 26 Jun 2003, Jean Eid wrote:

> I have been struggling to find some information on what lm exactly does.
> I know it uses the QR decomp. However, I was recently faced with a
> somewhat badly scaled matrix and summary(lm) said
> Coefficients: ( 4 not defined because of singularities)
> does anyone know how lm chooses these 4 coef. is it forward building of
> the model --> drop last when qr sends a non full rank design matrix?

It is forward building of the QR matrix (not the same thing), and it pivots to the end any columns that it does not add. It's in the source code, file src/appl/dqrls.f.

[...]

> Lastly, the qr function is supposed to take the LAPACK package in its

Supposed by whom? That's not what the help page says.

> default but it seems to default LINPACK. The following appears only when
> qr(x, LAPACK=T)
> attr(,"useLAPACK")
> [1] TRUE

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
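The pivoting described above can be seen on a small example. This is only a sketch: the variables x1 to x4 and y are invented, with x3 built as an exact linear combination of x1 and x2. It shows the coefficient that lm reports as NA, the pivot vector stored in the fit, how the dropped term changes with the order of the formula, and that the "useLAPACK" attribute is present only when LAPACK = TRUE is requested.

```r
# Invented design with one exact linear dependency (illustration only).
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8)
x2 <- c(2, 1, 4, 3, 6, 5, 8, 7)
x3 <- x1 + x2                             # redundant once x1 and x2 are in the model
x4 <- c(1, 0, 0, 1, 1, 0, 1, 0)
y  <- c(3.1, 2.9, 7.2, 6.8, 11.1, 10.9, 15.2, 14.8)

fit <- lm(y ~ x1 + x2 + x3 + x4)
print(coef(fit))                          # x3 is NA: its column was not added
print(fit$rank)                           # numerical rank of the design matrix
print(fit$qr$pivot)                       # the skipped column is moved to the end

# The columns are considered in formula order, so reordering the terms
# changes which coefficient is left undefined.
fit2 <- lm(y ~ x3 + x1 + x2 + x4)
print(coef(fit2))                         # now x2 is NA

# qr() uses LINPACK unless LAPACK = TRUE is given; only then is the
# "useLAPACK" attribute set on the result.
X <- model.matrix(fit)
default_attr <- attr(qr(X), "useLAPACK")
lapack_attr  <- attr(qr(X, LAPACK = TRUE), "useLAPACK")
print(default_attr)
print(lapack_attr)
```

For a fit like this, summary(fit) reports the undefined coefficient in its "not defined because of singularities" line, and alias(fit) can be used to display the dependency itself.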
> I have been struggling to find some information on what lm exactly does.
> I know it uses the QR decomp. However, I was recently faced with a
> somewhat badly scaled matrix and summary(lm) said
> Coefficients: ( 4 not defined because of singularities)
> does anyone know how lm chooses these 4 coef. is it forward building of
> the model --> drop last when qr sends a non full rank design matrix?

Probably you've seen this, but just in case: there's a quite good explanation of QR with column pivoting and the subsequent detection of rank deficiency in the least squares context in Golub and Van Loan, Matrix Computations (1983, section 6.4, p. 162; I don't have newer editions to hand).

Simon
_____________________________________________________________________
> Simon Wood simon at stats.gla.ac.uk  www.stats.gla.ac.uk/~simon/
>> Department of Statistics, University of Glasgow, Glasgow, G12 8QQ
>>> Direct telephone: (0)141 330 4530  Fax: (0)141 330 4814
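As a rough illustration of rank detection after a column-pivoted QR, the sketch below uses qr(X, LAPACK = TRUE), which pivots columns, and counts the diagonal entries of R that are not negligible relative to the largest one. The matrix X and the tolerance tol are assumptions chosen for the example, and this is not a description of what lm does internally (lm uses the LINPACK-based routine, which only moves near-collinear columns to the end).

```r
# Invented design matrix with one exact dependency (illustration only).
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8)
x2 <- c(2, 1, 4, 3, 6, 5, 8, 7)
x4 <- c(1, 0, 0, 1, 1, 0, 1, 0)
X  <- cbind(1, x1, x2, x1 + x2, x4)       # 4th column = 2nd + 3rd

# Column-pivoted QR via LAPACK. In this mode qr() does not estimate the
# rank itself, so it is read off the diagonal of R here.
qrX   <- qr(X, LAPACK = TRUE)
Rdiag <- abs(diag(qr.R(qrX)))             # non-increasing under column pivoting

tol <- 1e-7                               # assumed tolerance for "negligible"
numrank <- sum(Rdiag > tol * Rdiag[1])

print(qrX$pivot)                          # column order chosen by the pivoting
print(signif(Rdiag, 3))                   # the trailing entry is essentially zero
print(numrank)

# The default (LINPACK) decomposition reports a rank directly.
linpack_rank <- qr(X)$rank
print(linpack_rank)
```

With an exact dependency the last diagonal entry is at rounding-error level, so any reasonable tolerance gives the same answer; with a merely badly scaled matrix the choice of tolerance matters much more.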