tom soyer

2007-Nov-30 05:54 UTC

### [R] what to do if residuals produced by lm() have long tails?

Hi,

I am using lm() for regression analysis of my data set. My regression results look pretty good, i.e., the coefficient is significant and the p value is much less than 0.05. But when I checked the residuals, using both qqnorm() and hist(), the distribution does not look normal. It looks like the residuals have long tails.

I assume that lm() uses OLS, and since one of the assumptions of OLS is that the residuals have to be normally distributed, I am wondering if this means I should reject my regression results altogether. If so, what should I use instead? Are there ways to deal with long-tailed distributions using lm() or OLS, or is an entirely different model needed?

Thanks,

Tom
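The residual check described above can be sketched roughly as follows. The poster's data are not shown, so the data here are simulated with heavy-tailed (t-distributed) errors purely as an assumption, and the names `x`, `y`, `fit` and `res` are hypothetical.

```r
# Simulated stand-in for the poster's data (assumption: t errors with 2 df,
# which gives residuals with long tails).
set.seed(1)
x <- rnorm(200)
y <- 1 + 2 * x + rt(200, df = 2)

fit <- lm(y ~ x)        # ordinary least-squares fit
res <- residuals(fit)

# Normal Q-Q plot: long tails show up as points bending away from the
# reference line at both ends.
qqnorm(res)
qqline(res)

# Histogram of the residuals for a second look at the tails.
hist(res, breaks = 30, main = "Residuals from lm()")
```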

Prof Brian Ripley

2007-Nov-30 06:54 UTC

### [R] what to do if residuals produced by lm() have long tails?

On Thu, 29 Nov 2007, tom soyer wrote:

> Hi,
>
> I am using lm() for regression analysis of my data set. My regression results look pretty good, i.e., the coefficient is significant and the p value is much less than 0.05. But when I checked the residuals, using both qqnorm() and hist(), the distribution does not look normal. It looks like the residuals have long tails. I assume that lm() uses OLS, and since one of the assumptions of OLS is that the residuals have to be normally distributed, I am wondering if this means I should reject my regression results altogether. If so, what should I use instead? Are there ways to deal with long-tailed distributions using lm() or OLS, or is an entirely different model needed?

The main point is that least squares is rather inefficient with long-tailed error distributions. Robust methods are designed to be efficient for a wide class of long-tailed distributions, and so are preferable.

Use e.g. rlm (package MASS) or lmRob (package robust) in place of lm. If this makes a difference to your 'regression results', then yes, you need to reject the least-squares results.

This is discussed in good texts on doing statistics with R, e.g. MASS.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
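A minimal sketch of the comparison suggested in the reply, fitting the same model with lm() and with rlm() from package MASS and putting the coefficients side by side. The data are simulated with heavy-tailed errors as an assumption (the original data set is not shown), the variable names are hypothetical, and rlm() is used with its default settings.

```r
library(MASS)  # provides rlm()

# Simulated data (assumption): linear trend plus t-distributed errors, 2 df.
set.seed(42)
x <- rnorm(200)
y <- 1 + 2 * x + rt(200, df = 2)

fit_ols <- lm(y ~ x)    # least squares
fit_rob <- rlm(y ~ x)   # robust M-estimation with the default (Huber) psi

# Side-by-side coefficients; a substantial difference suggests the
# least-squares results should not be trusted.
print(cbind(OLS = coef(fit_ols), robust = coef(fit_rob)))

# Standard errors and t values for the robust fit.
print(summary(fit_rob))
```

If the two sets of coefficients agree closely, the long tails have had little effect on the least-squares fit. How large a difference counts as material is a judgment call that depends on the application.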