thr3ads.net - R help - [R] Testing for normality of residuals in a regression model [Oct 2004]

Stefano Calza

2004-Oct-15 10:50 UTC

[R] Testing for normality of residuals in a regression model

What about shapiro.test(resid(fit.object))?
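
A minimal sketch of this suggestion. The original data are not shown in the thread, so the data below are simulated and `fit.object` is a hypothetical `lm` fit standing in for the poster's model.

```r
# Simulated data stand in for the poster's data set (an assumption).
set.seed(1)
x <- runif(50)
y <- 1 + 2 * x + rnorm(50, sd = 0.5)

fit.object <- lm(y ~ x)

# Shapiro-Wilk test applied to the least-squares residuals.
sw <- shapiro.test(resid(fit.object))
print(sw)      # W statistic and p-value
sw$p.value     # the p-value on its own
```

A small p-value counts as evidence against normality; a large one does not establish that the errors are normal.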

Stefano

On Fri, Oct 15, 2004 at 02:44:18PM +0200, Federico Gherardini wrote:
> Hi all,
> 
> Is it possible to have a test value for assessing the normality of 
> residuals from a linear regression model, instead of simply relying on 
> qqplots?
> I've tried to use fitdistr to try and fit the residuals with a normal 
> distribution, but fitdistr only returns the parameters of the 
> distribution and the standard errors, not the p-value. Am I missing 
> something?
> 
> Cheers,
> 
> Federico
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html

Dimitris Rizopoulos

2004-Oct-15 11:10 UTC

[R] Testing for normality of residuals in a regression model

Hi Federico,

Also take a look at the package "nortest":

help(package="nortest")
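
As a rough illustration of what the package offers, the sketch below runs three of its tests on the residuals of a simulated fit. The data are made up, and the code only calls nortest if it happens to be installed.

```r
# Simulated regression data (an assumption; not the poster's data).
set.seed(1)
x <- runif(50)
y <- 1 + 2 * x + rnorm(50, sd = 0.5)
res <- resid(lm(y ~ x))

# nortest is a CRAN package; it is only used here if it is installed.
if (requireNamespace("nortest", quietly = TRUE)) {
  print(nortest::ad.test(res))      # Anderson-Darling
  print(nortest::lillie.test(res))  # Lilliefors (Kolmogorov-Smirnov)
  print(nortest::cvm.test(res))     # Cramer-von Mises
} else {
  message("Package 'nortest' is not installed; try install.packages(\"nortest\").")
}
```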

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/396887
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm.


John Fox

2004-Oct-15 12:43 UTC

[R] Testing for normality of residuals in a regression model

Dear Federico,

A problem with applying a standard test of normality to LS residuals is that
the residuals are correlated and heteroskedastic even if the standard
assumptions of the model hold. In a large sample, this is unlikely to be
problematic (unless there's an unusual data configuration), but in a small
sample the effect could be nontrivial.
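
To make the point concrete: under the standard model the LS residuals have covariance matrix sigma^2 (I - H), where H is the hat matrix. The sketch below computes I - H for a small, made-up design with one high-leverage point (the design is an assumption chosen for illustration).

```r
n <- 10
x <- c(1:9, 30)                  # one high-leverage point, for illustration
X <- cbind(1, x)                 # design matrix with intercept

H <- X %*% solve(crossprod(X)) %*% t(X)   # hat matrix
V <- diag(n) - H                          # Var(residuals) / sigma^2

round(diag(V), 3)                # residual variances differ across observations
round(cov2cor(V)[1:3, 1:3], 3)   # off-diagonal correlations are non-zero
```

With larger n and no unusual leverage, the diagonal of H shrinks toward zero and the residuals behave more like the errors themselves.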

One approach is to use BLUS residuals, which transform the LS residuals to a
smaller set of uncorrelated, homoskedastic residuals (assuming the
correctness of the model). A search of R resources didn't turn up anything
for BLUS, but they shouldn't be hard to compute. This is a standard topic
covered in many econometrics texts.
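
The following is a sketch of a closely related construction, not Theil's BLUS procedure itself: it projects y onto an orthonormal basis of the orthogonal complement of the column space of X, which gives n - p uncorrelated, homoskedastic ("LUS") residuals when the model is correct. BLUS additionally picks the transformation that best approximates a chosen subset of the disturbances; that step is omitted here. The data are simulated and X is assumed to have full column rank.

```r
set.seed(1)
n <- 30
x <- runif(n)
y <- 1 + 2 * x + rnorm(n)

X <- cbind(1, x)
p <- ncol(X)

# Q'y for the full n x n orthogonal Q of the QR decomposition of X.
qty <- qr.qty(qr(X), y)

# The last n - p elements are u = Q2'y, where Q2 spans the orthogonal
# complement of the column space of X. Under the model, Var(u) = sigma^2 * I.
u <- qty[(p + 1):n]

length(u)                                      # n - p values
all.equal(sum(u^2), sum(resid(lm(y ~ x))^2))   # same residual sum of squares
shapiro.test(u)
```

The transformed residuals depend on the basis chosen, so a test applied to them is not unique to the fit.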

You might consider the alternative of generating a bootstrapped confidence
envelope for the QQ plot; the qq.plot() function in the car package will do
this for a linear model.
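
A base-R sketch of the idea behind such an envelope, assuming a parametric bootstrap from the fitted normal model; it is not the car implementation. (In later versions of car the function is called qqPlot().) The data are simulated.

```r
set.seed(1)
n <- 40
x <- runif(n)
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

obs <- sort(rstudent(fit))            # ordered studentized residuals

# Parametric bootstrap: simulate responses from the fitted normal model,
# refit, and record the ordered studentized residuals each time.
reps <- 200
sims <- replicate(reps, {
  y.star <- fitted(fit) + rnorm(n, sd = summary(fit)$sigma)
  sort(rstudent(lm(y.star ~ x)))
})

# Pointwise 95% envelope for each order statistic.
lower <- apply(sims, 1, quantile, probs = 0.025)
upper <- apply(sims, 1, quantile, probs = 0.975)

q <- qt(ppoints(n), df = n - length(coef(fit)) - 1)  # reference t quantiles
plot(q, obs, xlab = "t quantiles", ylab = "Studentized residuals")
lines(q, lower, lty = 2)
lines(q, upper, lty = 2)
```

Points falling outside the dashed band suggest a departure from normality larger than the fitted model would typically produce.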

I hope this helps,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
-------------------------------- 

Federico Gherardini

2004-Oct-15 12:44 UTC

[R] Testing for normality of residuals in a regression model

Hi all,

Is it possible to have a test value for assessing the normality of 
residuals from a linear regression model, instead of simply relying on 
qqplots?
I've tried to use fitdistr to try and fit the residuals with a normal 
distribution, but fitdistr only returns the parameters of the 
distribution and the standard errors, not the p-value. Am I missing 
something?

Cheers,

Federico

Federico Gherardini

2004-Oct-15 16:22 UTC

[R] Testing for normality of residuals in a regression model

Thank you very much for your suggestions! The residuals come from a gls 
model, because I had to correct for heteroscedasticity using a weighted 
regression... can I simply apply one of these tests (like shapiro.test) 
to the standardized residuals from my gls model?
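
A sketch of what this would look like mechanically, assuming the model was fitted with gls() from nlme. The data are simulated and the variance function (varPower) is an illustrative choice, not necessarily the weighting used in the original analysis. The caveats raised elsewhere in the thread about testing residuals still apply.

```r
library(nlme)   # recommended package shipped with R

set.seed(1)
n <- 100
x <- runif(n, 1, 10)
y <- 1 + 2 * x + rnorm(n, sd = 0.5 * x)   # error SD grows with x

# Variance modelled as a power of x (an assumption for illustration).
fit.gls <- gls(y ~ x, weights = varPower(form = ~ x))

# Pearson ("standardized") residuals: raw residuals divided by the
# estimated standard deviation of each observation.
res.std <- as.numeric(resid(fit.gls, type = "pearson"))
shapiro.test(res.std)
```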

Cheers,
Federico

Liaw, Andy

2004-Oct-15 16:55 UTC

[R] Testing for normality of residuals in a regression model

Let's see if I can get my stat 101 straight:

We learned that linear regression has a set of assumptions:

1. Linearity of the relationship between X and y.
2. Independence of errors.
3. Homoscedasticity (equal error variance).
4. Normality of errors.

Now, we should ask:  Why are they needed?  Can we get away with less?  What
if some of them are not met?

It should be clear why we need #1.

Without #2, I believe the least squares estimator is still unbiased, but the
usual estimates of the SEs for the coefficients are wrong, so the t-tests are
wrong.

Without #3, the coefficients are, again, still unbiased, but not as
efficient as can be.  Interval estimates for the prediction will surely be
wrong.

Without #4, well, it depends.  If the residual DF is sufficiently large, the
t-tests are still valid because of CLT.  You do need normality if you have
small residual DF.
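
A small simulation sketch of point #4, under assumed settings (centred exponential errors, a fixed equally spaced design, a true slope of zero). It estimates how often the slope t-test rejects at the 5% level for a small and a larger sample.

```r
set.seed(1)

# Share of simulations in which the t-test for a truly zero slope
# rejects at the 5% level, with centred exponential (skewed) errors.
reject_rate <- function(n, reps = 1000) {
  x <- seq_len(n) / n                      # fixed design
  mean(replicate(reps, {
    y <- 1 + (rexp(n) - 1)                 # true slope is 0
    summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"] < 0.05
  }))
}

rates <- c(n10 = reject_rate(10), n200 = reject_rate(200))
rates   # compare each with the nominal 0.05
```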

The problem with normality tests, I believe, is that they usually have
fairly low power at small sample sizes, so that doesn't quite help.  There's
no free lunch:  A normality test with good power will usually have good
power against a fairly narrow class of alternatives, and almost no power
against others (directional test).  How do you decide what to use?
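
The power issue can be seen in a quick simulation. The sketch below estimates the rejection rate of the Shapiro-Wilk test when the data come from a t distribution with 3 degrees of freedom; the alternative and the sample sizes are arbitrary choices for illustration.

```r
set.seed(1)

# Estimated rejection rate of the Shapiro-Wilk test at the 5% level
# when the data actually come from a t distribution with 3 df.
power_sw <- function(n, reps = 500) {
  mean(replicate(reps, shapiro.test(rt(n, df = 3))$p.value < 0.05))
}

sizes <- c(10, 20, 50, 200, 1000)
power <- sapply(sizes, power_sw)
names(power) <- sizes
round(power, 2)
```

The same pattern is behind the remark that tests "always reject" in large samples: as n grows, the test picks up departures that may not matter for the analysis.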

Has anyone seen a data set where the normality test on the residuals is
crucial in coming up with an appropriate analysis?

Cheers,
Andy
> From: Federico Gherardini
> 
> Berton Gunter wrote:
> 
> >>>Exactly! My point is that normality tests are useless for this purpose for
> >>>reasons that are beyond what I can take up here. 
> >>>
> Thanks for your suggestions, I understand that! Could you possibly give 
> me some (not too complicated!) links so that I can investigate this 
> matter further?
> 
> Cheers,
> 
> Federico
> 
> >>>Hints: Balanced designs are
> >>>robust to non-normality; independence (especially "clustering" of subjects
> >>>due to systematic effects), not normality, is usually the biggest real
> >>>statistical problem; hypothesis tests will always reject when samples are
> >>>large -- so what!; "trust" refers to prediction validity, which has to do
> >>>with study design and the validity/representativeness of the current data to
> >>>future. 
> >>>
> >>>I know that all the stats 101 texts say to test for normality, but they're
> >>>full of baloney!
> >>>
> >>>Of course, this is "free" advice -- so caveat emptor!
> >>>
> >>>Cheers,
> >>>Bert

Liaw, Andy

2004-Oct-15 18:14 UTC

[R] Testing for normality of residuals in a regression model

Hi John,

Your point is well taken.  I was only thinking about the shape of the
distribution, and neglected the cases of, say, symmetric long tailed
distributions.  However, I think I'd still argue that other tools are
probably more useful than normality tests (e.g., robust methods, as you
mentioned).

To take the point a bit further, let's say we test for normality and it's
rejected.  What do we do then?  Well, if the non-normality is caused by
outliers, we can try robust methods.  If not, what do we do?  We can try to
see if some sort of transformation would bring the residuals closer to
normally distributed, but if the interest is in inference on the
coefficients, those inferences on the `final' model are potentially invalid.
What's one to do then?

Also, I was told by someone very smart that fitting OLS to data with
heteroscedastic errors can make the residuals look `more normal' than they
really are...  Don't know how true that is, though.

Best,
Andy
> From: John Fox
> 
> Dear Andy,
> 
> At the risk of muddying the waters (and certainly without wanting to
> advocate the use of normality tests for residuals), I believe that your
> point #4 is subject to misinterpretation: That is, while it is true that t-
> and F-tests for regression coefficients in large samples retain their
> validity well when the errors are non-normal, the efficiency of the LS
> estimates can (depending upon the nature of the non-normality) be seriously
> compromised, not only absolutely but in relation to alternatives (e.g.,
> robust regression).
> 
> Regards,
>  John
> 

Federico Gherardini

2004-Oct-15 18:24 UTC

[R] Testing for normality of residuals in a regression model

Berton Gunter wrote:

>>>Exactly! My point is that normality tests are useless for this purpose for
>>>reasons that are beyond what I can take up here. 
>>>

Thanks for your suggestions, I understand that! Could you possibly give 
me some (not too complicated!) links so that I can investigate this 
matter further?

Cheers,

Federico

>>>Hints: Balanced designs are
>>>robust to non-normality; independence (especially "clustering" of subjects
>>>due to systematic effects), not normality, is usually the biggest real
>>>statistical problem; hypothesis tests will always reject when samples are
>>>large -- so what!; "trust" refers to prediction validity, which has to do
>>>with study design and the validity/representativeness of the current data to
>>>future. 
>>>
>>>I know that all the stats 101 texts say to test for normality, but they're
>>>full of baloney!
>>>
>>>Of course, this is "free" advice -- so caveat emptor!
>>>
>>>Cheers,
>>>Bert

cstrato

2004-Oct-16 16:21 UTC

[R] 20,000 * 6 data values

R and especially Bioconductor are the "Gold Standard" for
microarray analysis, see: http://www.bioconductor.org/

Regards
Christian

Sun wrote:
> Hello, Rusers:
> 
> What is the maximum number of data R can handle? Or I have to use SAS? I am
> trying to do some microarray data analysis. But I am totally new. Did anyone
> use R to do microarray analysis?
> 
> Many thanks,
> 
> Sun
> 