Stefano Calza
2004-Oct-15 10:50 UTC
[R] Testing for normality of residuals in a regression model
What about

shapiro.test(resid(fit.object))

Stefano

On Fri, Oct 15, 2004 at 02:44:18PM +0200, Federico Gherardini wrote:
> Hi all,
>
> Is it possible to have a test value for assessing the normality of
> residuals from a linear regression model, instead of simply relying on
> qqplots?
> I've tried to use fitdistr to try and fit the residuals with a normal
> distribution, but fitdistr only returns the parameters of the
> distribution and the standard errors, not the p-value. Am I missing
> something?
>
> Cheers,
>
> Federico
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
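Stefano's one-liner can be tried on simulated data. The sketch below is only an illustration: the data frame `d`, its variables, and the seed are assumptions made up for the example, and `fit.object` simply reuses the name from the message.

```r
# Simulated data; every name and value here is an illustrative assumption.
set.seed(1)
d <- data.frame(x = runif(50))
d$y <- 1 + 2 * d$x + rnorm(50, sd = 0.5)

fit.object <- lm(y ~ x, data = d)

# Shapiro-Wilk test applied to the raw least-squares residuals.
sw <- shapiro.test(resid(fit.object))
print(sw$statistic)  # W statistic
print(sw$p.value)    # p-value for H0: the residuals are normally distributed
```

The returned object is an ordinary `htest` list, so the p-value can be picked out with `sw$p.value` as shown.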
Dimitris Rizopoulos
2004-Oct-15 11:10 UTC
[R] Testing for normality of residuals in a regression model
Hi Federico,

also take a look at the package "nortest":

help(package="nortest")

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/396887
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm
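As a rough sketch of what the nortest package offers, the example below runs three of its tests on residuals from a simulated regression. It assumes nortest is installed (it is not part of base R) and falls back to `shapiro.test()` otherwise; the data and the seed are invented for illustration.

```r
# Simulated regression; data and seed are illustrative assumptions.
set.seed(2)
x <- runif(60)
y <- 1 + 2 * x + rnorm(60)
res <- resid(lm(y ~ x))

if (requireNamespace("nortest", quietly = TRUE)) {
  print(nortest::ad.test(res))      # Anderson-Darling
  print(nortest::lillie.test(res))  # Lilliefors (Kolmogorov-Smirnov)
  print(nortest::cvm.test(res))     # Cramer-von Mises
} else {
  message("nortest is not installed; falling back to shapiro.test()")
  print(shapiro.test(res))
}
```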
John Fox
2004-Oct-15 12:43 UTC
[R] Testing for normality of residuals in a regression model
Dear Federico,

A problem with applying a standard test of normality to LS residuals is that the residuals are correlated and heteroskedastic even if the standard assumptions of the model hold. In a large sample, this is unlikely to be problematic (unless there's an unusual data configuration), but in a small sample the effect could be nontrivial.

One approach is to use BLUS residuals, which transform the LS residuals to a smaller set of uncorrelated, homoskedastic residuals (assuming the correctness of the model). A search of R resources didn't turn up anything for BLUS, but they shouldn't be hard to compute. This is a standard topic covered in many econometrics texts.

You might consider the alternative of generating a bootstrapped confidence envelope for the QQ plot; the qq.plot() function in the car package will do this for a linear model.

I hope this helps,
John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
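Both points in John's message can be illustrated in base R. Under the usual model the residual covariance is sigma^2 (I - H), where H is the hat matrix, so the first part of the sketch prints its diagonal (unequal variances) and a few implied correlations. The second part builds a pointwise envelope for the QQ plot of studentized residuals by plain parametric simulation. This is a simplified stand-in, not the car implementation, and the design `x`, the sample size, and the number of simulations are assumptions chosen for the example. In current versions of car the function is, as far as I know, named `qqPlot()`.

```r
set.seed(3)
n <- 10
x <- c(1:9, 30)                  # last point has high leverage (illustrative)
X <- cbind(1, x)                 # design matrix with intercept
H <- X %*% solve(crossprod(X)) %*% t(X)   # hat matrix
M <- diag(n) - H                 # Var(residuals) = sigma^2 * M

print(round(diag(M), 3))               # unequal variances: heteroskedastic
print(round(cov2cor(M)[1:3, 1:3], 3))  # nonzero off-diagonals: correlated

# Simulated 95% pointwise envelope for sorted studentized residuals.
y <- 2 + 0.5 * x + rnorm(n)
fit <- lm(y ~ x)
obs <- unname(sort(rstudent(fit)))
sims <- replicate(500, sort(rstudent(lm(rnorm(n) ~ x))))
env <- apply(sims, 1, quantile, probs = c(0.025, 0.975))

qq <- data.frame(theoretical = qt(ppoints(n), df = n - 3),
                 observed = obs,
                 lower = env[1, ],
                 upper = env[2, ])
print(qq)
```

Plotting `observed` against `theoretical` and adding the `lower` and `upper` columns with `lines()` gives the envelope plot; points outside the band are the ones worth a second look.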
Federico Gherardini
2004-Oct-15 12:44 UTC
[R] Testing for normality of residuals in a regression model
Hi all,

Is it possible to have a test value for assessing the normality of residuals from a linear regression model, instead of simply relying on qqplots?
I've tried to use fitdistr to try and fit the residuals with a normal distribution, but fitdistr only returns the parameters of the distribution and the standard errors, not the p-value. Am I missing something?

Cheers,

Federico
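To make the distinction concrete, here is a small sketch of what `fitdistr()` from the MASS package returns next to what a test returns. The simulated data are an assumption for illustration, and the MASS call is guarded in case the package is not available.

```r
# Simulated regression residuals; data and seed are illustrative assumptions.
set.seed(7)
x <- runif(50)
y <- 1 + 2 * x + rnorm(50)
res <- resid(lm(y ~ x))

if (requireNamespace("MASS", quietly = TRUE)) {
  fd <- MASS::fitdistr(res, "normal")
  print(fd$estimate)  # fitted mean and sd
  print(fd$sd)        # their standard errors; no p-value is part of the fit
}

# A goodness-of-fit test, by contrast, reports a p-value.
sw <- shapiro.test(res)
print(sw$p.value)
```

`fitdistr()` estimates parameters by maximum likelihood and does not assess fit, which is why no p-value appears in its output.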
Federico Gherardini
2004-Oct-15 16:22 UTC
[R] Testing for normality of residuals in a regression model
Thank you very much for your suggestions! The residuals come from a gls model, because I had to correct for heteroscedasticity using a weighted regression... can I simply apply one of these tests (like shapiro.test) to the standardized residuals from my gls model? Cheers, Federico
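The thread does not answer this question directly. As a sketch of what the attempt might look like, the example below fits a weighted model with `gls()` from the nlme package and tests the Pearson (standardized) residuals. The variance function `varPower(form = ~ x)`, the simulated data, and the seed are assumptions; the real model in question is not shown in the thread.

```r
# Simulated data whose error SD grows with x; all values are illustrative.
set.seed(4)
n <- 80
x <- runif(n, 1, 10)
y <- 1 + 2 * x + rnorm(n, sd = 0.3 * x)
d <- data.frame(x = x, y = y)

if (requireNamespace("nlme", quietly = TRUE)) {
  # Assumed variance model: SD proportional to a power of x.
  fit <- nlme::gls(y ~ x, data = d, weights = nlme::varPower(form = ~ x))

  r_raw  <- resid(fit, type = "response")  # raw residuals, still heteroscedastic
  r_pear <- resid(fit, type = "pearson")   # divided by the fitted SDs

  print(shapiro.test(r_raw))
  print(shapiro.test(r_pear))
}
```

Testing the raw residuals would mix together observations with different variances, so the standardized ones are the more sensible input. The caveats raised elsewhere in the thread about normality tests still apply.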
Liaw, Andy
2004-Oct-15 16:55 UTC
[R] Testing for normality of residuals in a regression model
Let's see if I can get my stat 101 straight:

We learned that linear regression has a set of assumptions:

1. Linearity of the relationship between X and y.
2. Independence of errors.
3. Homoscedasticity (equal error variance).
4. Normality of errors.

Now, we should ask: Why are they needed? Can we get away with less? What if some of them are not met?

It should be clear why we need #1.

Without #2, I believe the least squares estimator is still unbiased, but the usual estimates of SEs for the coefficients are wrong, so the t-tests are wrong.

Without #3, the coefficients are, again, still unbiased, but not as efficient as they could be. Interval estimates for the prediction will surely be wrong.

Without #4, well, it depends. If the residual DF is sufficiently large, the t-tests are still valid because of the CLT. You do need normality if you have small residual DF.

The problem with normality tests, I believe, is that they usually have fairly low power at small sample sizes, so that doesn't quite help. There's no free lunch: A normality test with good power will usually have good power against a fairly narrow class of alternatives, and almost no power against others (directional test). How do you decide what to use?

Has anyone seen a data set where the normality test on the residuals is crucial in coming up with an appropriate analysis?

Cheers,
Andy

> From: Federico Gherardini
>
> Berton Gunter wrote:
>
> >>>Exactly! My point is that normality tests are useless for this purpose for
> >>>reasons that are beyond what I can take up here.
>
> Thanks for your suggestions, I understand that! Could you possibly give
> me some (not too complicated!) links so that I can investigate this
> matter further?
>
> Cheers,
>
> Federico
>
> >>>Hints: Balanced designs are
> >>>robust to non-normality; independence (especially "clustering" of subjects
> >>>due to systematic effects), not normality, is usually the biggest real
> >>>statistical problem; hypothesis tests will always reject when samples are
> >>>large -- so what!; "trust" refers to prediction validity which has to do
> >>>with study design and the validity/representativeness of the current data to
> >>>future.
> >>>
> >>>I know that all the stats 101 tests say to test for normality, but they're
> >>>full of baloney!
> >>>
> >>>Of course, this is "free" advice -- so caveat emptor!
> >>>
> >>>Cheers,
> >>>Bert
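The power argument above, and Bert's remark that tests reject almost anything in large samples, can be checked with a small simulation. The sketch estimates how often `shapiro.test()` rejects at the 5% level when the data come from a t distribution with 5 degrees of freedom. The helper `reject_rate()`, the choice of alternative, the sample sizes, and the number of replications are all assumptions made for illustration, and raw samples are tested rather than regression residuals to keep the example short.

```r
set.seed(5)

# Hypothetical helper: proportion of simulated samples of size n, drawn by
# rgen(), for which the Shapiro-Wilk test rejects at level alpha.
reject_rate <- function(n, rgen, nsim = 500, alpha = 0.05) {
  mean(replicate(nsim, shapiro.test(rgen(n))$p.value < alpha))
}

heavy <- function(n) rt(n, df = 5)   # symmetric, moderately heavy tails

sizes <- c(10, 30, 100, 1000)
power <- sapply(sizes, reject_rate, rgen = heavy)
names(power) <- sizes
print(round(power, 2))
```

The estimated rejection rate rises with the sample size for the same underlying distribution, which is the pattern both messages describe: little power where normality matters most, and near-certain rejection where it matters least.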
Liaw, Andy
2004-Oct-15 18:14 UTC
[R] Testing for normality of residuals in a regression model
Hi John,

Your point is well taken. I was only thinking about the shape of the distribution, and neglected the cases of, say, symmetric long-tailed distributions. However, I think I'd still argue that other tools are probably more useful than normality tests (e.g., robust methods, as you mentioned).

To take the point a bit further, let's say we test for normality and it's rejected. What do we do then? Well, if the non-normality is caused by outliers, we can try robust methods. If not, what do we do? We can try to see if some sort of transformation would bring the residuals closer to normally distributed, but if the interest is in inference on the coefficients, those inferences on the `final' model are potentially invalid. What's one to do then?

Also, I was told by someone very smart that fitting OLS to data with heteroscedastic errors can make the residuals look `more normal' than they really are... Don't know how true that is, though.

Best,
Andy

> From: John Fox
>
> Dear Andy,
>
> At the risk of muddying the waters (and certainly without wanting to
> advocate the use of normality tests for residuals), I believe that your
> point #4 is subject to misinterpretation: That is, while it is true that t-
> and F-tests for regression coefficients in large samples retain their
> validity well when the errors are non-normal, the efficiency of the LS
> estimates can (depending upon the nature of the non-normality) be seriously
> compromised, not only absolutely but in relation to alternatives (e.g.,
> robust regression).
>
> Regards,
> John
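The two sides of this exchange can be seen in one simulation: with heavy-tailed errors the LS t-test for the slope keeps roughly its nominal level in a moderately large sample, while the LS slope is noticeably more variable than a robust estimate. The sketch uses t-distributed errors with 3 degrees of freedom and the Huber M-estimator from `MASS::rlm()`. The error distribution, sample size, and number of replications are assumptions chosen for the example, and the robust part is skipped if MASS is unavailable.

```r
set.seed(6)
n <- 100
nsim <- 500
x <- runif(n)
have_mass <- requireNamespace("MASS", quietly = TRUE)

# One simulated data set with true slope 0 and heavy-tailed errors.
one_run <- function() {
  y <- 1 + rt(n, df = 3)
  co <- summary(lm(y ~ x))$coefficients
  rob <- if (have_mass) {
    unname(coef(suppressWarnings(MASS::rlm(y ~ x)))["x"])
  } else {
    NA_real_
  }
  c(ls_slope = co["x", "Estimate"],
    ls_p = co["x", "Pr(>|t|)"],
    rlm_slope = rob)
}

res <- t(replicate(nsim, one_run()))

cat("LS t-test rejection rate at 5%:", mean(res[, "ls_p"] < 0.05), "\n")
cat("SD of LS slope estimates:      ", sd(res[, "ls_slope"]), "\n")
if (have_mass) {
  cat("SD of Huber slope estimates:   ", sd(res[, "rlm_slope"]), "\n")
}
```

A rejection rate near 0.05 supports the validity point, and a smaller spread for the robust slope supports the efficiency point; neither requires a normality test on the residuals.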
Federico Gherardini
2004-Oct-15 18:24 UTC
[R] Testing for normality of residuals in a regression model
Berton Gunter wrote:

>>>Exactly! My point is that normality tests are useless for this purpose for
>>>reasons that are beyond what I can take up here.

Thanks for your suggestions, I understand that! Could you possibly give me some (not too complicated!) links so that I can investigate this matter further?

Cheers,

Federico

>>>Hints: Balanced designs are
>>>robust to non-normality; independence (especially "clustering" of subjects
>>>due to systematic effects), not normality, is usually the biggest real
>>>statistical problem; hypothesis tests will always reject when samples are
>>>large -- so what!; "trust" refers to prediction validity which has to do
>>>with study design and the validity/representativeness of the current data to
>>>future.
>>>
>>>I know that all the stats 101 tests say to test for normality, but they're
>>>full of baloney!
>>>
>>>Of course, this is "free" advice -- so caveat emptor!
>>>
>>>Cheers,
>>>Bert
R and especially Bioconductor are the "Gold Standard" for microarray analysis, see:

http://www.bioconductor.org/

Regards
Christian

Sun wrote:
> Hello, Rusers:
>
> What is the maximum number of data R can handle? Or do I have to use SAS? I am
> trying to do some microarray data analysis. But I am totally new. Did anyone
> use R to do microarray analysis?
>
> Many thanks,
>
> Sun