Karthik Srinivasan
2013-May-17 07:31 UTC
[R] Using grubbs test for residuals to find outliers
Hi,

I am a new user of R. This is a conceptual question about screening outliers out of a dataset in regression.

I have read that Cook's distance can be used for this: to remove influential observations, we can flag any observation whose Cook's distance exceeds 4/n (n = number of observations). I also came across Grubbs' test for identifying outliers in univariate distributions (assumed normal), but I was not able to find contexts in regression where Grubbs' test is used (maybe I didn't search enough).

Is it a good idea to compute Cook's distance to identify outliers, perform Grubbs' test on each of these outliers, and then delete them?

Right now I am using only Cook's distance in my problem, but I am uncertain about it: when I repeat the procedure on the new dataset (after removing the influential observations), the plots still keep showing outliers. One reason may be that I have only 50 data tuples and around 10 input variables in the multiple regression equation.

Am I going wrong in my fundamentals with this approach?

Thanks and regards,
Karthik Srinivasan
M.Mgt - Business Analytics
Indian Institute of Science, Bangalore
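
For concreteness, here is a minimal sketch of the Cook's distance screening described above, using only base R (`lm` and `cooks.distance`). The data are simulated stand-ins with the same shape as the problem (50 rows, 10 predictors); the names `dat`, `fit`, `flagged` and so on are illustrative assumptions, not the actual dataset or code.

```r
# Simulated stand-in data (assumption): 50 observations, 10 predictors.
set.seed(1)
n <- 50
p <- 10
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(X) <- paste0("x", seq_len(p))
dat <- data.frame(y = as.vector(X %*% rep(1, p)) + rnorm(n), X)

# First pass: fit the multiple regression and compute Cook's distance.
fit <- lm(y ~ ., data = dat)
d <- cooks.distance(fit)

# Rule of thumb from the post: flag observations with distance > 4/n.
threshold <- 4 / n
flagged <- which(d > threshold)

# Second pass: drop the flagged rows (if any), refit, and screen again.
dat2 <- if (length(flagged) > 0) dat[-flagged, , drop = FALSE] else dat
fit2 <- lm(y ~ ., data = dat2)
d2 <- cooks.distance(fit2)
flagged2 <- which(d2 > 4 / nrow(dat2))
```

Comparing `flagged` and `flagged2` shows whether a second pass flags further observations, which is the behaviour in question. Note that the cutoff 4/n is recomputed from the reduced sample size on each pass.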