Happy holidays all!

I know it's very subjective to decide whether a data point is an outlier or not...

But are there reasonably good and realistic methods of identifying outliers in R?

Thanks a lot!
Hi Michael,

I'm afraid this is one of those cases where the short answer is "No" and the long answer is, "No." If you are working with a data set stored in a data frame, something like:

    sapply(mtcars, function(x) if (is.numeric(x)) range(x, na.rm = TRUE) else c(NA, NA))

should give you the range of every numeric variable, which is a simple check of whether any values fall outside the possible range (say you have an age variable with a -3 or 320). Beyond that, you can inspect the data visually, but ultimately you have to decide what an outlier is and justify it.

Cheers,

Josh

On Fri, Dec 30, 2011 at 9:03 AM, Michael <comtech.usa at gmail.com> wrote:
> Happy holidays all!
>
> I know it's very subjective to determine whether some data is outlier or
> not...
>
> But are there reasonably good and realistic methods of identifying outliers
> in R?
>
> Thanks a lot!
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
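As a rough illustration of the range check and the visual inspection described above, the sketch below applies both to a single variable. The `age` vector and the plausibility limits `valid_min` and `valid_max` are made up for the example, and the boxplot rule (points more than 1.5 times the hinge spread beyond the hinges) is only one common convention, not a definition of what an outlier is.

```r
# Hypothetical age variable with two implausible entries (-3 and 320).
age <- c(23, 35, 41, 29, 52, 47, 38, -3, 320, 33)

# Range check against limits chosen by the analyst (assumed values).
valid_min <- 0
valid_max <- 120
out_of_range <- age < valid_min | age > valid_max
impossible <- age[out_of_range]

# Tukey's boxplot rule, using the same computation boxplot() relies on:
# values further than 1.5 * (hinge spread) beyond the hinges are returned.
flagged <- boxplot.stats(age)$out

print(impossible)
print(flagged)
```

Calling `boxplot(age)` draws the same flagged points, which is often the quickest visual check. Whether a flagged value is actually wrong still has to be judged from what the variable can plausibly be.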
On Fri, Dec 30, 2011 at 9:03 AM, Michael <comtech.usa at gmail.com> wrote:
> But are there reasonably good and realistic methods of identifying outliers
> in R?

What kind of data do you have? For simple numeric data, there are various methods for removing outliers that were developed for robust estimation, and I'm sure they are implemented in R. For example, this link

http://www.unt.edu/benchmarks/archives/2001/december01/rss.htm

describes how to calculate a robust measure of correlation that includes a method to downweight (or remove) outliers.

For identifying outlier samples in a multivariate setting, the possibilities are even more varied, from simple hierarchical clustering and visual identification of outliers to network connectivity methods, etc.

HTH,

Peter
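To give a flavour of what a robust approach to simple numeric data can look like, here is a minimal sketch that scores each value by its distance from the median in units of the MAD (median absolute deviation). This is not the method described at the link above; it is a swapped-in, simpler rule of thumb. The data vector `x` and the cutoff of 3 are assumptions chosen for illustration.

```r
# Hypothetical measurements; the last value is far from the rest.
x <- c(10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 25.0)

# Robust centre and spread. mad() rescales by 1.4826 by default so that it
# estimates the standard deviation for normally distributed data.
centre <- median(x)
spread <- mad(x)

# Robust z-scores: unlike (x - mean(x)) / sd(x), these are barely affected
# by the extreme value itself.
robust_z <- abs(x - centre) / spread

# Flag values beyond an assumed cutoff of 3 robust standard deviations.
cutoff <- 3
is_outlier <- robust_z > cutoff

print(x[is_outlier])
```

The point of using the median and MAD is that a single extreme value inflates the ordinary mean and standard deviation, which can hide the very observation one is trying to find.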
On 30/12/11 17:03, Michael wrote:
> But are there reasonably good and realistic methods of identifying outliers
> in R?

Ignoring the moral questions for a moment (it totally depends on your definition of an outlier, your dataset, its distribution, etc.), for the technical implementation, try the outliers package

http://www.stats.bris.ac.uk/R/web/packages/outliers/index.html

which implements the Grubbs and Cochran tests. Also, see this Stack Overflow answer of mine, which shows an implementation of the Lund test for outliers within a regression:

http://stackoverflow.com/a/1444548/74658

Regards,

Paul.
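For readers who want to see what the Grubbs test computes, the sketch below works it out by hand in base R. It is not the code of the outliers package (which provides the test as `grubbs.test()`); the data vector `x` and the significance level `alpha` are assumptions for the example, and the test itself presumes the non-outlying data are roughly normal.

```r
# Hypothetical measurements; the last value looks suspicious.
x <- c(10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 25.0)
n <- length(x)
alpha <- 0.05  # significance level (an assumption)

# Grubbs statistic: largest absolute deviation from the mean, in sd units.
dev <- abs(x - mean(x))
G <- max(dev) / sd(x)
suspect <- x[which.max(dev)]

# Two-sided critical value, derived from the t distribution with n - 2 df.
t_crit <- qt(alpha / (2 * n), df = n - 2)
G_crit <- (n - 1) / sqrt(n) * sqrt(t_crit^2 / (n - 2 + t_crit^2))

# TRUE means the most extreme value is flagged at level alpha.
is_flagged <- G > G_crit

print(c(G = G, G_crit = G_crit))
print(suspect)
```

The test examines one candidate value at a time, so it is best treated as a check on a single suspicious observation and not as an automatic filter.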