John Malone
2009-Feb-14 23:01 UTC
[R] implementing Grubbs outlier test on a large dataframe
Hi! I'm trying to run an outlier test once per row in a large data frame. Ideally, I'd then add the p-value and the number flagged as an outlier as two new, separate columns in the data frame. The Grubbs outlier test requires a vector, and I'm confused about how to turn each row of my data frame into a vector and then run a Grubbs test on each of those row vectors. I'm new to R and no doubt this is a simple problem. Any help you might provide would be greatly appreciated. Many thanks in advance!!
David Winsemius
2009-Feb-14 23:17 UTC
[R] implementing Grubbs outlier test on a large dataframe
Sending each row of a data frame, dfm, as a vector to a function, fcn, is as simple as:

    apply(dfm, 1, fcn)

e.g.:

    > dfm <- data.frame(x=rnorm(10), y=rnorm(10), z=rnorm(10))
    > apply(dfm, 1, sum)
     [1]  0.7385838 -3.1819193  0.3415670 -0.6552601 -1.3470174 -0.6446259 -0.6544967
     [8]  0.1778169 -0.3330527  0.6246071

With the second argument set to 2, you would get a column-wise application of the function.

You need to show us what your function looks like to go any further. I am unclear how one could get a function that only operates on a single row to yield an outlier classification.

-- 
David Winsemius
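A minimal sketch of how the apply() pattern above could produce the two new columns the original poster asked for. Everything here is an assumption made for illustration: the helper name grubbs_row, the column names suspect and p.value, the planted value 25, and the hand-rolled p-value (a two-sided Bonferroni-style approximation based on the t distribution). The thread does not say which implementation of the test is meant; if it is grubbs.test() from the CRAN package outliers, that call could replace the helper's internals. Note that apply() coerces the data frame to a matrix, so only the numeric columns are passed in.

```r
# Hypothetical helper: two-sided Grubbs test for one numeric vector.
# Returns the most extreme value and an approximate p-value.
grubbs_row <- function(x) {
  x <- as.numeric(x[!is.na(x)])          # drop NAs and row-vector names
  n <- length(x)
  if (n < 3) return(c(suspect = NA_real_, p.value = NA_real_))
  s <- sd(x)
  if (s == 0) return(c(suspect = NA_real_, p.value = NA_real_))

  dev <- abs(x - mean(x))
  i <- which.max(dev)                    # position of the most extreme value
  G <- dev[i] / s                        # Grubbs statistic

  # Map G to a t statistic with n - 2 degrees of freedom.
  # G cannot exceed (n - 1) / sqrt(n); guard against rounding below zero.
  denom <- max((n - 1)^2 - n * G^2, 0)
  t_stat <- sqrt(n * (n - 2) * G^2 / denom)
  p <- min(1, 2 * n * pt(t_stat, df = n - 2, lower.tail = FALSE))

  c(suspect = x[i], p.value = p)
}

set.seed(1)
dfm <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10), w = rnorm(10))
dfm[3, "w"] <- 25                        # plant an obvious outlier in row 3

cols <- c("x", "y", "z", "w")            # numeric columns to test
# apply() returns one column per row of dfm, so transpose the result
res <- t(apply(dfm[cols], 1, grubbs_row))

dfm$suspect <- res[, "suspect"]
dfm$p.value <- res[, "p.value"]
```

The result matrix is computed before the new columns are attached, so the test never sees its own output. With only a handful of values per row the test has little power, which is one more reason to treat the p-values with caution.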
Frank E Harrell Jr
2009-Feb-15 00:23 UTC
[R] implementing Grubbs outlier test on a large dataframe
John - you would be making a strong normality assumption. You might reject H0 using Grubbs' test just because of non-normality, or you might fail to reject it just because of non-normality. Is it really this straightforward to declare something an outlier? What does outlier really mean? The following is must reading.

@Article{fin06cal,
  author = {Finney, David J.},
  title = {Calibration guidelines challenge outlier practices},
  journal = {The American Statistician},
  year = 2006,
  volume = 60,
  pages = {309-313},
  annote = {anticoagulant therapy;bias;causation;ethics;objectivity;outliers;guidelines for treatment of outliers;overview of types of outliers;letter to the editor and reply 61:187 May 2007}
}

-- 
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
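The normality point can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: grubbs_p is a hypothetical helper using a Bonferroni-style t approximation for the two-sided p-value, and the row length (10), the number of rows (2000), and the exponential distribution standing in for "non-normal" data are arbitrary choices. No row contains a contaminating value, so every rejection is a false alarm.

```r
# Hypothetical helper: approximate two-sided Grubbs p-value for one vector.
grubbs_p <- function(x) {
  n <- length(x)
  G <- max(abs(x - mean(x))) / sd(x)
  # Guard against the denominator rounding below zero at the maximum G
  denom <- max((n - 1)^2 - n * G^2, 0)
  t_stat <- sqrt(n * (n - 2) * G^2 / denom)
  min(1, 2 * n * pt(t_stat, df = n - 2, lower.tail = FALSE))
}

set.seed(42)
n_rows <- 2000
n_cols <- 10

# Each row is a clean sample: no outliers have been added to either matrix.
normal_rows <- matrix(rnorm(n_rows * n_cols), nrow = n_rows)
skewed_rows <- matrix(rexp(n_rows * n_cols), nrow = n_rows)

# Share of rows in which the test "finds" an outlier at the 5% level
reject_normal <- mean(apply(normal_rows, 1, grubbs_p) < 0.05)
reject_skewed <- mean(apply(skewed_rows, 1, grubbs_p) < 0.05)

print(c(normal = reject_normal, skewed = reject_skewed))
```

For normal rows the rejection rate should sit near the nominal level, while for skewed rows it is typically much higher, even though both samples were drawn cleanly from their distributions. In the skewed case the test is reacting to the shape of the distribution and not to any aberrant observation.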