I have a statistical question.
The data sets I am working with are right-skewed, so I have been plotting the log transformations of my data. I am using a Grubbs test to detect outliers in the data, but I get different outcomes depending on whether I run the test on the original data or on the log(data). Here is one of the problematic sets:

fgf2p50=c(1.563,2.161,2.529,2.726,2.442,5.047)
stripchart(fgf2p50,vertical=TRUE)
#This next step requires that you have the 'outliers' package installed
library(outliers)
grubbs.test(fgf2p50)
#the output says p<0.05 so 5.047 is an outlier
#Next, I run the test on the log(data)
log10=c(0.194,0.335,0.403,0.436,0.388,0.703)
grubbs.test(log10)
#the output says p>0.05 so we cannot reject the hypothesis that there is no outlier.

The question is, which outlier test do I accept?
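For anyone who wants to see why the two scales disagree, here is a minimal base-R sketch. It assumes the textbook one-sided Grubbs test for the single most extreme value; `grubbs_sketch` is a hypothetical helper written for illustration, not a function from the 'outliers' package, and it has not been checked against grubbs.test() to the last digit.

```r
# Hypothetical helper (NOT from the 'outliers' package): one-sided Grubbs
# test for the single most extreme value, using base R only.
grubbs_sketch <- function(x) {
  n <- length(x)
  # Grubbs statistic: largest absolute deviation from the mean, in SD units
  G <- max(abs(x - mean(x))) / sd(x)
  # Convert G to a t statistic with n - 2 degrees of freedom
  t_stat <- sqrt(n * (n - 2) * G^2 / ((n - 1)^2 - n * G^2))
  # Upper-tail probability, multiplied by the n candidate points, capped at 1
  p <- min(1, n * pt(t_stat, df = n - 2, lower.tail = FALSE))
  c(G = G, p = p)
}

fgf2p50 <- c(1.563, 2.161, 2.529, 2.726, 2.442, 5.047)

raw_scale <- grubbs_sketch(fgf2p50)         # test on the original data
log_scale <- grubbs_sketch(log10(fgf2p50))  # test on the log10 data

print(round(rbind(raw = raw_scale, log10 = log_scale), 3))
```

On this data set the raw-scale p-value falls below 0.05 and the log-scale p-value falls above it, which matches the two outcomes reported in the post. The log transform pulls in the right tail, so 5.047 sits fewer standard deviations from the mean on the log scale than on the original scale.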
Hi Jahan,

What data are you going to use for your analyses: the original data or the log-transformed data? It does not make sense to screen the transformed data for outliers based on a test of the original untransformed data (unless you are planning to use the untransformed data for the analyses).

There are several good fortunes on outliers:

library(fortunes)
fortune("just be an outlier")

Cheers,

Josh

On Tue, Nov 30, 2010 at 12:15 PM, Jahan <jahan.mohiuddin at gmail.com> wrote:
> The question is, which outlier test do I accept?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Jahan
> Sent: Tuesday, November 30, 2010 12:16 PM
> To: r-help at r-project.org
> Subject: [R] Outlier statistics question
>
> The question is, which outlier test do I accept?

You may not want to "accept" either test. What do YOU mean by an outlier, and why is it important for you to detect "outliers" and handle them differently? Maybe you should model the data so that the model correctly predicts or explains the so-called outlier. So, what is it that you want to do?

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
(Apologies to all. I am weak and could not resist.)

On Tue, Nov 30, 2010 at 12:15 PM, Jahan <jahan.mohiuddin at gmail.com> wrote:
> I have a statistical question.
> The data sets I am working with are right-skewed so I have been
> plotting the log transformations of my data. I am using a Grubbs Test
> to detect outliers in the data, but I get different outcomes depending
> on whether I run the test on the original data or the log(data).

Of course!

> The question is, which outlier test do I accept?

Neither. (IMHO) Outlier tests are one of statistics' _bad ideas._ The Grubbs test is ca. 1970. There are many better approaches these days -- consult your local statistician -- all of which will depend on answering the question, "What is the question you are trying to answer?"

-- Bert

--
Bert Gunter
Genentech Nonclinical Biostatistics