thr3ads.net - R help - [R] Outlier statistics question [Nov 2010]

Jahan

2010-Nov-30 20:15 UTC

[R] Outlier statistics question

I have a statistical question.
The data sets I am working with are right-skewed so I have been
plotting the log transformations of my data.  I am using a Grubbs Test
to detect outliers in the data, but I get different outcomes depending
on whether I run the test on the original data or the log(data).  Here
is one of the problematic sets:

fgf2p50=c(1.563,2.161,2.529,2.726,2.442,5.047)
stripchart(fgf2p50,vertical=TRUE)
#This next step requires you have the 'outliers' package
library(outliers)
grubbs.test(fgf2p50)
#the output says p<0.05 so 5.047 is an outlier
#Next, I run the test on the log(data)
log10=c(0.194,0.335,0.403,0.436,0.388,0.703)
grubbs.test(log10)
#output is that p>0.05, so we fail to reject the null hypothesis of no outlier.

The question is, which outlier test do I accept?

Joshua Wiley

2010-Nov-30 21:00 UTC

Hi Jahan,

What data are you going to use for analyses?  The original data or the
log transformed?  It does not make sense to screen the data you will
analyze with a test run on the other scale: test the log-transformed
data if you will analyze the log-transformed data, and test the
original data only if you plan to analyze the original data.
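
To make the scale dependence concrete, here is a minimal base R sketch that computes the Grubbs statistic for the largest value by hand on both scales. It assumes the one-sided form of the test and the usual t-based critical value; grubbs_stat and grubbs_crit are illustrative names, not functions from the 'outliers' package.

```r
# Base R only; grubbs_stat and grubbs_crit are hypothetical helper names.
fgf2p50 <- c(1.563, 2.161, 2.529, 2.726, 2.442, 5.047)

# Grubbs statistic for the largest value: distance from the mean in SD units
grubbs_stat <- function(x) (max(x) - mean(x)) / sd(x)

# One-sided critical value at level alpha, from the t distribution
grubbs_crit <- function(n, alpha = 0.05) {
  t2 <- qt(alpha / n, df = n - 2, lower.tail = FALSE)^2
  (n - 1) / sqrt(n) * sqrt(t2 / (n - 2 + t2))
}

g_raw  <- grubbs_stat(fgf2p50)          # statistic on the original scale
g_log  <- grubbs_stat(log10(fgf2p50))   # statistic on the log10 scale
g_crit <- grubbs_crit(length(fgf2p50))  # same critical value for both (n = 6)

round(c(raw = g_raw, log10 = g_log, critical = g_crit), 3)
```

The statistic depends on the mean and standard deviation of whichever scale it is computed on. Taking logs pulls the largest value in, so the raw-scale statistic lands above the critical value and the log-scale one below it, which matches the two test results in the original post.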

There are several good fortunes on outliers:

library(fortunes)
fortune("just be an outlier")

Cheers,

Josh

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

Nordlund, Dan (DSHS/RDA)

2010-Nov-30 21:05 UTC

> The question is, which outlier test do I accept?
> 
You may not want to "accept" either test.  What do YOU mean by an
outlier, and why is it important for you to detect and handle
"outliers" differently?  Maybe you should model the data so that the
model correctly predicts or explains the so-called outlier.  So, what is it that
you are wanting to do?
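
One way to read the modelling suggestion, as a rough sketch only: assume the right-skewed data are lognormal (an assumption made here for illustration, not something stated in the thread), estimate the parameters from all six values, and ask how surprising a maximum of 5.047 would be under that model. The estimates include the suspect value itself, so this is a plug-in check, not a formal test.

```r
# Assumption: lognormal model with plug-in parameter estimates from all six values.
fgf2p50 <- c(1.563, 2.161, 2.529, 2.726, 2.442, 5.047)
n <- length(fgf2p50)

meanlog <- mean(log(fgf2p50))  # mean on the natural-log scale
sdlog   <- sd(log(fgf2p50))    # SD on the natural-log scale

# Probability that a single draw is at least as large as the observed maximum
p_single <- plnorm(max(fgf2p50), meanlog, sdlog, lower.tail = FALSE)

# Probability that the largest of n independent draws is at least that large
p_max <- 1 - (1 - p_single)^n
p_max
```

If p_max is not small, the fitted model already accounts for a value this large in a sample of six, and there is little reason to treat it differently from the rest.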

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204

Bert Gunter

2010-Nov-30 21:21 UTC

(Apologies to all. I am weak and could not resist)

On Tue, Nov 30, 2010 at 12:15 PM, Jahan <jahan.mohiuddin at gmail.com> wrote:
> I have a statistical question.
> The data sets I am working with are right-skewed so I have been
> plotting the log transformations of my data.  I am using a Grubbs Test
> to detect outliers in the data, but I get different outcomes depending
> on whether I run the test on the original data or the log(data).

Of course!

> Here is one of the problematic sets:
>
> fgf2p50=c(1.563,2.161,2.529,2.726,2.442,5.047)
> stripchart(fgf2p50,vertical=TRUE)
> #This next step requires you have the 'outliers' package
> library(outliers)
> grubbs.test(fgf2p50)
> #the output says p<0.05 so 5.047 is an outlier
> #Next, I run the test on the log(data)
> log10=c(0.194,0.335,0.403,0.436,0.388,0.703)
> grubbs.test(log10)
> #output is that p>0.05 so we reject that there is an outlier.
>
> The question is, which outlier test do I accept?
Neither.

(IMHO) Outlier tests are one of statistics' _bad ideas_. The Grubbs
test is ca. 1970. There are many better approaches these days --
consult your local statistician -- all of which will depend on
answering the question,  "What is the question you are trying to
answer?"

-- Bert


-- 
Bert Gunter
Genentech Nonclinical Biostatistics
