thr3ads.net - R help - [R] detection of outliers [Sep 2004]

Phguardiol@aol.com

2004-Sep-23 14:22 UTC

[R] detection of outliers

Hi,
this is both a statistical and an R question...
What would be the best way / test to detect an outlier value among a series of 10
to 30 values? For instance, if we have the following dataset:
10,11,12,15,20,22,25,30,500, I'd like to have a way to identify the last value as
an outlier (only one direction). One way would be to calculate abs(mean -
median) and, if it is elevated (to what extent?), delete the extreme value and redo..
But is it valid to do so with so few data? Is the (trimmed mean - mean) more
efficient? If so, what would be the maximal tolerable value to use as a
threshold? (I guess it will be experiment dependent...) Tests for skewness will
probably require a larger dataset?
Any suggestions are very welcome!
Thanks for your help
Philippe Guardiola, MD
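
For reference, the two quantities mentioned in the question can be computed in base R roughly as follows. This is only a sketch: the trim fraction of 0.2 is an assumption, chosen so that exactly one value is dropped from each end of the nine-point example, and no threshold is implied.

```r
x <- c(10, 11, 12, 15, 20, 22, 25, 30, 500)

m   <- mean(x)              # 645 / 9, pulled upwards by the value 500
med <- median(x)            # 20
tm  <- mean(x, trim = 0.2)  # assumed trim: drops one value per end, 135 / 7

gap_median  <- abs(m - med) # distance between mean and median
gap_trimmed <- abs(m - tm)  # distance between mean and trimmed mean
```

Both gaps are large here only because of the single value 500; how large is "too large" still depends on the experiment.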

Gabor Grothendieck

2004-Sep-23 14:52 UTC


If z is your vector the following all detect outliers:

	boxplot(z)  # will show the outlier

	plot(lm(z ~ 1))  # the various plots show this as well

	library(car)
	outlierTest(lm(z ~ 1)) # tests most extreme value (named outlier.test() in older versions of car)
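
To get the points that boxplot() would draw as outliers without producing a plot, something like the following should work. It is a minimal sketch using boxplot.stats() from base R with its default coef = 1.5; z here is simply the example data from the question.

```r
z <- c(10, 11, 12, 15, 20, 22, 25, 30, 500)

# Values lying more than 1.5 times the hinge spread beyond the hinges
flagged <- boxplot.stats(z)$out
flagged
```

For this sample the hinges are 12 and 25, so the upper fence is 25 + 1.5 * 13 = 44.5 and only 500 is flagged.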

Dimitris Rizopoulos

2004-Sep-23 14:57 UTC

Hi Philippe,

you could consider using the Windsorized mean,

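winds.mean <- function(x, k = 2) {
    y <- x[!is.na(x)]                 # drop missing values
    mu <- mean(y)
    stdev <- sd(y)
    upper <- mu + k * stdev           # cutoffs at k standard deviations
    lower <- mu - k * stdev
    outliers.up <- y[y > upper]
    outliers.lo <- y[y < lower]
    # index by condition, so that zero or several outliers are handled correctly
    y[y > upper] <- upper
    y[y < lower] <- lower
    list(mean = mean(y), outliers.up = outliers.up,
         outliers.lo = outliers.lo)
}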
##################

x <- c(10,11,12,15,20,22,25,30,500)
mean(x)
winds.mean(x)

I hope this helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/396887
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm


Christian Hennig

2004-Sep-23 15:14 UTC

You may want to read 
Davies and Gather, The identification of multiple outliers, JASA 88 (1993),
782-801.

The simplest recommendation is to nominate all points with distance larger
than c*mad(data) from the median as outliers. Choices of c depending on n
are given in the above paper.

This is somewhat better founded theoretically than the boxplot method
recommended by Gabor G., but it is based on the assumption that the
distribution of the non-outliers is close to normal and, in particular, not
strongly skewed (the boxplot method seems to be a bit more robust against
skewness).
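
A minimal sketch of the median/MAD rule in R, applied to the example data. Note that the multiplier below (c_mult = 3) is only an assumed placeholder, not one of the n-dependent values from the Davies and Gather paper; mad() is used with its default scaling constant of 1.4826.

```r
x <- c(10, 11, 12, 15, 20, 22, 25, 30, 500)

c_mult <- 3            # assumed multiplier, NOT taken from the paper
center <- median(x)    # 20
spread <- mad(x)       # 1.4826 * 8, about 11.86

# Nominate points further than c_mult * mad from the median
flagged <- x[abs(x - center) > c_mult * spread]
flagged
```

With these data the cutoff is roughly 35.6 away from the median, so only 500 is nominated.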

Christian
 
***********************************************************************
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
#######################################################################
I recommend www.boag-online.de

Vito Ricci

2004-Sep-23 15:19 UTC

Hi,
have a look at:

http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm

It describes Grubbs' test for outliers, which is based on the
assumption that the data are normally distributed.

Other ways of detecting outliers could be:

Run-Sequence Plot
Histogram
Normal Probability Plot
Box-plot
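
As a rough illustration, the one-sided Grubbs statistic for the largest value can be computed in base R without extra packages. This is a sketch under assumptions: alpha = 0.05 is an arbitrary choice, the critical value uses the usual t-distribution formula for the one-sided test, and the normality assumption mentioned above is taken on trust.

```r
x <- c(10, 11, 12, 15, 20, 22, 25, 30, 500)
n <- length(x)

# One-sided Grubbs statistic for the maximum
G <- (max(x) - mean(x)) / sd(x)

alpha  <- 0.05                             # assumed significance level
t_crit <- qt(1 - alpha / n, df = n - 2)    # one-sided: alpha / n
G_crit <- (n - 1) / sqrt(n) * sqrt(t_crit^2 / (n - 2 + t_crit^2))

is_outlier <- G > G_crit
```

For a two-sided test, alpha / (2 * n) would be used in place of alpha / n, with the statistic based on max(abs(x - mean(x))).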

Best
Vito

Berton Gunter

2004-Sep-23 15:32 UTC

Not to oversimplify ...

1. (At least) dozens of books and thousands of papers have been written on
this...

2. The most important question is: What is an outlier? (Many smart folks say
that the concept is illogical/flawed -- there is no mystical boundary that
one crosses to become a statistical pariah; many other smart folks
disagree).

3. Equivalently: What is the model with respect to which values are
outlying? (with apologies to Winston Churchill's: "That is an indignity up
with which I will not put.")

So good advice here is: Beware of good advice about this. (Of course, I may
just be an outlier ...)

;)

Cheers,

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
 
 
Cliff Lunneborg

2004-Sep-25 15:52 UTC

Dimitris Rizopoulos writes, in part:
> Hi Philippe,
>
> you could consider using the Windsorized mean,
>
> winds.mean <-  function(x, k=2){
FYI, the shrinking of tails process of Winsorization was brought to the
attention of the statistical community by John Tukey. It is named after
its originator, Charley Winsor, and not after the House of Windsor.
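
For comparison, one common form of Winsorization replaces the k smallest and k largest observations with their nearest retained neighbours, rather than using cutoffs based on the mean and standard deviation. A minimal sketch, where the function name winsorize and the choice k = 1 are purely illustrative:

```r
# Hypothetical helper: replace the k lowest and k highest values
# with the nearest remaining order statistics.
winsorize <- function(x, k = 1) {
    x <- x[!is.na(x)]
    n <- length(x)
    stopifnot(n > 2 * k)
    s  <- sort(x)
    lo <- s[k + 1]          # smallest retained value
    hi <- s[n - k]          # largest retained value
    pmin(pmax(x, lo), hi)   # clamp every value into [lo, hi]
}

x <- c(10, 11, 12, 15, 20, 22, 25, 30, 500)
w <- winsorize(x, k = 1)    # 10 becomes 11, 500 becomes 30
mean(w)                     # the Winsorized mean
```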

**********************************************************
Cliff Lunneborg, Professor Emeritus, Statistics &
Psychology, University of Washington, Seattle
cliff at ms.washington.edu
