You may want to read the discussion on detection of outliers in the mailing
list archive.
The boxplot definition has a built-in outlier threshold for univariate
data, namely 1.5 times the IQR beyond the upper and lower quartiles.
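As a minimal sketch of the boxplot rule in R (the data vector x is made up for illustration, and the variable names are my own), the fences can be computed by hand or taken from boxplot.stats():

```r
# Hypothetical example data: 50 ordinary values plus two gross outliers.
set.seed(1)
x <- c(rnorm(50, mean = 100, sd = 5), 160, 30)

# Fences at 1.5 * IQR beyond the lower and upper quartiles.
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr
flagged <- x[x < lower | x > upper]

# boxplot.stats() applies the same rule, but it uses Tukey's hinges
# (see fivenum()) instead of quantile(), so the fences can differ slightly.
out <- boxplot.stats(x, coef = 1.5)$out
```

Both flagged and out should contain the two planted values; on small samples the two versions may disagree about borderline points.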
Another simple and good threshold is c*mad from the median, where c is
sometimes recommended to be 5.2 or 6.
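A rough sketch of the median/MAD rule follows. One assumption needs flagging: R's mad() multiplies by 1.4826 by default so that it estimates the standard deviation under normality, and it is not stated above whether c refers to the scaled or the raw MAD. The sketch assumes the raw MAD (constant = 1), for which c = 5.2 corresponds to roughly 3.5 standard deviations for normal data. The data and variable names are made up.

```r
# Hypothetical example data: 50 ordinary values plus two gross outliers.
set.seed(1)
x <- c(rnorm(50, mean = 100, sd = 5), 160, 30)

c_mult <- 5.2                      # multiplier c; 6 is the other value mentioned
med <- median(x)
raw_mad <- mad(x, constant = 1)    # ASSUMPTION: unscaled MAD, not the 1.4826 default

# Flag points further than c * MAD from the median.
flagged <- x[abs(x - med) > c_mult * raw_mad]
```

If the scaled default of mad() is intended instead, the same c gives a wider band and flags fewer points.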
If you have more complicated data (multivariate/time series), outlier
detection becomes harder. Even for univariate data, the right threshold
depends on the distributional shape and the aim of outlier identification.
What do you mean by "threshold with kmeans"? If you mean the detection
of outliers in the presence of clustering, then kmeans does not help you
further, but EMclustN (package mclust) and NNclean (package prabclus) may
be interesting.
Best,
Christian
On Fri, 25 Feb 2005, Melanie Vida wrote:
> For the analysis of financial data with a large variance, what is the best
> way to select an outlier threshold?
>
> Listed below, is there a best method to select an outlier threshold and how
> does R calculate it?
>
> In R, how do you find the outlier threshold through an interquartile range?
> In R, how do you find the outlier threshold using the hist command?
> In R, how do you find the outlier threshold with Chebyshev Inequality?
> In R, how do you find the outlier threshold with Kmeans?
>
> Also, is there a better way to select an outlier threshold not listed
> above?
***********************************************************************
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at math.uni-hamburg.de,
http://www.math.uni-hamburg.de/home/hennig/
From 1 April 2005: Department of Statistical Science, UCL, London
#######################################################################
I recommend www.boag-online.de