>Subject: [R] outlier detection methods in r?
> hi -
> if I sample from a normal distribution with something like
> n100<-rnorm(100,0,1)
> and add an outlier with
> n100[10]<-4
> then
> qqnorm(n100)
> visually shows the point 4 as an outlier
> and calculating the probability of a value of 4 or bigger in 100 samples
> of norm(0,1) gives
> > 1-exp(log(pnorm(4,0,1))*100)
> [1] 0.003162164
>
> If I have more than 1 sample above the outlier threshold, the math is a bit
> more complicated but doable.
> My questions are:
> 1) are there better ways to assess the probability of outliers (i.e. value(s)
> above a threshold from a given distribution)?
> 2) are they implemented in R?
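The multiple-exceedance case in the question reduces to a binomial tail. If each of n independent draws exceeds the threshold t with probability p = P(X > t), the number of exceedances K is Binomial(n, p). A minimal sketch, assuming independent N(mean, sd) draws; `exceed_prob` is a hypothetical helper name, not an existing R function:

```r
# Hypothetical helper: probability of k or more values above threshold t
# in n independent draws from N(mean, sd).
# The count of exceedances is Binomial(n, p) with p = P(X > t).
exceed_prob <- function(t, k, n, mean = 0, sd = 1) {
  p <- pnorm(t, mean, sd, lower.tail = FALSE)  # P(X > t) for a single draw
  pbinom(k - 1, n, p, lower.tail = FALSE)      # P(K >= k) = P(K > k - 1)
}

exceed_prob(4, 1, 100)  # one or more values >= 4; ~0.00316, as in the question
exceed_prob(3, 2, 100)  # two or more values above 3
```

For k = 1 this is the same event as the formula in the question, 1 - pnorm(4)^100.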
1)
The phrase "a given distribution" makes things a lot more difficult as far
as outlier detection is concerned.
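If we are talking about normal or multivariate normal distributions, the
method based on Mahalanobis distances is the one I prefer.
If the sample comes from a (multivariate) normal distribution, the squared
Mahalanobis distance of each point follows a chi-square distribution with
p degrees of freedom, p being the number of variables (exactly when the mean
and covariance are known, approximately when they are estimated from the
data). So you can always check whether a given point lies above the
threshold determined by your significance level.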
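For reference, the standard result behind this, written for a p-variate normal with known mean $\mu$ and covariance $\Sigma$ at significance level $\alpha$ (with estimated parameters the chi-square law is only approximate):

```latex
D^2(x) = (x - \mu)^\top \Sigma^{-1} (x - \mu) \sim \chi^2_p ,
\qquad \text{flag } x \text{ if } D^2(x) > \chi^2_{p,\,1-\alpha} .

% Univariate case (p = 1, \Sigma = \sigma^2):
D^2(x) = \left( \frac{x - \mu}{\sigma} \right)^2 ,
\qquad D^2(x) > \chi^2_{1,\,1-\alpha}
\iff \frac{|x - \mu|}{\sigma} > z_{1-\alpha/2} .
```

With $\alpha = 0.01$ and $p = 1$ the cutoff is $\chi^2_{1,\,0.99} \approx 6.63$, i.e. $|x - \mu| / \sigma > 2.576$.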
2)
You can find mahalanobis() in the base package (in current versions of R it
is in the stats package, which is loaded by default).
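A minimal sketch of the chi-square cutoff using mahalanobis() and qchisq(). The simulated bivariate data, the planted point, the seed and alpha = 0.01 are illustrative assumptions, not part of the original question:

```r
# Illustrative data: 100 points from a bivariate N(0, I), plus one planted outlier
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
x[10, ] <- c(4, 4)

# Classical (non-robust) estimates of location and scatter
center <- colMeans(x)
covmat <- cov(x)

# mahalanobis() returns SQUARED distances, which is what the chi-square law refers to
d2 <- mahalanobis(x, center, covmat)

alpha  <- 0.01
cutoff <- qchisq(1 - alpha, df = ncol(x))  # chi-square with p = 2 degrees of freedom
flagged <- which(d2 > cutoff)              # row indices above the threshold
flagged
```

At alpha = 0.01 roughly 1 in 100 genuine normal points is expected to exceed the cutoff by chance, so a flag is a candidate rather than proof. Because the outliers themselves inflate colMeans() and cov(), several outliers can mask each other; robust estimates of location and scatter (for example cov.rob() in the MASS package) are a common substitute.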
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._