Hello,
I need to analyse a data matrix with dimensions of 30x100.
Before analysing the data, however, I need to remove outliers from it.
I have already read quite a lot about outlier removal, and the most common
technique seems to be Principal Component Analysis (PCA). However, I think
this technique is quite subjective: when is an outlier actually an outlier?
I uploaded an example PCA plot here:
http://s14.postimage.org/oknyya1ld/pca.png
Should we already treat the green and red dots as outliers, or only the blue
one, which lies outside the 95% confidence interval? How people remove
outliers using PCA seems very arbitrary.
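One common way the 95% boundary on a PCA score plot is drawn is as a Hotelling's T^2 limit on the retained components. The sketch below is a minimal illustration in Python/NumPy, not what any particular R package does. It assumes rows are samples and columns are variables (the post does not say which way round the 30x100 matrix is), keeps two components, does not scale the variables, and uses the large-sample chi-square approximation for the limit; many tools use an F-distribution-based limit instead, so the boundary can differ. The function name `pca_t2` and the test data are hypothetical.

```python
import math
import numpy as np

def pca_t2(X, alpha=0.05):
    """Hotelling's T^2 of each sample on the first two principal components.

    Assumes rows of X are samples and columns are variables. The limit uses
    the chi-square approximation: for 2 components the (1 - alpha) quantile
    of chi-square(2) is exactly -2*ln(alpha).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                        # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :2] * s[:2]                      # PC1/PC2 scores (the score plot)
    pc_var = s[:2] ** 2 / (n - 1)                  # variance of each retained PC
    t2 = np.sum(scores ** 2 / pc_var, axis=1)      # squared Mahalanobis distance in the PC1/PC2 plane
    limit = -2.0 * math.log(alpha)                 # about 5.99 for alpha = 0.05
    return t2, limit

# Hypothetical 30 x 100 data (samples x variables) with sample 0 shifted.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 100))
X[0] += 8.0
t2, limit = pca_t2(X)
print(np.flatnonzero(t2 > limit))
```

This makes the rule explicit rather than removing the judgement: `alpha`, the number of retained components and whether to scale the variables are all choices the analyst still has to make.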
I also thought about fitting a linear model to my data and looking at the
distribution of the residuals. However, the problem with linear models is
that one can never be sure that the model used is the one that describes the
data best. Under model A, for instance, we might treat sample 1 as an
outlier, but under a different model B sample 1 might not be an outlier at
all.
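The model dependence described above can be reproduced in a few lines. This is a hedged sketch in Python/NumPy with made-up data: it computes internally studentized residuals, r_i / (s * sqrt(1 - h_ii)), which is the quantity R's `rstandard()` reports for an `lm` fit, and flags |r| > 2, a common but arbitrary cutoff. The helper name `studentized_residuals` is hypothetical. Apart from one point nudged by 0.5, the data lie exactly on y = x^2, which is deliberately artificial.

```python
import numpy as np

def studentized_residuals(x, y, degree):
    """Internally studentized residuals of a least-squares polynomial fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, degree + 1, increasing=True)   # columns 1, x, ..., x^degree
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    s = np.sqrt(resid @ resid / (n - p))            # residual standard error
    Q, _ = np.linalg.qr(X)                           # orthonormal basis of the column space
    h = np.sum(Q ** 2, axis=1)                       # leverages h_ii (hat-matrix diagonal)
    return resid / (s * np.sqrt(1.0 - h))

# Hypothetical data: y = x^2, one x value far from the rest, index 4 nudged.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 14], dtype=float)
y = x ** 2
y[4] += 0.5

cutoff = 2.0
flag_a = np.abs(studentized_residuals(x, y, degree=1)) > cutoff  # model A: straight line
flag_b = np.abs(studentized_residuals(x, y, degree=2)) > cutoff  # model B: quadratic
print("model A flags", np.flatnonzero(flag_a))  # → model A flags [9]
print("model B flags", np.flatnonzero(flag_b))  # → model B flags [4]
```

The straight line flags the high-leverage sample at x = 14 because it cannot follow the curvature. The quadratic fits that sample well and instead flags the nudged sample, so the same data give a different "outlier" depending on the model.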
I had a brief look at k-means clustering as well, but I don't think it's the
right approach. Again, how does one decide which cluster is an outlier? It is
also known that different cluster analyses can lead to totally different
results, so which one should one choose?
Is there any other, less subjective way to remove outliers from data?
I would really appreciate any ideas or comments you might have on this topic.
Cheers
--
View this message in context:
http://r.789695.n4.nabble.com/Outlier-removal-techniques-tp4372652p4372652.html
Sent from the R help mailing list archive at Nabble.com.