Displaying 20 results from an estimated 3000 matches similar to: "some thoughts on outlier detection, need help!"
2004 Jan 21
1
outlier identification: is there a redundancy-invariant substitution for mahalanobis distances?
Dear R-experts,
Searching the help archives I found a recommendation to do multivariate
outlier identification by Mahalanobis distances based on a robustly estimated
covariance matrix and to compare the resulting distances to a chi^2 distribution
with p (the number of variables) degrees of freedom. I understand that
compared to Euclidean distances this has the advantage of being scale-invariant.
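The recommendation described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated, `cov.rob()` from MASS with `method = "mcd"` is one of several possible robust estimators, and the 97.5% quantile is an arbitrary but common choice of cutoff.

```r
library(MASS)  # for cov.rob(); MASS ships with standard R installations

set.seed(1)
# Hypothetical data: 100 bivariate standard normal points plus 5 shifted ones
x <- rbind(matrix(rnorm(200), ncol = 2),
           matrix(rnorm(10, mean = 8), ncol = 2))

# Robust location and scatter (minimum covariance determinant)
rob <- cov.rob(x, method = "mcd")

# Squared Mahalanobis distances based on the robust estimates
d2 <- mahalanobis(x, center = rob$center, cov = rob$cov)

# Flag rows whose squared distance exceeds the chi-square quantile
# with p = number of variables degrees of freedom
p <- ncol(x)
cutoff <- qchisq(0.975, df = p)
outliers <- which(d2 > cutoff)
```

Because the cutoff is a quantile of the reference distribution, a small share of clean observations will usually be flagged as well.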
2011 Nov 16
2
outlier identify in qqplot
Dear Community,
I want to identify outliers in my data. I don't know how to use the identify
command on the plots obtained.
I've gone through the help files and used the mahalanobis example for my purpose:
NormalMultivarianteComparefunc <- function(x) {
Sx <- cov(x)
D2 <- mahalanobis(x, colMeans(x), Sx)
plot(density(D2, bw=.5), main="Squared Mahalanobis distances, n=nrow(x),
2005 Aug 08
2
computationally singular
Hi,
I have a dataset which has around 138 variables and 30,000 cases. I am
trying to calculate a mahalanobis distance matrix for them and my
procedure is like this:
Suppose my data is stored in mymatrix
> S<-cov(mymatrix) # this is fine
> D<-sapply(1:nrow(mymatrix), function(i) mahalanobis(mymatrix, mymatrix[i,], S))
Error in solve.default(cov, ...) : system is computationally
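An error of this kind typically appears when the covariance matrix is singular or nearly so, for example because some variables are linear combinations of others. A minimal sketch of one possible workaround, assuming simulated data with a deliberately redundant column and using `ginv()` from MASS (the pseudoinverse is a swapped-in technique here, not something the post itself proposes):

```r
library(MASS)  # for ginv()

set.seed(1)
# Hypothetical data: the third column is the sum of the first two,
# which makes the covariance matrix singular
x <- matrix(rnorm(200), ncol = 2)
x <- cbind(x, x[, 1] + x[, 2])
S <- cov(x)

# The numerical rank is below the number of columns
S_rank <- qr(S)$rank

# Supply a Moore-Penrose pseudoinverse and tell mahalanobis()
# that the matrix passed in is already inverted
d2 <- mahalanobis(x, colMeans(x), ginv(S), inverted = TRUE)
```

Dropping or combining the redundant variables before computing the distances is an alternative that keeps the ordinary inverse usable.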
2000 Apr 21
1
outlier detection methods in r?
hi -
if I sample from a normal distribution with something like
n100<-rnorm(100,0,1)
and add an outlier with
n100[10]<-4
then
qqnorm(n100)
visually shows the point 4 as an outlier
and calculating the probability of a value of 4 or bigger in 100 samples from N(0,1)
gives
> 1-exp(log(pnorm(4,0,1))*100)
[1] 0.003162164
If I have more than 1 sample above outlier threshold the math is a
2005 Aug 08
2
selecting outliers
Hi everybody,
I'd like to know if there's an easy way of extracting
outlier records from a dataset, in order to perform
further analysis on them.
Thanks
Alessandro
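One simple possibility, sketched here under the assumption that "outlier" means a value beyond 1.5 times the interquartile range from the hinges (the rule used by `boxplot()`); the data frame and column names are hypothetical:

```r
set.seed(1)
# Hypothetical data frame with one numeric column and one planted extreme value
dat <- data.frame(id = 1:51, value = c(rnorm(50), 10))

# Values that boxplot() would draw as individual points
out_values <- boxplot.stats(dat$value)$out

# Extract the corresponding records for further analysis
outlier_rows <- dat[dat$value %in% out_values, ]
```

Other definitions of an outlier would need a different selection rule, but the subsetting step stays the same.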
2004 May 26
0
Outlier identification according to Hardin & Rocke (1999)
I'm trying to use a paper by Hardin & Rocke: http://handel.cipic.ucdavis.edu/~dmrocke/Robdist5.pdf
as a guide for a function to identify outliers in multivariate data. Attached below is a function that is my attempt to reproduce their method and also a test to see what fraction of the data are identified as outliers. Using this function I am able to reproduce their results regarding the
2009 Aug 05
0
get NA from outlier{randomForest}
Hi
I have a data frame like this:
       V1                 V2                   V3                 V4
 Min.   :0.01146   Min.   :0.0006714   Min.   :0.004912   Min.   :   0
 1st Qu.:0.03938   1st Qu.:0.0072805   1st Qu.:0.052719   1st Qu.:1150
 Median :0.04224   Median :0.0077581   Median :0.056388   Median :1150
 Mean   :0.04010   Mean   :0.0074669   Mean   :0.052602   Mean   :1173
 3rd
2011 Sep 26
2
Mahalanobis Distance
Hello R helpers,
I'm trying to use Mahalanobis distance to calculate the distance between two time
series, to make some comparisons with Euclidean distance, DTW, etc., but I'm
having some difficulties.
I have, for example, two objects:
s.1 <- c( 5.6324702, 1.3994353, -3.2572327, -3.8311846, -1.2248719,
0.9894694, -2.2835332, -5.1969285, -5.2823988, -3.1499400, -1.7307950,
2.8221209,
2004 Mar 26
1
Mahalanobis
Dear all
Why isn't it possible to calculate Mahalanobis distances with R for a matrix
with 1 row (observation) more than the number of columns (variables)?
> mydata <- matrix(runif(12,-5,5), 4, 3)
> mahalanobis(x=mydata, center=apply(mydata,2,mean), cov=var(mydata))
[1] 2.25 2.25 2.25 2.25
> mydata <- matrix(runif(420,-5,5), 21, 20)
> mahalanobis(x=mydata,
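The identical values of 2.25 in the first call are not a coincidence. When the number of rows n is exactly one more than the number of columns p, and the sample mean and sample covariance are used, every squared distance equals (n - 1)^2 / n regardless of the data. A small sketch with simulated data to check this:

```r
set.seed(1)
n <- 4
p <- 3
mydata <- matrix(runif(n * p, -5, 5), n, p)

# Sample mean and sample covariance, as in the post
d2 <- mahalanobis(mydata, colMeans(mydata), var(mydata))

# With n = p + 1 every squared distance equals (n - 1)^2 / n;
# for n = 4 that is 9 / 4 = 2.25
expected <- rep((n - 1)^2 / n, n)
same <- isTRUE(all.equal(d2, expected))
```

With so few observations the distances carry no information about individual rows, so more rows than p + 1 are needed for the distances to be meaningful.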
2009 Jul 20
2
mahalanobis distance
http://www.nabble.com/file/p24569511/mahalanobis.txt mahalanobis.txt
http://www.nabble.com/file/p24569511/concentrations.txt concentrations.txt
Dear Forum members,
I have a problem calculating mahalanobis distances. My data file
mahalanobis.txt and categories file concentrations.txt are attached. I do
the following steps:
x <- as.matrix(read.table("mahalanobis.txt", header=TRUE))
2010 Jan 30
2
Questions on Mahalanobis Distance
Hello,
I am a new R user and trying to learn how to implement the mahalanobis
function to measure the distance between two population centroids. I
have used STATISTICA to calculate these differences, but was hoping to learn
to do the analysis in R. I have implemented the code as below, but my
results are very different from that of STATISTICA, and I believe I may not
have interpreted the help
2011 Mar 22
1
Using the mahalanobis( ) function
Hello all,
I am a 2 month newbie to R and am stumped. I have a data set that I've run multivariate stats on using the manova function (I included the data set). Now it comes time for a table of effect sizes with significance. The univariate tests are easy. Where I run into trouble filling in the table of effect sizes is the Mahalanobis D as an effect size. I've included the table so
2011 Mar 20
1
Using the Mahalanobis Function
Hello all,
I am a 2 month newbie to R and am stumped. I have a data set that I've run multivariate stats on using the manova function (I included the data set). Now it comes time for a table of effect sizes with significance. The univariate tests are easy. Where I run into trouble filling in the table of effect sizes is the Mahalanobis D as an effect size. I've included the table so
2010 Jun 22
1
Mahalanobis distance
I am a new R user. I have a question about Mahalanobis distance. Actually I have 300 rows and 7 columns. Columns are different measurements, and the 300 rows are genes. The genes can be
classified into 4 categories. I used dist() with Euclidean distance and cmdscale to do an MDS plot, but found out that Mahalanobis distance may be
better. How do I use mahalanobis() to generate a similar dist object which I can use
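One way to get a `dist` object of pairwise Mahalanobis distances is to transform the data so that ordinary Euclidean distance on the transformed rows equals Mahalanobis distance on the original rows. The sketch below assumes simulated data in place of the 300 x 7 matrix and uses the overall sample covariance; a within-category covariance would be an equally plausible choice.

```r
set.seed(1)
# Hypothetical stand-in for the 300 x 7 measurement matrix
x <- matrix(rnorm(300 * 7), nrow = 300, ncol = 7)

S <- cov(x)
# chol() returns an upper-triangular R with S = t(R) %*% R; multiplying by
# solve(R) whitens the data, so Euclidean distance on z is Mahalanobis on x
z <- x %*% solve(chol(S))
d_mahal <- dist(z)   # "dist" object of (unsquared) Mahalanobis distances

# Classical MDS, used the same way as with the Euclidean version
mds <- cmdscale(d_mahal, k = 2)
```

Note that `mahalanobis()` itself returns squared distances from a single center, which is why it does not produce a pairwise `dist` object directly.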
2007 Feb 20
1
Mahalanobis distance and probability of group membership using Hotelling's T2 distribution
I want to calculate the probability that a group will include a particular
point using the squared Mahalanobis distance to the centroid. I understand
that the squared Mahalanobis distance is distributed as chi-squared but that
for a small number of random samples from a multivariate normal population
the Hotelling's T2 (T squared) distribution should be used.
I cannot find a function for
2008 Oct 09
2
vectorization instead of using loop
Dear all,
I've sent this question 2 days ago and got response from Sarah. Thanks for
that. But unfortunately, it did not really solve our problem. The main issue
is that we want to use our own (manipulated) covariance matrix in the
calculation of the mahalanobis distance. Does anyone know how to vectorize
the code below instead of using a loop (which slows it down)?
I'd really appreciate
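The original code is not shown in this excerpt, so the following is only a sketch of the general idea. `mahalanobis()` accepts any covariance matrix and already works on all rows at once; the same quantity can also be written with plain matrix algebra. The data and the "manipulated" covariance `S_own` are hypothetical.

```r
set.seed(1)
x <- matrix(rnorm(50 * 3), ncol = 3)   # hypothetical data
center <- colMeans(x)

# Hypothetical manipulated covariance: sample covariance with an inflated diagonal
S_own <- cov(x) + diag(0.1, 3)

# mahalanobis() handles every row in one call, with any covariance supplied
d2 <- mahalanobis(x, center, S_own)

# The same computation written out without a loop
xc <- sweep(x, 2, center)              # subtract the center from every row
d2_manual <- rowSums((xc %*% solve(S_own)) * xc)
```

Both versions invert the covariance matrix once rather than once per row, which is usually where a loop loses most of its time.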
2005 Dec 14
1
About help on 'mahalanobis'
Hi,
help on 'mahalanobis' (in the stats package in R v2.2.0) now says:
"Description:
Returns the Mahalanobis distance of all rows in 'x' and the vector
mu='center' with respect to Sigma='cov'. This is (for vector 'x')
defined as
D^2 = (x - mu)' Sigma^{-1} (x - mu)"
It does return D^2 as written. However,
2008 Dec 08
1
Clustering with Mahalanobis Distance
Dear R ExpeRts,
I'm having memory difficulties trying to cluster with Mahalanobis distance in R. I was wondering if anyone has done it with a matrix of 6525x17 (or something similar to that size). I have a matrix of 6525 genes and 17 samples. I have my R memory increased to the max and am still getting "cannot allocate vector of size" errors. My matrix "x" is
2005 Jun 24
1
Mahalanobis distances
Dear R community
Have just recently got back into R after a long break and have been amazed at
how much it has grown, and how active the list is! Thank you so much to all
those who contribute to this amazing project.
My question:
I am trying to calculate Mahalanobis distances for a matrix called "fgmatrix"
>dim(fgmatrix)
[1] 76 15
>fg.cov <- cov.wt(fgmatrix)
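The post is cut off here, so the actual problem is not known. One point that may be relevant: `cov.wt()` returns a list rather than a bare matrix, so its components have to be passed to `mahalanobis()` individually. A sketch with simulated data standing in for `fgmatrix`:

```r
set.seed(1)
# Hypothetical stand-in for the 76 x 15 matrix
fgmatrix <- matrix(rnorm(76 * 15), nrow = 76, ncol = 15)

fg.cov <- cov.wt(fgmatrix)
component_names <- names(fg.cov)   # includes "cov" and "center"

# Pass the list components, not the list itself, to mahalanobis()
d2 <- mahalanobis(fgmatrix, center = fg.cov$center, cov = fg.cov$cov)
```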
2005 Jul 06
1
Help: Mahalanobis distances between 'Species' from iris
Dear R list,
I'm trying to calculate Mahalanobis distances for 'Species' of 'iris' data
as obtained below:
Squared Distance to Species
From Species      Setosa  Versicolor  Virginica
Setosa                 0    89.86419  179.38471
Versicolor      89.86419           0   17.20107
Virginica      179.38471    17.20107          0
The distances above were obtained with proc
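A table of this shape can be computed in R along the following lines. The sketch assumes that the distances are squared Mahalanobis distances between species centroids based on the pooled within-group covariance matrix; whether that matches the procedure used for the table above cannot be confirmed from this excerpt.

```r
X <- as.matrix(iris[, 1:4])
g <- iris$Species
groups <- split(as.data.frame(X), g)

# Pooled within-group covariance: weighted sum of the group covariances
n_total <- nrow(X)
k <- nlevels(g)
S_pooled <- Reduce(`+`, lapply(groups, function(d) (nrow(d) - 1) * cov(d))) /
  (n_total - k)

# Group centroids, one row per species
centroids <- t(sapply(groups, colMeans))

# Squared Mahalanobis distance from each centroid to every centroid
D2 <- apply(centroids, 1, function(m) mahalanobis(centroids, m, S_pooled))
D2_rounded <- round(D2, 5)
```

The result is a symmetric 3 x 3 matrix with zeros on the diagonal, indexed by species name.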