Displaying 20 results from an estimated 3000 matches similar to: "selecting outliers"
2002 Nov 26
4
how to identify the outliers
Hello R-users,
Is there a more sophisticated way to identify the outliers in a dataset
than looking at them in a boxplot? I want to exclude them from
further analysis, and I am interested in their positions in my data
vector.
Rado
--
Radoslav Bonk M.S.
Dept. of Physical Geography and Geoecology
Faculty of Sciences, Comenius University
Mlynska Dolina 842 15, Bratislava, SLOVAKIA
tel: +421 2 602
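A minimal sketch for the question in this post, assuming the data sit in a numeric vector (the name `x` and its values are made up for illustration): `boxplot.stats()` returns the values a boxplot would flag, and `which()` recovers their positions.

```r
# Hypothetical data: 50 well-behaved values plus two planted extremes.
set.seed(1)
x <- c(rnorm(50), 8, -7)

# Values beyond 1.5 * IQR from the hinges (the default boxplot rule).
flagged <- boxplot.stats(x)$out

# Positions of the flagged values in the original vector.
pos <- which(x %in% flagged)

# Vector with the flagged values excluded (guard against an empty index).
x_clean <- if (length(pos) > 0) x[-pos] else x
```

Whether the flagged points should actually be excluded is a separate, subject-matter decision; the sketch only locates them.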
2005 Aug 08
2
computationally singular
Hi,
I have a dataset which has around 138 variables and 30,000 cases. I am
trying to calculate a mahalanobis distance matrix for them and my
procedure is like this:
Suppose my data is stored in mymatrix
> S<-cov(mymatrix) # this is fine
> D<-sapply(1:nrow(mymatrix), function(i) mahalanobis(mymatrix, mymatrix[i,], S))
Error in solve.default(cov, ...) : system is computationally
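One possible way to diagnose and work around this kind of error, sketched on made-up data (the small `mymatrix` below, with one deliberately redundant column, is an assumption and not the poster's data): compare the rank of the covariance matrix with its dimension and, if it is rank deficient, pass a generalized inverse from `MASS::ginv()` to `mahalanobis()` with `inverted = TRUE`. Whether a pseudo-inverse is statistically appropriate depends on why the matrix is singular.

```r
library(MASS)  # for ginv(); MASS ships with standard R installations

set.seed(42)
# Made-up stand-in for the real data: column 4 is an exact linear
# combination of columns 1 and 2, so the covariance matrix is singular.
mymatrix <- matrix(rnorm(200 * 3), ncol = 3)
mymatrix <- cbind(mymatrix, mymatrix[, 1] + mymatrix[, 2])

S <- cov(mymatrix)

# A rank below ncol(S) signals linearly dependent variables.
rank_S <- qr(S)$rank

# Moore-Penrose pseudo-inverse used in place of solve(S).
S_inv <- ginv(S)

# Squared distances of every row from the column means.
d2 <- mahalanobis(mymatrix, colMeans(mymatrix), S_inv, inverted = TRUE)
```

Dropping or combining the redundant variables is an alternative to the pseudo-inverse. Note also that a full pairwise matrix for 30,000 cases has 9e8 entries, so memory may become the next obstacle.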
2011 Sep 28
1
removing outliers in non-normal distributions
Hello,
I'm seeking ideas on how to remove outliers from a non-normally distributed
predictor variable. We wish to reset points deemed outliers to a truncated
value that is less extreme. (I've seen many posts requesting outlier removal
systems. It seems like most of the replies center around "why do you want to
remove them", "you shouldn't remove them", "it
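One common way to reset extreme points to less extreme values is to clamp them at chosen quantiles (often called winsorizing). The sketch below is an illustration under assumptions: the skewed sample `x` and the 5%/95% cut-offs are arbitrary choices, not recommendations.

```r
set.seed(7)
x <- rexp(200)   # hypothetical right-skewed predictor

# Cut-offs chosen for illustration only.
limits <- unname(quantile(x, probs = c(0.05, 0.95)))

# Clamp values outside the limits to the nearest limit.
x_trunc <- pmin(pmax(x, limits[1]), limits[2])
```

Because the limits come from quantiles rather than from the mean and standard deviation, the rule does not rely on normality.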
2008 Jun 13
3
cluster.stats
Dear list,
I just tried to use the function cluster.stats in the package fpc.
I just have a couple of questions about the syntax:
cluster.stats(d,clustering,alt.clustering=NULL,
silhouette=TRUE,G2=FALSE,G3=FALSE)
1) Is the distance object (d) an object obtained by applying the function
dist() to my own original matrix?
2) Is clustering the vector of cluster memberships resulting from one of
the many clustering methods?
2004 Jan 21
1
outlier identification: is there a redundancy-invariant substitution for mahalanobis distances?
Dear R-experts,
Searching the help archives I found a recommendation to do multivariate
outlier identification by Mahalanobis distances based on a robustly estimated
covariance matrix and to compare the resulting distances to a chi^2-distribution
with p (the number of variables) degrees of freedom. I understand that
compared to Euclidean distances this has the advantage of being scale-invariant.
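The recommendation described in this post can be sketched as follows, assuming `MASS::cov.rob()` as the robust estimator and a 97.5% chi-squared quantile as the cut-off (both are illustrative choices; the data are simulated).

```r
library(MASS)  # for cov.rob()

set.seed(123)
p <- 3
X <- matrix(rnorm(100 * p), ncol = p)
X[1:5, ] <- X[1:5, ] + 10   # five planted multivariate outliers

# Robust location and scatter (minimum volume ellipsoid by default).
rob <- cov.rob(X)

# Squared robust Mahalanobis distances.
d2 <- mahalanobis(X, rob$center, rob$cov)

# Compare with a chi-squared distribution with p degrees of freedom.
cutoff <- qchisq(0.975, df = p)
outlier_rows <- which(d2 > cutoff)
```

This sketch does not address the redundancy question raised in the thread title; it only shows the baseline procedure being asked about.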
2005 Sep 29
5
Regression slope confidence interval
Hi list,
is there any direct way to obtain confidence intervals for the regression
slope from lm, predict.lm or the like?
(If not, is there any reason? This is also missing in some other statistics
software, and I thought this would be quite a standard application.)
I know that it's easy to implement, but I need it for
explaining things to people who faint if they have to do their own
programming...
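For reference, base R provides `confint()` for fitted `lm` objects. A short sketch on simulated data (the variables `x` and `y` are made up):

```r
# Simulated data with a known slope of 2 (for illustration only).
set.seed(1)
x <- 1:30
y <- 1 + 2 * x + rnorm(30)

fit <- lm(y ~ x)

# 95% confidence intervals for intercept and slope.
ci <- confint(fit, level = 0.95)

# Interval for the slope alone.
slope_ci <- confint(fit, "x")
```

The intervals are the usual t-based ones, that is, the estimate plus or minus a t quantile times the standard error reported by `summary(fit)`.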
2003 Feb 20
3
outliers/interval data extraction
Dear R-users,
I have two outliers related questions.
I.
I have a vector consisting of 69 values.
mean = 0.00086
SD = 0.02152
The shape of the EDA graphics (boxplots, density plots) is heavily distorted
by outliers. How should I define the interval for excluding outliers? Is
the interval <mean - 2SD, mean + 2SD> a correct approach?
Or should I define 95% (or 99%) limit of agreement for data interval,
2004 Sep 23
6
detection of outliers
Hi,
this is both a statistical and an R question...
What would be the best way or test to detect an outlier among a series of 10 to 30 values? For instance, given the dataset 10,11,12,15,20,22,25,30,500, I'd like a way to identify the last value as an outlier (in one direction only). One way would be to calculate abs(mean - median) and if elevated (to what extent?) delete the
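One simple option for a small series like this is a median/MAD rule, sketched below on the dataset from the post. The threshold of 3 is a conventional but arbitrary assumption, and this is not the only defensible test.

```r
x <- c(10, 11, 12, 15, 20, 22, 25, 30, 500)

# Robust z-scores: distance from the median in units of the MAD.
robust_z <- (x - median(x)) / mad(x)

# One-sided rule: flag only unusually large values.
flagged <- which(robust_z > 3)
```

On this dataset the rule flags only the ninth value (500). Unlike a mean/SD rule, the median and MAD are barely affected by the extreme value itself.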
2010 Apr 24
4
DICE Coefficient of similarity measure
Hi,
I wanted to calculate the Dice coefficient (a similarity measure for binary
variables) in R and found that the "igraph" package offers
"similarity.dice" for this. However, that function expects an igraph
object as input, whereas I have a data frame whose columns
contain 1's and 0's. Can I convert this data frame into an igraph
object, so that
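As an alternative to converting to an igraph object, the Dice coefficient for binary vectors can be computed directly from its definition. A minimal sketch, assuming a small made-up data frame (`df`, `dice`, and the column names are hypothetical):

```r
# Hypothetical data frame of 0/1 indicators.
df <- data.frame(v1 = c(1, 1, 0, 1, 0),
                 v2 = c(1, 0, 0, 1, 1),
                 v3 = c(0, 0, 1, 0, 1))

# Dice coefficient between two binary vectors:
# 2 * (number of shared 1s) / (number of 1s in u + number of 1s in v).
dice <- function(u, v) 2 * sum(u & v) / (sum(u) + sum(v))

# Pairwise coefficients between all columns.
m <- as.matrix(df)
idx <- seq_len(ncol(m))
dice_matrix <- outer(idx, idx,
                     Vectorize(function(i, j) dice(m[, i], m[, j])))
dimnames(dice_matrix) <- list(colnames(m), colnames(m))
```

To compare rows instead of columns, apply the same code to `t(m)`. Two all-zero vectors give 0/0, so that case needs a convention of its own.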
2005 Jul 08
5
Help with Mahalanobis
Dear R list,
I'm trying to calculate Mahalanobis distances for 'Species' of 'iris' data
as obtained below:
Squared Distance to Species
From Species      Setosa  Versicolor   Virginica
Setosa                 0    89.86419   179.38471
Versicolor      89.86419           0    17.20107
Virginica      179.38471    17.20107           0
These distances were obtained with proc 'CANDISC'
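A sketch of one way to compute squared Mahalanobis distances between species means in R, assuming a pooled within-species covariance matrix (that pooling choice is an assumption about what the SAS procedure does by default, and the object names are made up):

```r
data(iris)
X <- as.matrix(iris[, 1:4])
g <- iris$Species

# Species mean vectors (one row per species).
means <- apply(X, 2, function(col) tapply(col, g, mean))

# Pooled within-species covariance matrix.
covs <- lapply(split(as.data.frame(X), g), cov)
n_g <- tabulate(g)
Sp <- Reduce(`+`, Map(function(S, n) (n - 1) * S, covs, n_g)) /
  (nrow(X) - nlevels(g))

# Squared Mahalanobis distances between each pair of species means.
D2 <- sapply(rownames(means), function(s) mahalanobis(means, means[s, ], Sp))
```

The resulting 3 x 3 matrix `D2` can then be compared against the table from the post.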
2011 Jun 09
1
k-nn hierarchical clustering
Hi there,
is there any R-function for k-nearest neighbour agglomerative hierarchical
clustering?
By this I mean standard agglomerative hierarchical clustering as in hclust
or agnes, but with the k-nearest neighbour distance between clusters used
on the higher levels where there are at least k>1 distances between two
clusters (single linkage is 1-nearest neighbour clustering)?
Best regards,
2006 Aug 09
2
R CMD check error
Dear list,
R CMD check on my updated package now generated the following error:
"LaTeX errors when creating DVI version.
This typically indicates Rd problems."
But the Rd files (and everything else) were checked as "OK" (I
removed the problem about which I asked the list some hours ago, but
answers are still appreciated because I rather created a rough
workaround than
2010 Sep 01
2
Rd-file error: non-ASCII input and no declared encoding
Dear list,
I came across the following error for three of my newly written Rd-files:
non-ASCII input and no declared encoding
I can't make sense of this.
Below I copied in one of the three files.
Can anybody please tell me what's wrong with it?
Thank you,
Christian
\name{tetragonula}
\alias{tetragonula}
\alias{tetragonula.coord}
\docType{data}
% \non_function{}
\title{Microsatellite
2003 Aug 11
2
cluster analysis
I'd like to do cluster analysis using the Mahalanobis distance.
Could you tell me how to do it?
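One possible approach, sketched under assumptions (simulated data, the overall sample covariance as the metric, average linkage): transform the data so that ordinary Euclidean distances on the transformed rows equal Mahalanobis distances on the original rows, then use `dist()` and `hclust()` as usual.

```r
set.seed(10)
X <- matrix(rnorm(60 * 3), ncol = 3)   # made-up data

S <- cov(X)

# Whitening transform: with S = t(R) %*% R, Euclidean distances between
# rows of X %*% solve(R) equal Mahalanobis distances between rows of X.
R <- chol(S)
Z <- X %*% solve(R)

d <- dist(Z)                           # pairwise Mahalanobis distances
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 3)            # k = 3 is an arbitrary choice
```

Using the overall covariance matrix is a simplification, since it mixes within-cluster and between-cluster variation; other choices of covariance are possible.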
2000 Apr 21
1
outlier detection methods in r?
hi -
if I sample from a normal distribution with something like
n100<-rnorm(100,0,1)
and add an outlier with
n100[10]<-4
then
qqnorm(n100)
visually shows the point 4 as an outlier
and calculating the probability of a value of 4 or bigger in 100 samples of N(0,1)
gives
> 1-exp(log(pnorm(4,0,1))*100)
[1] 0.003162164
If I have more than 1 sample above outlier threshold the math is a
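The figure quoted in this post is the chance that at least one of 100 independent N(0,1) draws is 4 or larger. A sketch of the same calculation in binomial form, which also extends to "at least k" exceedances (the extension and the variable names are assumptions about where the question is heading):

```r
n <- 100
p <- pnorm(4, lower.tail = FALSE)   # P(single draw >= 4)

# At least one exceedance; same quantity as 1 - pnorm(4)^100.
p_at_least_1 <- 1 - (1 - p)^n

# At least k exceedances, via the binomial upper tail.
k <- 2
p_at_least_k <- pbinom(k - 1, size = n, prob = p, lower.tail = FALSE)
```

Both lines assume the draws are independent and identically distributed.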
2010 Oct 10
1
Package "prabclus" not available?
Hi there,
I just tried to install the package prabclus on a computer running Ubuntu
Linux 9.04 using install.packages from within R.
This gave me a message:
Warning message:
In install.packages("prabclus") : package ‘prabclus’ is not available
I tried to do this selecting two different CRAN mirrors (same result) and
with other packages (installing them works fine).
Looking up the
2005 Feb 25
2
outlier threshold
For the analysis of financial data with a large variance, what is the best way to select an outlier threshold?
Listed below, is there a best method to select an outlier threshold and how does R calculate it?
In R, how do you find the outlier threshold through an interquartile range?
In R, how do you find the outlier threshold using the hist command?
In R, how do you find the outlier threshold
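For the interquartile-range question, a minimal sketch using Tukey-style fences on made-up data (the vector `returns` and the multiplier 1.5 are illustrative assumptions):

```r
set.seed(3)
returns <- c(rnorm(200, sd = 0.02), 0.25, -0.30)  # made-up financial returns

q <- quantile(returns, probs = c(0.25, 0.75))
iqr <- IQR(returns)

# Tukey-style fences; 1.5 is the usual boxplot convention.
lower <- unname(q[1] - 1.5 * iqr)
upper <- unname(q[2] + 1.5 * iqr)

outliers <- returns[returns < lower | returns > upper]
```

`boxplot()` computes its whiskers from hinges (`fivenum()`), which can differ slightly from the `quantile()` values used here.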
2011 Nov 16
2
outlier identify in qqplot
Dear Community,
I want to identify outliers in my data. I don't know how to use the identify
command on the plots obtained.
I've gone through the help files and used the mahalanobis example for my purpose:
NormalMultivarianteComparefunc <- function(x) {
Sx <- cov(x)
D2 <- mahalanobis(x, colMeans(x), Sx)
plot(density(D2, bw=.5), main="Squared Mahalanobis distances, n=nrow(x),
2006 Aug 18
2
R-update - what about packages and ESS?
Hi there,
it seems that if I update R, it doesn't find previously installed packages
anymore and is also not found by ESS.
Actually the update has been done by our system administrator who assumed
that there would be no problems with these things (I don't have root
access to this system) and will perhaps not be too keen on installing
everything else again.
Is there any simple way how
2009 Jan 26
2
Help with clustering
I am going to try out a tentative clustering of some feature vectors.
The ranges of values spanned by the three items making up the feature vector are quite different:
Item-1 goes roughly from 70 to 525 (integer numbers only)
Item-2 is in-between 0 and 1 (all real numbers between 0 and 1)
Item-3 goes from 1 to 10 (integer numbers only)
In order to spread out Item-2 even further I might try to
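When features span ranges this different, a common first step is to standardize each column before computing distances. A sketch on simulated data that mimics the three ranges described (the data frame `features`, the linkage method, and the number of clusters are all assumptions):

```r
set.seed(5)
# Made-up features mimicking the three ranges described in the post.
features <- data.frame(
  item1 = sample(70:525, 40, replace = TRUE),
  item2 = runif(40),
  item3 = sample(1:10, 40, replace = TRUE)
)

# Centre each column and scale it to unit standard deviation so that
# no single item dominates the Euclidean distances.
scaled <- scale(features)

hc <- hclust(dist(scaled), method = "complete")
groups <- cutree(hc, k = 3)   # k = 3 is an arbitrary choice
```

Without scaling, Item-1 would dominate the distances simply because of its larger numeric range.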