Displaying 20 results from an estimated 8000 matches similar to: "outlier"
2011 Dec 06
2
Why can't I figure this out? :S
Hi, so I don't speak computer and I have no idea what this code is telling
the program to do, but I apparently need to be able to find and isolate
influential observations. The problem is that I have no idea what the error
means or where in the code it comes from.
The error I get is below the code.
{
## OLS results
NameC<- lm(gpanew~female+female:lastinit+agenew+canadian+mom_ed+yearstudy)
## default:
2003 Feb 10
2
problems using lqs()
Dear List-members,
I found a strange behaviour in the lqs function.
Suppose I have the following data:
y <- c(7.6, 7.7, 4.3, 5.9, 5.0, 6.5, 8.3, 8.2, 13.2, 12.6, 10.4, 10.8,
13.1, 12.3, 10.4, 10.5, 7.7, 9.5, 12.0, 12.6, 13.6, 14.1, 13.5, 11.5,
12.0, 13.0, 14.1, 15.1)
x1 <- c(8.2, 7.6, 4.6, 4.3, 5.9, 5.0, 6.5, 8.3, 10.1, 13.2, 12.6, 10.4,
10.8, 13.1, 13.3, 10.4, 10.5, 7.7, 10.0, 12.0,
2002 Nov 26
4
how to identify the outliers
Hello R-users,
Is there a more sophisticated way to identify outliers in a dataset
other than spotting them in a boxplot? I want to exclude them from
further analysis, and I am interested in their positions in my data
vector.
Rado
--
Radoslav Bonk M.S.
Dept. of Physical Geography and Geoecology
Faculty of Sciences, Comenius University
Mlynska Dolina 842 15, Bratislava, SLOVAKIA
tel: +421 2 602
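
One possible starting point for the question above, as a minimal sketch rather than a definitive answer: boxplot.stats() applies the same 1.5 * IQR rule that boxplot() draws, so it can return the flagged values, and which() then gives their positions. The data below are made up for illustration.

```r
# Hypothetical data: 20 well-behaved values plus two planted extremes.
set.seed(1)
x <- c(rnorm(20, mean = 10, sd = 1), 25, -8)

# Values beyond the boxplot whiskers (1.5 * IQR rule by default).
flagged_values <- boxplot.stats(x)$out

# Positions of the flagged values in the original vector.
flagged_pos <- which(x %in% flagged_values)

# Copy of the vector without the flagged values, for further analysis.
x_clean <- x[!(x %in% flagged_values)]

print(flagged_pos)
```

The coef argument of boxplot.stats() controls how far the whiskers extend, so the rule can be made stricter or looser; whether a flagged point should really be excluded remains a subject-matter decision.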
2003 Jun 18
1
Ltsreg and nsamp="exact"
I'm trying to use least trimmed squares using ltsreg with nsamp="exact".
When I use the following:
rg <- ltsreg(x,y,nsamp="exact")
I get:
Error in lqs.default(x, y, nsamp = "exact", method = "lts") :
NAs in foreign function call (arg 10)
In addition: Warning message:
NAs introduced by coercion
Incidentally, there are no missing values in x or y,
2005 Oct 06
0
a question about LMS and what constitutes outliers
Hi,
I have been using the lqs function with method='lms'. However the
results I get are a little different from the results noted by Rousseeuw
& Leroy (Robust Regression and Outlier Detection) and I was wondering
how to use these results for outlier detection.
I'm using the stackloss dataset, for which the original Rousseeuw et al.
program points out that observations 1,2,3,4
2011 Nov 16
2
outlier identify in qqplot
Dear Community,
I want to identify outliers in my data. I don't know how to use the identify
command on the plots I obtained.
I've gone through the help files and used the mahalanobis example for my purpose:
NormalMultivarianteComparefunc <- function(x) {
Sx <- cov(x)
D2 <- mahalanobis(x, colMeans(x), Sx)
plot(density(D2, bw=.5), main="Squared Mahalanobis distances, n=nrow(x),
2005 Aug 08
2
selecting outliers
Hi everybody,
I'd like to know if there's an easy way of extracting
outlier records from a dataset, in order to perform
further analysis on them.
Thanks
Alessandro
2006 Jan 25
2
how to test robustness of correlation
Hi, there:
As you all know, correlation is not a very robust procedure. Sometimes a
correlation can be driven by a few outliers. There are a few ways to
improve the robustness of the (Pearson) correlation, either by an
outlier-removal procedure or by a resampling technique.
I am wondering if there is any R package or R code that has incorporated
an outlier-removal or resampling procedure in
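
A rough sketch of the two ideas mentioned in this post, using only base R and made-up data: a rank-based (Spearman) coefficient as a less outlier-sensitive alternative, and a nonparametric bootstrap of the Pearson coefficient to see how much it depends on individual points. The sample size, the planted outlier and the number of resamples B are all assumptions chosen for illustration.

```r
# Hypothetical data: x and y are independent, plus one planted outlier.
set.seed(42)
n <- 50
x <- rnorm(n)
y <- rnorm(n)
x[n] <- 10
y[n] <- 10

pearson  <- cor(x, y)                       # sensitive to the planted point
spearman <- cor(x, y, method = "spearman")  # rank-based, less sensitive

# Nonparametric bootstrap: resample (x, y) pairs with replacement.
B <- 2000
boot_r <- replicate(B, {
  idx <- sample.int(n, replace = TRUE)
  cor(x[idx], y[idx])
})

# Percentile interval for the Pearson coefficient.
ci <- quantile(boot_r, c(0.025, 0.975))

print(c(pearson = pearson, spearman = spearman))
print(ci)
```

A wide bootstrap interval, or a large gap between the Pearson and Spearman values, suggests that a few observations are driving the correlation.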
2004 Jun 06
3
Average R-squared of model1 to model n
Hi,
We have a question about interpreting R-squared.
The actual outputs for a test dataset are X=(x1,x2, ..., xn).
model 1 predicted the outputs as Y1=(y11,y12,..., y1n)
model 2 predicted the outputs as Y2=(y21,y22,..., y2n)
...
model m predicted the outputs as Ym=(ym1,ym2,..., ymn)
Now we have two ways to calculate R-squared to evaluate the average performance of the committee model.
(a)
2004 Jan 21
1
outlier identification: is there a redundancy-invariant substitution for mahalanobis distances?
Dear R-experts,
Searching the help archives I found a recommendation to do multivariate
outlier identification by Mahalanobis distances based on a robustly estimated
covariance matrix and compare the resulting distances to a chi^2 distribution
with p (the number of variables) degrees of freedom. I understand that,
compared to Euclidean distances, this has the advantage of being scale-invariant.
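
The recommendation described in this post could be sketched roughly as follows. This assumes the MASS package (a recommended package shipped with R) for the robust covariance estimate via cov.rob(); the simulated data, the "mcd" method and the 0.975 quantile are illustrative choices, not part of the original recommendation.

```r
library(MASS)  # provides cov.rob()

# Hypothetical data: 100 observations on p = 3 variables,
# with the first five rows shifted to act as planted outliers.
set.seed(123)
p <- 3
x <- matrix(rnorm(100 * p), ncol = p)
x[1:5, ] <- x[1:5, ] + 8

# Robust location and covariance (minimum covariance determinant).
rob <- cov.rob(x, method = "mcd")

# Squared Mahalanobis distances relative to the robust estimates.
d2 <- mahalanobis(x, center = rob$center, cov = rob$cov)

# Compare against a chi-squared quantile with p degrees of freedom.
cutoff <- qchisq(0.975, df = p)
flagged <- which(d2 > cutoff)

print(flagged)
```

Even with clean data a cutoff at the 0.975 quantile flags a few percent of the observations by construction, so the flagged rows are candidates for inspection rather than confirmed outliers.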
2003 Aug 26
2
Simple simulation in R
Hello all
I have a feeling this is very simple... but I am not sure how to do
it.
My boss has two variables: one is an average of 4 numbers, the other is
an average of 3 of those numbers, i.e.
var1 = (X1 + X2 + X3 + X4)/4
var2 = (X1 + X2 + X3)/3
All of the X variables are supposed to be measuring similar constructs.
Not surprisingly, these are highly correlated (r = .98); the question
is how
2004 May 26
0
Outlier identification according to Hardin & Rocke (1999)
I'm trying to use a paper by Hardin & Rocke: http://handel.cipic.ucdavis.edu/~dmrocke/Robdist5.pdf
as a guide for a function to identify outliers in multivariate data. Attached below is a function that is my attempt to reproduce their method and also a test to see what fraction of the data are identified as outliers. Using this function I am able to reproduce their results regarding the
1999 Feb 09
1
Robust estimate of variance
Has anybody written or located a robust version of Var(X)?
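
As a hedged illustration of what a robust counterpart to var() might look like in base R: mad() returns a scale estimate that is consistent for the standard deviation under normality, so squaring it gives one robust variance estimate, and a rescaled IQR gives another. The data are made up.

```r
# Hypothetical data: 99 standard normal values plus one gross outlier.
set.seed(7)
x <- c(rnorm(99), 50)

classical <- var(x)                 # inflated by the single outlier
mad_based <- mad(x)^2               # squared MAD (default constant 1.4826)
iqr_based <- (IQR(x) / 1.349)^2     # IQR of a normal is about 1.349 * sd

print(c(classical = classical, mad = mad_based, iqr = iqr_based))
```

Both robust estimates assume roughly normal data in the bulk of the sample; for strongly skewed data the scaling constants are not appropriate.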
2010 Nov 30
3
Outlier statistics question
I have a statistical question.
The data sets I am working with are right-skewed so I have been
plotting the log transformations of my data. I am using a Grubbs Test
to detect outliers in the data, but I get different outcomes depending
on whether I run the test on the original data or the log(data). Here
is one of the problematic sets:
fgf2p50=c(1.563,2.161,2.529,2.726,2.442,5.047)
2012 Feb 09
1
Outlier removal techniques
Hello,
I need to analyse a data matrix with dimensions of 30x100.
Before analysing the data there is, however, a need to remove outliers from
the data.
I have read quite a lot about outlier removal already, and the most common
technique for that seems to be Principal Component Analysis (PCA). However,
I think that this technique is quite subjective. When is an outlier an
outlier?
I uploaded
2004 Jul 16
3
Email or attachment blocked
An email or attachment sent from your address (or with your address as the sender) has been rejected by Allerød Kommune.
Spam and viruses are typically sent under the cover of other senders, so the blocked email need not have originated directly from you. (However, always remember to keep an up-to-date antivirus program on your computer.)
If needed, you can scan your computer with the free tool
2009 May 26
2
(OT) Does pearson correlation assume bivariate normality of the data?
Dear all,
The other day I was reading this post [1] that slightly surprised me:
"To reject the null of no correlation, an hypothsis test based on the
normal distribution. If normality is not the base assumption your
working from then p-values, significance tests and conf. intervals
dont mean much (the value of the coefficient is not reliable) " (BOB
SAMOHYL).
To me this implied that in
2004 Jun 30
1
outlier tests
I have been learning about some outlier tests -- Dixon
and Grubbs, specifically -- for small data sets. When
I try help.start() and search for outlier tests, the
only response I manage to find is the Bonferroni test
available from the car package... are there any other
packages that offer outlier tests? Are the Dixon and
Grubbs tests "good" for small samples or are others
more
2009 Feb 14
2
implementing Grubbs outlier test on a large dataframe
Hi!
I'm trying to implement an outlier test once per row in a large dataframe.
Ideally, I'd do this and then add the p-value results and the number flagged as
an outlier as two new separate columns to the dataframe. The Grubbs outlier
test requires a vector, and I'm confused about how to make each row of my dataframe
a vector, followed by doing a Grubbs test for each row containing the vector
of numbers
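
A minimal base-R sketch of the row-wise pattern asked about here. apply() with MARGIN = 1 hands each row to a function as a plain vector. The helper grubbs_row() is hypothetical: it computes the two-sided Grubbs statistic for the single most extreme value and a Bonferroni-style approximate p-value from the t distribution, and it is not the implementation from any particular package. The data frame and the 0.05 threshold are made up.

```r
# Hypothetical helper: two-sided Grubbs statistic for one vector,
# with an approximate (Bonferroni-style) p-value.
grubbs_row <- function(v) {
  v <- v[!is.na(v)]
  n <- length(v)
  g <- max(abs(v - mean(v))) / sd(v)
  # Invert the usual critical-value formula to get a t statistic.
  t2 <- n * (n - 2) * g^2 / max((n - 1)^2 - n * g^2, 0)
  p <- min(1, 2 * n * pt(sqrt(t2), df = n - 2, lower.tail = FALSE))
  c(G = g, p.value = p)
}

# Hypothetical 10 x 8 data frame with one planted outlier in row 3.
set.seed(99)
df <- as.data.frame(matrix(rnorm(10 * 8), nrow = 10))
df[3, 5] <- 15

# apply() over rows returns one column per row, so transpose the result.
res <- t(apply(as.matrix(df), 1, grubbs_row))

# Add the results as new columns.
df$grubbs_p <- res[, "p.value"]
df$flagged  <- df$grubbs_p < 0.05

print(df[, c("grubbs_p", "flagged")])
```

Converting with as.matrix() first assumes all columns are numeric; with thousands of rows, the 0.05 threshold would also need a multiple-testing adjustment.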
2008 Jun 18
2
randomForest outlier
I am trying to use ?randomForest to find the variables that are most important for
dividing my dataset (continuous and categorical variables) into two given groups.
But when I plot the outliers:
plot(outlier(FemMalSex_NAavoid88.rf33, cls=FemMalSex_NAavoid88$Sex),
type="h",col=c("red","green")[as.numeric(FemMalSex_NAavoid88$Sex)])
it seems to me that all my values appear as