similar to: outlier

Displaying 20 results from an estimated 8000 matches similar to: "outlier"

2011 Dec 06
2
Why can't I figure this out? :S
Hi, so I don't speak computer and I have no idea what this code is telling the program to do, but I apparently need to be able to find and isolate influential observations. Problem: I have no idea what the error means or where in the code it may come from. The error I get is below the code { ## OLS results NameC<- lm(gpanew~female+female:lastinit+agenew+canadian+mom_ed+yearstudy) ## default:
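A common first step for finding influential observations in an lm() fit is Cook's distance. The sketch below is not the poster's model: it uses the built-in mtcars data as a stand-in, and the 4/n cutoff is an informal rule of thumb rather than a formal test.

```r
# Stand-in model: mtcars is used only because the poster's data are not available.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Cook's distance measures how much the fitted values change when each
# observation is dropped; 4/n is a common informal screening cutoff.
cd <- cooks.distance(fit)
cutoff <- 4 / nrow(mtcars)
influential <- which(cd > cutoff)
print(influential)

# Refit without the flagged rows to see how much the coefficients move.
if (length(influential) > 0) {
  fit_reduced <- update(fit, data = mtcars[-influential, ])
  print(rbind(all = coef(fit), reduced = coef(fit_reduced)))
}
```

stats::influence.measures(fit) reports DFBETAS, DFFITS, covariance ratios and hat values alongside Cook's distance, and marks the cases it considers influential.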
2003 Feb 10
2
problems using lqs()
Dear List-members, I found a strange behaviour in the lqs function. Suppose I have the following data: y <- c(7.6, 7.7, 4.3, 5.9, 5.0, 6.5, 8.3, 8.2, 13.2, 12.6, 10.4, 10.8, 13.1, 12.3, 10.4, 10.5, 7.7, 9.5, 12.0, 12.6, 13.6, 14.1, 13.5, 11.5, 12.0, 13.0, 14.1, 15.1) x1 <- c(8.2, 7.6,, 4.6, 4.3, 5.9, 5.0, 6.5, 8.3, 10.1, 13.2, 12.6, 10.4, 10.8, 13.1, 13.3, 10.4, 10.5, 7.7, 10.0, 12.0,
2002 Nov 26
4
how to identify the outliers
Hello R-users, Is there a more sophisticated way to identify outliers in a dataset than looking at them in a boxplot? I want to exclude them from further analysis, and I am interested in their positions in my data vector. Rado -- Radoslav Bonk M.S. Dept. of Physical Geography and Geoecology Faculty of Sciences, Comenius University Mlynska Dolina 842 15, Bratislava, SLOVAKIA tel: +421 2 602
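One way to get the positions of the points a boxplot would mark is boxplot.stats(), which applies the same rule that boxplot() draws (points more than 1.5 times the hinge spread beyond the box). A minimal sketch, with a made-up vector x:

```r
# Hypothetical example vector; replace with your own data.
x <- c(12, 15, 14, 13, 16, 48, 14, 15, 2, 13)

# boxplot.stats()$out holds the values beyond coef (default 1.5) times the
# box length from the box, i.e. the points boxplot() draws individually.
out_values <- boxplot.stats(x)$out

# Positions of those values in the original vector.
out_pos <- which(x %in% out_values)
print(out_pos)

# Vector with the flagged points excluded (safe even if nothing is flagged).
x_clean <- x[!(x %in% out_values)]
```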
2003 Jun 18
1
Ltsreg and nsamp="exact"
I'm trying to use least trimmed squares via ltsreg with nsamp="exact". When I use the following: rg <- ltsreg(x,y,nsamp="exact") I get: Error in lqs.default(x, y, nsamp = "exact", method = "lts") : NAs in foreign function call (arg 10) In addition: Warning message: NAs introduced by coercion. Incidentally, there are no missing values in x or y,
2005 Oct 06
0
a question about LMS and what constitutes outliers
Hi, I have been using the lqs function with method='lms'. However, the results I get are a little different from those reported by Rousseeuw & Leroy (Robust Regression and Outlier Detection), and I was wondering how to use these results for outlier detection. I'm using the stackloss dataset, for which the original Rousseeuw et al. program points out that observations 1,2,3,4
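A sketch of Rousseeuw & Leroy style flagging with lqs() from MASS: standardize the LMS residuals by a robust scale estimate and flag cases beyond 2.5 in absolute value. Assumptions: lqs() reports two scale estimates in $scale and this sketch simply uses the first; because lqs() searches random subsets, the flagged set can depend on the seed and need not match the original program exactly.

```r
library(MASS)  # lqs() lives in MASS

set.seed(1)  # lqs() uses random subsampling, so fix the seed for reproducibility
fit <- lqs(stack.loss ~ ., data = stackloss, method = "lms")

# Standardized residuals using the first scale estimate reported by lqs();
# cases with |standardized residual| > 2.5 are flagged as outliers.
std_res <- residuals(fit) / fit$scale[1]
flagged <- which(abs(std_res) > 2.5)
print(flagged)
```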
2011 Nov 16
2
outlier identify in qqplot
Dear Community, I want to identify outliers in my data. I don't know how to use the identify command in the plots obtained. I've gone through the help files and used the mahalanobis example for my purpose: NormalMultivarianteComparefunc <- function(x) { Sx <- cov(x) D2 <- mahalanobis(x, colMeans(x), Sx) plot(density(D2, bw=.5), main="Squared Mahalanobis distances, n=nrow(x),
2005 Aug 08
2
selecting outliers
Hi everybody, I'd like to know if there's an easy way to extract outlier records from a dataset in order to perform further analysis on them. Thanks Alessandro
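For extracting outlier records from a data frame, one simple option is to compute a flag per row and split the frame on it. The sketch uses a robust z-score (distance from the median in MAD units) with a cutoff of 3.5; the data frame, the column name value and the cutoff are all assumptions for illustration.

```r
# Hypothetical data frame; 'value' is the column screened for outliers.
records <- data.frame(id = 1:8,
                      value = c(5.1, 4.9, 5.3, 5.0, 12.7, 5.2, 4.8, 5.1))

# Robust z-score: distance from the median in units of the normal-consistent
# MAD. The 3.5 cutoff is a common convention, not a universal rule.
robust_z <- (records$value - median(records$value)) / mad(records$value)
is_out <- abs(robust_z) > 3.5

outliers <- records[is_out, ]   # records to analyse separately
inliers  <- records[!is_out, ]  # remaining records
print(outliers)
```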
2006 Jan 25
2
how to test robustness of correlation
Hi there: As you all know, correlation is not a very robust procedure. Sometimes a correlation can be driven by a few outliers. There are a few ways to improve the robustness of correlation (Pearson correlation), either by an outlier removal procedure or by a resampling technique. I am wondering if there is any R package or R code that has incorporated outlier removal or resampling procedure in
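Both ideas in the post can be tried in base R. The sketch below, on made-up data where one extreme point drives the correlation, computes leave-one-out (jackknife) correlations to find the most influential point, rank-based Spearman and Kendall coefficients, and a simple percentile bootstrap interval for Pearson's r. It is an illustration, not a package recommendation.

```r
# Hypothetical data: nine unrelated points plus one extreme point.
x <- c(1:9, 50)
y <- c(5, 3, 6, 2, 7, 4, 5, 3, 6, 50)

r_all <- cor(x, y)  # Pearson

# Leave-one-out correlations: a large change when one point is dropped
# means that point is driving r.
r_loo <- vapply(seq_along(x), function(i) cor(x[-i], y[-i]), numeric(1))
most_influential <- which.max(abs(r_all - r_loo))

# Rank-based alternatives are less sensitive to single extreme values.
r_spearman <- cor(x, y, method = "spearman")
r_kendall  <- cor(x, y, method = "kendall")

# Percentile bootstrap interval for Pearson's r (resampling pairs).
set.seed(42)
r_boot <- replicate(2000, {
  idx <- sample(length(x), replace = TRUE)
  cor(x[idx], y[idx])
})
ci <- quantile(r_boot, c(0.025, 0.975), na.rm = TRUE)

print(c(pearson = r_all, spearman = r_spearman, kendall = r_kendall))
print(most_influential)
print(ci)
```

A wide bootstrap interval, or a large gap between Pearson and the rank-based coefficients, is a hint that a few points dominate the estimate.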
2004 Jun 06
3
Average R-squared of model1 to model n
Hi, We have a question about interpreting R-squared. The actual outputs for a test dataset are X=(x1,x2, ..., xn). Model 1 predicted the outputs as Y1=(y11,y12,..., y1n), model 2 predicted the outputs as Y2=(y21,y22,..., y2n), ..., model m predicted the outputs as Ym=(ym1,ym2,..., ymn). Now we have two ways to calculate R-squared to evaluate the average performance of the committee model. (a)
2004 Jan 21
1
outlier identification: is there a redundancy-invariant substitution for mahalanobis distances?
Dear R-experts, Searching the help archives I found a recommendation to do multivariate outlier identification by Mahalanobis distances based on a robustly estimated covariance matrix, and to compare the resulting distances to a chi^2-distribution with p (the number of variables) degrees of freedom. I understand that, compared to Euclidean distances, this has the advantage of being scale-invariant.
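A minimal sketch of the procedure the poster found: robust center and covariance from MASS::cov.rob() (MCD here), squared distances via mahalanobis(), and a comparison with a chi^2 quantile on p degrees of freedom. The simulated data and the 0.975 quantile are assumptions, and the sketch does not address the redundancy question in the subject line.

```r
library(MASS)  # cov.rob() provides MCD / MVE robust covariance estimates

set.seed(123)
# Simulated bivariate data: 100 standard normal rows plus five planted
# outliers centred at (8, 8) in rows 101-105.
x <- rbind(matrix(rnorm(200), ncol = 2),
           matrix(rnorm(10, mean = 8), ncol = 2))

# Robust location and scatter via the minimum covariance determinant.
rob <- cov.rob(x, method = "mcd")

# Squared robust Mahalanobis distances, compared with a chi^2(p) quantile.
d2 <- mahalanobis(x, center = rob$center, cov = rob$cov)
cutoff <- qchisq(0.975, df = ncol(x))
flagged <- which(d2 > cutoff)
print(flagged)
```

At a 0.975 cutoff roughly 2.5% of clean normal rows are expected to exceed the threshold as well, so a few flagged rows outside the planted ones are not surprising.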
2003 Aug 26
2
Simple simulation in R
Hello all, I have a feeling this is very simple... but I am not sure how to do it. My boss has two variables: one is an average of 4 numbers, the other is an average of 3 of those numbers, i.e. var1 = (X1 + X2 + X3 + X4)/4 and var2 = (X1 + X2 + X3)/3. All of the X variables are supposed to be measuring similar constructs. Not surprisingly, these are highly correlated (r = .98); the question is how
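One way to frame this as a simulation: generate four items, form the two averages and look at their correlation. The inter-item correlation rho is an assumption (the post does not give one); the closed-form value for equicorrelated, unit-variance items, r = (3 + 9 rho) / sqrt((4 + 12 rho)(3 + 6 rho)), is included as a check.

```r
# var1 averages four items, var2 averages three of those same items.
# rho is an assumed common inter-item correlation (0 = independent items).
simulate_overlap_cor <- function(n = 1e5, rho = 0) {
  f <- rnorm(n)  # shared factor that induces the correlation rho
  items <- sapply(1:4, function(j) sqrt(rho) * f + sqrt(1 - rho) * rnorm(n))
  var1 <- rowMeans(items)          # (X1 + X2 + X3 + X4) / 4
  var2 <- rowMeans(items[, 1:3])   # (X1 + X2 + X3) / 3
  cor(var1, var2)
}

# Closed-form correlation for equicorrelated, unit-variance items.
theory_overlap_cor <- function(rho = 0) {
  (3 + 9 * rho) / sqrt((4 + 12 * rho) * (3 + 6 * rho))
}

set.seed(2003)
r0 <- simulate_overlap_cor(rho = 0)     # independent items
r5 <- simulate_overlap_cor(rho = 0.5)   # moderately correlated items
print(c(sim_rho0 = r0, theory_rho0 = theory_overlap_cor(0),
        sim_rho5 = r5, theory_rho5 = theory_overlap_cor(0.5)))
```

Even with completely independent items the overlap alone gives sqrt(3)/2, about 0.87, so a high correlation is expected from the three shared items before any similarity between the constructs is taken into account.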
2004 May 26
0
Outlier identification according to Hardin & Rocke (1999)
I'm trying to use a paper by Hardin & Rocke: http://handel.cipic.ucdavis.edu/~dmrocke/Robdist5.pdf as a guide for a function to identify outliers in multivariate data. Attached below is a function that is my attempt to reproduce their method and also a test to see what fraction of the data are identified as outliers. Using this function I am able to reproduce their results regarding the
1999 Feb 09
1
Robust estimate of variance
Has anybody written or located a robust version of Var(X)?
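Two simple robust stand-ins for Var(X), both scaled to estimate sigma^2 when the data are normal: the squared MAD (mad() already includes the 1.4826 consistency constant) and the squared IQR divided by 1.349, the IQR of a standard normal. A minimal sketch:

```r
# Robust variance estimates; both target sigma^2 under normality.
robust_var_mad <- function(x) mad(x)^2             # mad() uses constant 1.4826
robust_var_iqr <- function(x) (IQR(x) / 1.349)^2   # IQR of N(0,1) is about 1.349

x <- c(1, 2, 3, 4, 100)  # one gross outlier
print(c(var = var(x), mad = robust_var_mad(x), iqr = robust_var_iqr(x)))
```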
2010 Nov 30
3
Outlier statistics question
I have a statistical question. The data sets I am working with are right-skewed, so I have been plotting the log-transformed data. I am using a Grubbs test to detect outliers in the data, but I get different outcomes depending on whether I run the test on the original data or on log(data). Here is one of the problematic sets: fgf2p50=c(1.563,2.161,2.529,2.726,2.442,5.047)
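Grubbs' test assumes the data are approximately normal, so running it on x and on log(x) tests two different models, and the outcomes can legitimately differ. A base-R sketch of the two-sided statistic G = max|x_i - mean(x)| / sd(x) and its usual t-based critical value, applied to the poster's values:

```r
# Two-sided Grubbs statistic: largest absolute deviation in SD units.
grubbs_stat <- function(x) max(abs(x - mean(x))) / sd(x)

# Critical value from the t distribution with n - 2 df at level alpha / (2n).
grubbs_crit <- function(n, alpha = 0.05) {
  t <- qt(alpha / (2 * n), df = n - 2, lower.tail = FALSE)
  ((n - 1) / sqrt(n)) * sqrt(t^2 / (n - 2 + t^2))
}

fgf2p50 <- c(1.563, 2.161, 2.529, 2.726, 2.442, 5.047)
res <- c(G_raw  = grubbs_stat(fgf2p50),
         G_log  = grubbs_stat(log(fgf2p50)),
         G_crit = grubbs_crit(length(fgf2p50)))
print(round(res, 3))
```

With these six values the raw-scale statistic (about 1.92) exceeds the 5% critical value (about 1.89), while the log-scale one (about 1.76) does not. If the data are plausibly log-normal, the log-scale result is the one whose normality assumption matches the data.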
2012 Feb 09
1
Outlier removal techniques
Hello, I need to analyse a data matrix with dimensions of 30x100. Before analysing the data, however, outliers need to be removed. I have already read quite a lot about outlier removal, and the most common technique seems to be Principal Component Analysis (PCA). However, I think that this technique is quite subjective. When is an outlier an outlier? I uploaded
2004 Jul 16
3
Email or attachment blocked
An email or attachment sent from your address (or with your address as the sender) has been rejected by Allerød Kommune. Spam and viruses are typically sent under the guise of other senders, so the blocked email need not have originated directly from you. (Do, however, always remember to keep an up-to-date antivirus program on your computer.) If you like, you can scan your computer with the free tool
2009 May 26
2
(OT) Does pearson correlation assume bivariate normality of the data?
Dear all, The other day I was reading this post [1] that slightly surprised me: "To reject the null of no correlation, an hypothesis test based on the normal distribution. If normality is not the base assumption you're working from then p-values, significance tests and conf. intervals don't mean much (the value of the coefficient is not reliable) " (BOB SAMOHYL). To me this implied that in
2004 Jun 30
1
outlier tests
I have been learning about some outlier tests (Dixon and Grubbs, specifically) for small data sets. When I try help.start() and search for outlier tests, the only one I manage to find is the Bonferroni test available from the car package... Are there any other packages that offer outlier tests? Are the Dixon and Grubbs tests "good" for small samples or are others more
2009 Feb 14
2
implementing Grubbs outlier test on a large dataframe
Hi! I'm trying to run an outlier test once per row in a large dataframe. Ideally, I'd do this and then add the p-value results and the number flagged as outliers as two new separate columns to the dataframe. Grubbs outlier test requires a vector, and I'm confused about how to make each row of my dataframe a vector, followed by doing a Grubbs test for each row containing the vector of numbers
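apply(df, 1, f) passes each row of a numeric data frame to f as a vector, which is one way to run a test once per row. The sketch below uses a hand-written approximate Grubbs p-value obtained by inverting the t-based critical value (a Bonferroni-type bound); the function name, data frame and column names are hypothetical, and a dedicated package implementation may give slightly different p-values.

```r
# Approximate two-sided Grubbs p-value (Bonferroni-type bound, capped at 1).
grubbs_p_approx <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  g <- max(abs(x - mean(x))) / sd(x)
  a <- min(n * g^2 / (n - 1)^2, 1)      # guard against rounding above 1
  t <- sqrt((n - 2) * a / (1 - a))      # t value matching this G
  min(1, 2 * n * pt(t, df = n - 2, lower.tail = FALSE))
}

# Hypothetical data frame: one row per unit, numeric measurements in columns.
dat <- data.frame(m1 = c(10.0, 1), m2 = c(10.1, 2), m3 = c(9.9, 3),
                  m4 = c(10.2, 4), m5 = c(9.8, 5), m6 = c(30.0, 6))
num_cols <- c("m1", "m2", "m3", "m4", "m5", "m6")

# apply() hands each row to grubbs_p_approx() as a numeric vector.
dat$grubbs_p <- apply(dat[, num_cols], 1, grubbs_p_approx)
# A single Grubbs test flags at most one value per row.
dat$n_flagged <- as.integer(dat$grubbs_p < 0.05)
print(dat)
```

Select only the numeric columns before calling apply(): with a character column present, each row would be converted to a character vector.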
2008 Jun 18
2
randomForest outlier
I am trying to use ?randomForest to find the variables that are most important for dividing my dataset (continuous and categorical variables) into two given groups. But when I plot the outliers: plot(outlier(FemMalSex_NAavoid88.rf33, cls=FemMalSex_NAavoid88$Sex), type="h",col=c("red","green")[as.numeric(FemMalSex_NAavoid88$Sex)]) it seems to me that all my values appear as