Suppose I am reading data from a file and the data contains some outliers. I want to know if it is possible in R to automatically detect outliers in a dataset and remove them.
--
View this message in context: http://n4.nabble.com/How-to-detect-and-exclude-outliers-in-R-tp1017285p1017285.html
Sent from the R help mailing list archive at Nabble.com.
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of vikrant
> Sent: Monday, January 18, 2010 10:09 PM
> To: r-help at r-project.org
> Subject: [R] How to detect and exclude outliers in R?

You will need to provide more information. What is your definition of an outlier? And why should those data be removed?

Daniel Nordlund
Bothell, WA USA
What makes an outlier an outlier depends on the model. A highly discrepant observation under one model is entirely typical under another. Even given a model, the criteria for what constitutes an outlier vary by application area and user. And even given all of that, exclusion is only one of many possible actions. Can you be more specific about your model for the data?
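To make the point about model dependence concrete, here is a minimal base-R sketch. The data, the robust z-score helper, and the 3.5 cutoff are all assumptions chosen for illustration, not something proposed in the thread. It flags a value under a "roughly normal on the raw scale" model and shows that the same value is unremarkable under a "roughly normal on the log scale" model.

```r
# Hypothetical data: nine moderate values and one larger one.
x <- c(1.2, 0.8, 1.5, 2.1, 0.9, 1.7, 1.1, 2.4, 1.3, 5)

# Hypothetical helper: distance from the median in units of the MAD
# (mad() rescales so that it estimates the SD for normal data).
robust_z <- function(v) (v - median(v)) / mad(v)

# Model 1: values are roughly normal on the raw scale.
z_raw <- robust_z(x)

# Model 2: values are roughly normal on the log scale.
z_log <- robust_z(log(x))

# The cutoff of 3.5 is an assumed convention, not a universal rule.
cutoff <- 3.5
flag_raw <- abs(z_raw) > cutoff
flag_log <- abs(z_log) > cutoff

print(which(flag_raw))  # positions flagged under the raw-scale model
print(which(flag_log))  # positions flagged under the log-scale model
```

Under the raw-scale model the last value is flagged; under the log-scale model nothing is. Which answer is right depends on which model is appropriate for the data, and that is a question the code cannot settle.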
Hi V.S.,

Did you search the R resources for this issue before asking? Maybe not.

RSiteSearch("outliers")

Best,
milton
fortune("outlier")

--
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf
Martinistr. 52
20246 Hamburg
T ++49/40/7410-58243
F ++49/40/7410-57790
I had a similar problem. In my case, I had a large table of data and wanted to find and exclude a single huge value in one column (i.e. remove the entire row). There were thousands of rows of data, and this single value was more than 3x the next-largest value and at least 30x the typical value. I wanted to see the effect of removing that one data point without having to change the underlying data.

The following finds and removes that one value. I assume it could be repeated to get rid of more values based on pre-defined criteria.

First, load the "outliers" package:

    library(outliers)

Then:

    # "target_column" stands in for the name of the column being checked.
    # This gives a logical vector that is FALSE everywhere except at the outlier,
    # which the package documentation defines as the "value with largest difference
    # between it and sample mean, which can be an outlier".
    outlier_tf = outlier(data_full$target_column, logical = TRUE)

    # This finds the row position of the outlier, i.e. where outlier_tf is TRUE.
    find_outlier = which(outlier_tf)

    # This creates a new dataset from the old data, dropping the one row that
    # contains the outlier. The original data_full is left unchanged.
    data_new = data_full[-find_outlier, ]

Guy
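As a sketch of what "pre-defined criteria" could look like for more than one value, the base-R example below applies Tukey's 1.5 x IQR rule to a single column. This swaps in a different technique from the `outlier()` call above, and the data frame contents and column names are hypothetical. Whether the flagged rows should actually be dropped is still the judgment call raised earlier in the thread.

```r
# Hypothetical data frame standing in for data_full.
data_full <- data.frame(
  id = 1:10,
  target_column = c(10, 12, 11, 9, 13, 10, 11, 12, 300, 95)
)

# Tukey's rule: flag values more than 1.5 * IQR below the first quartile
# or above the third quartile.
q <- quantile(data_full$target_column, c(0.25, 0.75))
fence <- 1.5 * diff(q)
is_outlier <- data_full$target_column < q[1] - fence |
  data_full$target_column > q[2] + fence

# Logical indexing keeps every row when nothing is flagged, whereas
# negative indexing with an empty which() result would return no rows.
data_new <- data_full[!is_outlier, ]

print(data_full[is_outlier, ])  # rows that were excluded
print(nrow(data_new))           # rows that remain
```

The original `data_full` is left untouched, so results with and without the flagged rows can be compared side by side.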