Karin Lagesen
2008-Jan-25 16:39 UTC
[R] accessing the indices of outliers in a data frame boxplot
I have a data frame containing columns which are factors. I use this to make boxplots for the data, with one box per factor level. I would now like to get at the data in the data frame which corresponds to the outliers. I have so far found $out, which gives "the values of any data points which lie beyond the extremes of the whiskers", but I haven't found anything which will let me get at the indices in the original data frame for these outliers.

I think there might be a chance that I could simply compare the values I am plotting from my data frame with the values for the whiskers and use that as a criterion, but I am uncertain how to do this without doing it manually. The factor I am plotting against contains 17 levels, and I'd thus like to see if there is a somewhat more general solution available.

Thanks for your help!

Karin
--
Karin Lagesen, PhD student
karin.lagesen at medisin.uio.no
http://folk.uio.no/karinlag
Chuck Cleland
2008-Jan-25 17:01 UTC
[R] accessing the indices of outliers in a data frame boxplot
On 1/25/2008 11:39 AM, Karin Lagesen wrote:
> I have a data frame containing columns which are factors. I use this
> to make boxplots for the data, with one box per factor. I would now
> like to get at the data in the data frame which corresponds to the
> outliers. I have so far found the $out, which gives "the values of any
> data points which lie beyond the extremes of the whiskers", but I
> haven't found anything which will let me get at the indices in the
> original data frame for these outliers.
>
> I think there might be a chance that I could simply compare the values
> I am plotting from my data frame with the values for the whiskers and
> use that as a criterion, but I am uncertain how to do this without
> doing it manually. The factor I am plotting against contains 17
> levels, and I'd thus like to see if there is a somewhat more general
> solution available.
>
> Thanks for your help!
>
> Karin

You can use the %in% operator (is.element) to see which data values in your data frame match an outlier value. Then use which() to return the indices of the TRUE values. For example:

  set.seed(245)
  df <- data.frame(GRP = rep(LETTERS[1:4], each=25), Y = rchisq(100, 2))
  mybp <- boxplot(Y ~ GRP, data=df)

  which(df$Y %in% mybp$out)
  [1]  8 12 47 66 88 93

  mybp$out
  [1] 5.919915 9.135578 5.723714 8.758584 8.502147 4.920513

  df$Y[which(df$Y %in% mybp$out)]
  [1] 5.919915 9.135578 5.723714 8.758584 8.502147 4.920513

Note that this matches on values only, so a value that is an outlier in one group would also be flagged wherever the same value occurs in another group.

See ?is.element and ?which.

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
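
The original question also raised the idea of comparing each value against the whiskers of its own box. A minimal sketch of that per-group variant follows, reusing the simulated df, GRP and Y from the reply above. It assumes boxplot() is called with its default range of 1.5, which corresponds to the default coef of boxplot.stats(); the name out_idx is only illustrative.

```r
# Same simulated data as in the reply above.
set.seed(245)
df <- data.frame(GRP = rep(LETTERS[1:4], each = 25),
                 Y   = rchisq(100, 2))

# Split the row numbers by group, then keep the rows whose value lies
# beyond the whiskers of that group's own box. boxplot.stats() applies
# the same 1.5 * IQR rule that boxplot() uses by default.
out_idx <- unlist(lapply(split(seq_len(nrow(df)), df$GRP), function(i) {
  i[df$Y[i] %in% boxplot.stats(df$Y[i])$out]
}), use.names = FALSE)

# Rows of the original data frame that are outliers within their group.
df[sort(out_idx), ]
```

Because each value is only compared with the outliers of its own group, a value that happens to recur in another group is not picked up by mistake. With 17 levels the same code applies unchanged, since split() handles any number of groups.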