Tal Galili
2011-Jan-26 23:04 UTC
[R] boxplot - code for labeling outliers - any suggestions for improvements?
Hello all,

I wrote a small function to add labels for outliers in a boxplot.
This function will only work on a simple boxplot/formula command (e.g. something like boxplot(y~x)).

Code + example follows in this e-mail.

I'd be happy for any suggestions on how to improve this code, for example:

- Handle boxplot.matrix (which shouldn't be too hard to do)
- Handle cases of complex formulas (e.g. boxplot(y~a*b))
- Handle cases where there are many outliers, leading to a clutter of text
  (I have no idea how to solve this systematically)

Best,
Tal

------------------------------

# the function
boxplot.add.outlier.text <- function(DATA, x_name, y_name, label_name)
{
  # Return the rows of one group whose y value lies outside the whiskers
  boxplot.outlier.data <- function(xx, y_name)
  {
    y <- xx[, y_name]
    boxplot_range <- range(boxplot.stats(y)$stats)
    ss <- (y < boxplot_range[1]) | (y > boxplot_range[2])
    return(xx[ss, ])
  }

  require(plyr)
  # Collect the outlier rows (all columns, including the labels) per group
  txt_to_run <- paste("ddply(DATA, .(", x_name, "), boxplot.outlier.data, y_name = y_name)", sep = "")
  ourlier_df <- eval(parse(text = txt_to_run))
  # head(ourlier_df)
  # Build the formula y ~ x from the column names
  txt_to_run <- paste("formula(", y_name, "~", x_name, ")")
  formu <- eval(parse(text = txt_to_run))
  # Get the outlier values and their box positions without drawing anything
  boxdata <- boxplot(formu, data = DATA, plot = FALSE)
  boxdata_group_name <- boxdata$names[boxdata$group]
  boxdata_outlier_df <- data.frame(group = boxdata_group_name, y = boxdata$out, x = boxdata$group)
  # Write each outlier's label to the right of its point
  for(i in seq_len(dim(boxdata_outlier_df)[1]))
  {
    ss <- (ourlier_df[, x_name] %in% boxdata_outlier_df[i, ]$group) &
          (ourlier_df[, y_name] %in% boxdata_outlier_df[i, ]$y)
    current_label <- ourlier_df[ss, label_name]
    temp_x <- boxdata_outlier_df[i, "x"]
    temp_y <- boxdata_outlier_df[i, "y"]
    text(temp_x, temp_y, current_label, pos = 4)
  }

  list(boxdata_outlier_df = boxdata_outlier_df, ourlier_df = ourlier_df)
}

# example:
boxplot(decrease ~ treatment, data = OrchardSprays, log = "y", col = "bisque")
boxplot.add.outlier.text(OrchardSprays, "treatment", "decrease", "colpos")
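The function in this post relies on plyr and on eval(parse(...)) to refer to columns by name. As a rough sketch of one alternative, the same outlier rows can be found with base R only, by indexing columns with [[ ]] and flagging outliers per group with ave(). The helper name label_boxplot_outliers and its draw argument are hypothetical and not part of the original post; it assumes a single grouping column, as the original does.

```r
# Hypothetical helper (not from the original post): base R only, no plyr.
label_boxplot_outliers <- function(data, x_name, y_name, label_name, draw = TRUE) {
  y <- data[[y_name]]
  g <- as.factor(data[[x_name]])

  # Flag values outside the whiskers, computed separately within each group.
  # ave() returns 0/1 here because y is numeric, so convert back to logical.
  flagged <- ave(y, g, FUN = function(v) {
    whiskers <- range(boxplot.stats(v)$stats)
    v < whiskers[1] | v > whiskers[2]
  })
  idx <- which(as.logical(flagged))

  # boxplot() draws the groups at positions 1, 2, ... in factor-level order.
  out <- data.frame(x = as.integer(g)[idx],
                    y = y[idx],
                    label = data[[label_name]][idx])

  # text() needs an existing plot, so only draw when asked to.
  if (draw && nrow(out) > 0) {
    text(out$x, out$y, labels = out$label, pos = 4)
  }
  invisible(out)
}

# Find the outliers without drawing anything.
out <- label_boxplot_outliers(OrchardSprays, "treatment", "decrease", "colpos",
                              draw = FALSE)

# To label a plot: call
#   boxplot(decrease ~ treatment, data = OrchardSprays, log = "y")
# first, then call the helper again with draw = TRUE.
```

Because the labels come from the same rows as the flagged values, no second matching step on group and y value is needed, which also sidesteps ambiguity when two rows in a group share the same y value.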
Greg Snow
2011-Jan-26 23:09 UTC
[R] boxplot - code for labeling outliers - any suggestions for improvements?
For the last point (cluttered text), look at spread.labels in the plotrix package and spread.labs in the TeachingDemos package (I favor the latter, but could be slightly biased as well). Doing more than what those two functions do becomes really complicated really fast.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
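To illustrate the general idea behind spreading cluttered labels, here is a minimal base R sketch. It is a simple greedy pass written for this note, not the algorithm used by spread.labels (plotrix) or spread.labs (TeachingDemos); the helper name spread_positions and its mindiff argument are hypothetical.

```r
# Hypothetical helper: push label positions apart so that neighbours are at
# least `mindiff` apart. A greedy one-pass stand-in, NOT the algorithm used
# by plotrix::spread.labels or TeachingDemos::spread.labs.
spread_positions <- function(y, mindiff) {
  ord <- order(y)
  spread <- y[ord]
  # Walk upward; move a label up if it sits too close to the one below it.
  for (i in seq_along(spread)[-1]) {
    if (spread[i] - spread[i - 1] < mindiff) {
      spread[i] <- spread[i - 1] + mindiff
    }
  }
  result <- numeric(length(y))
  result[ord] <- spread
  result  # returned in the same order as the input
}

# Three nearly coincident outliers and one isolated outlier.
y <- c(10, 10.2, 10.3, 25)
label_y <- spread_positions(y, mindiff = 1)
```

The labels would then be drawn at label_y instead of y, for example with text(), with segments() connecting each label to its point. This pass only ever moves labels upward, which is one reason the packaged functions are the better choice for real plots.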