Gordon Smyth
2005-Jul-22 04:55 UTC
[Rd] boxplot() defaults {was "boxplot in extreme cases"}
>[Rd] boxplot() defaults {was "boxplot in extreme cases"}
>Martin Maechler maechler at stat.math.ethz.ch
>Mon Nov 8 10:36:42 CET 2004
>
>    AndyL> Try:
>
>    AndyL> x <- list(x1=rep(c(0,1,2),c(10,20,40)),
>                     x2=rep(c(0,1,2),c(10,40,20)))
>    AndyL> boxplot(x, pars=list(medpch=20, medcex=3))
>
>    AndyL> (Cf ?bxp, pointed to from ?boxplot.)
>
>Good! Thank you, Andy.
>
>However,
>this is not the first time it had crossed my mind that R's
>default settings of drawing boxplot()s are not quite ok -- and
>that's why I've diverted to R-devel.
>
>Keeping Tufte's considerations in mind, (and me not really wanting
>to follow S-plus), shouldn't we consider to slightly change R's
>boxplot()ing such that
>
>  boxplot(list(x1=rep(c(0,1,2),c(10,20,40)), x2=rep(c(0,1,2),c(10,40,20))))
>
>will *not* give too identically looking boxplots?
>Also, the median should be emphasized more by default anyway.
>{The lattice function bwplot() does it by only drawing a large
> black ball as in Andy's example (and not drawing a line at all)}
>
>One possibility I'd see is to use a default 'medlwd = 3'
>either in boxplot() or in bxp(.) and hence, what you currently get by
>
>  boxplot(list(x1=rep(c(0,1,2),c(10,20,40)), x2=rep(c(0,1,2),c(10,40,20))),
>          medlwd=3)
>
>would become the default plotting in boxplot().
>Of course a smaller value "medlwd=2" would work too, but I'd
>prefer a bit more (3).
>
>Martin

Hi Martin,

I'm not sure this innovation (medlwd=3 default) is a good idea.
Boxplots are designed to display many samples simultaneously on a
graph, and it is important they be as clean and as simple as possible.
To my eye, and to everyone in my lab, the thickened median line is
rather distracting and makes the boxplots look more cluttered ("ugly"
one of my postdocs said). The thickened line also goes against Tufte's
principle of using minimum ink to represent the message.

Yours and Erich's point about distinguishing the median==1st quartile
case from the median==3rd quartile case is well taken.
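To make the coincidence concrete, here is a minimal sketch that prints the five numbers drawn for each of the two samples from the thread. It uses only boxplot.stats() from grDevices; the row labels are added here for readability and are not part of its output.

```r
# The two samples from Erich's original post
x <- list(x1 = rep(c(0, 1, 2), c(10, 20, 40)),
          x2 = rep(c(0, 1, 2), c(10, 40, 20)))

# boxplot.stats()$stats holds, in order: lower whisker, lower hinge,
# median, upper hinge, upper whisker
s <- sapply(x, function(v) boxplot.stats(v)$stats)
rownames(s) <- c("lower whisker", "lower hinge", "median",
                 "upper hinge", "upper whisker")  # labels added for display
print(s)
```

Both samples have the same whiskers and hinges; only the median row differs, and in each sample it sits exactly on one of the hinges, so the median line is drawn on top of a box edge.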
How about making medlwd=3 (or medlwd=2) the default behaviour only when
the median coincides with one of the quartiles? That might satisfy
everyone?

I notice that there wasn't any follow-up discussion to this post on the
r-devel list. Did this suggestion get any support? The boxplots have
been so well accepted in their current form for many, many years,
decades even, so one should be especially cautious of making changes
without some sort of consensus.

Best
Gordon

> > From: Erich Neuwirth
> >
> > I noticed the following:
> > the 2 datasets
> > rep(c(0,1,2),c(10,20,40)) and
> > rep(c(0,1,2),c(10,40,20))
> > produce identical boxplots despite the fact that the medians are
> > different. The reason is that the median in one case
> > coincides with the
> > first quartile, and in the second case with the third quartile.
> > Is there a recommended way of displaying the median visibly in these
> > cases? Setting notch=TRUE displays the median, but does look strange.

---------------------------------------------------------------------------------------
Dr Gordon K Smyth, Senior Research Scientist, Bioinformatics,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3050, Australia
Tel: (03) 9345 2326, Fax (03) 9347 0852,
Email: smyth at wehi.edu.au, www: http://www.statsci.org
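
A rough sketch of the conditional rule suggested above (thicken the median line only where it coincides with a quartile). This is an illustration, not a proposed patch. It assumes that bxp() recycles a vector-valued medlwd across boxes, which should be checked against ?bxp. The third sample x3 and the names on_hinge and med_lwd are hypothetical, added only so that one box has no coincidence.

```r
# x1 and x2 are from the thread; x3 is a hypothetical sample whose
# median does not fall on a hinge
x <- list(x1 = rep(c(0, 1, 2), c(10, 20, 40)),
          x2 = rep(c(0, 1, 2), c(10, 40, 20)),
          x3 = seq(0, 2, length.out = 70))

# Compute the statistics without drawing; b$stats has one column per sample
b <- boxplot(x, plot = FALSE)

# TRUE where the median (row 3) equals the lower hinge (row 2) or the
# upper hinge (row 4); exact comparison is adequate for tied data like these
on_hinge <- b$stats[3, ] == b$stats[2, ] | b$stats[3, ] == b$stats[4, ]

# Hypothetical rule: width 3 where the median sits on a box edge, else 1
med_lwd <- ifelse(on_hinge, 3, 1)

# Assumes medlwd is applied per box when given as a vector
bxp(b, medlwd = med_lwd)
```

With these data the first two boxes get the thick median line and the third keeps a thin one.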