>>>>> "PD" == Peter Dalgaard <p.dalgaard at biostat.ku.dk>
>>>>>     on Sun, 12 Jul 2009 11:11:37 +0200 writes:
    PD> m.crawley at imperial.ac.uk wrote:
    >> In a Box and Whisker plot, I thought that when there are outliers both
    >> above and below the whiskers, then the whiskers should both be the same
    >> length (plus or minus 1.5 times the inter-quartile range).
    PD> Not according to the docs:
    PD> range: this determines how far the plot whiskers extend out from the
    PD> box.  If 'range' is positive, the whiskers extend to the most
    PD> extreme data point which is no more than 'range' times the
    PD> interquartile range from the box. A value of zero causes the
    PD> whiskers to extend to the data extremes.
    PD> And the code itself has
    PD> stats[c(1, 5)] <- range(x[!out], na.rm = TRUE)
    PD> So the whisker won't be equal to 1.5 IQR unless there happens to be an
    PD> observation there.
    PD> Now, this might be wrong, but people have tried very hard to make the
    PD> implementation follow the original definition due to Tukey. I.e., if you
    PD> can point out that Tukey specified it otherwise, then we'd change it,
    PD> otherwise it is just not a bug.
I'd bet pretty large amounts that we (and S and S-plus, and probably
quite a few other packages) have implemented the whiskers the way
JWT defined them, very purposefully.
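
To make the rule quoted from the docs concrete, here is a minimal sketch
that recomputes the whisker ends by hand and compares them with what
boxplot.stats() returns. The names fence_lo, fence_hi, inside and whisk
are illustrative only; the data are the same simulated sample used
elsewhere in this thread.

```r
set.seed(9)
x <- rnorm(50)

h <- fivenum(x)                 # min, lower hinge, median, upper hinge, max
spread <- h[4] - h[2]           # hinge spread (what the code calls 'iqr')

# Fences at 1.5 times the hinge spread beyond each hinge.
# These are limits only; nothing is drawn at them.
fence_lo <- h[2] - 1.5 * spread
fence_hi <- h[4] + 1.5 * spread

# The whiskers end at the most extreme *observations* inside the fences.
inside <- x[x >= fence_lo & x <= fence_hi]
whisk  <- range(inside)

s <- boxplot.stats(x, coef = 1.5)
all.equal(whisk, s$stats[c(1, 5)])   # same whisker ends
all(s$stats[c(1, 5)] %in% x)         # both ends are actual data points
```

Since each whisker stops at an observation, the two whisker lengths
agree only if the data happen to put a point at the same distance on
both sides.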
One of JWT's points *was* exactly that most of the values "drawn"
represent *observations* (and those that do not use
exact midpoints of obs.):
It's not by coincidence or even queerness that the box is *not*
delineated by the usual quartiles, but rather the *hinges*
[ Digression about hinges vs quartiles : 
   ?boxplot.stats
  has a section 'Details' to which I had added such information about
  a decade ago.
  Whereas our R help pages ( ?boxplot.stats,  ?fivenum ) 
  do use the correct definitions,
  unfortunately many other places do *not*, e.g., even the
  Wikipedia page  http://en.wikipedia.org/wiki/Five-number_summary
  wrongly talks about 1st and 3rd quartile,
  but then at least uses a numerical example using the hinges
]
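
As a small illustration of the hinges-vs-quartiles point, the sketch
below compares fivenum() with the default quantile() on a toy vector
(the data 1:10 are chosen only for illustration). It assumes the
default quantile type (type = 7).

```r
x <- 1:10

hinges    <- fivenum(x)[c(2, 4)]                        # Tukey's hinges
quartiles <- quantile(x, c(0.25, 0.75), names = FALSE)  # default type = 7

hinges      # 3 8          (each hinge is an observation here)
quartiles   # 3.25 7.75    (interpolated, not observations)

# The box drawn by boxplot() uses the hinges, not the quartiles:
boxplot.stats(x)$stats[c(2, 4)]
```

For other sample sizes a hinge can be the midpoint of two adjacent
observations, and it may or may not coincide with the default quartile.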
Martin Maechler, ETH Zurich
    >> If you look at the plot for SilwoodWeather on p.155 of The R Book you
    >> will see that for November (month = 11) the upper whisker is shorter
    >> than the lower, while for other months with outliers both above and
    >> below, the lines are the same lengths.
    PD> For easier reproduction (reproducible examples should not refer to files
    PD> on your C: drive...):
    PD> > diff(boxplot({set.seed(9);x<-rnorm(50)})$stats)
    PD> [,1]
    PD> [1,] 1.2525857
    PD> [2,] 0.5412128
    PD> [3,] 0.6083348
    PD> [4,] 1.4625057
    PD> -- 
    PD> O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
    PD> c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
    PD> (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
    PD> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907