Dear list members,

Can anyone tell me how the notches in boxplot(Y~X,notch=T) are calculated? What exactly do these notches represent? I'd suppose they are confidence intervals for the median, but I've also been told they might show Least Significant Difference (LSD) equivalents.

I would very much appreciate any help.

Best regards,
Chris
On Mon, 1 Mar 2004, Christoph Scherber wrote:

> Dear list members,
>
> Can anyone tell me how the notches in boxplot(Y~X,notch=T) are
> calculated? What do these notches represent exactly? I'd suppose they
> are confidence intervals for the median, but I've also been told they
> might show Least Significant Difference (LSD) equivalents.

The help page says that "If the notches of two plots do not overlap then the medians are significantly different at the 5 percent level."

The only thing wrong with this is that it isn't true. The code says that the notches are +/- 1.58 IQR/sqrt(n) around the median, so I think the claimed confidence level holds only for normal distributions with small amounts of contamination.

-thomas
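For anyone who wants to check the formula against their own data, here is a minimal sketch. It assumes the notch limits are the conf component returned by boxplot.stats(), and that the spread is taken between the hinges from fivenum() (which can differ slightly from IQR()). The helper name notch_limits and the simulated data are illustrative only.

```r
# Hypothetical helper (not part of R): recompute the notch limits by hand.
notch_limits <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  five <- fivenum(x)                 # min, lower hinge, median, upper hinge, max
  hspread <- five[4] - five[2]       # hinge spread, close to IQR(x)
  five[3] + c(-1.58, 1.58) * hspread / sqrt(n)
}

set.seed(1)                          # illustrative simulated data
y <- rnorm(40, mean = 10, sd = 2)

by_hand <- notch_limits(y)
from_r  <- boxplot.stats(y)$conf     # limits drawn when notch = TRUE
print(rbind(by_hand, from_r))
```

If the two rows agree, the notches are simply median +/- 1.58 * (hinge spread) / sqrt(n), with no distribution-free guarantee attached.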
On 1 Mar 2004 at 9:54, Thomas Lumley wrote:

> The help page says that "If the notches of two plots do not overlap
> then the medians are significantly different at the 5 percent level."
>
> The only thing wrong with this is that it isn't true. The code says
> that the notches are +/- 1.58 IQR/sqrt(n), so I think the claimed
> confidence level holds only for normal distributions with small
> amounts of contamination.

Couldn't this be replaced with confidence limits based on order statistics, which are nonparametrically correct, although they take somewhat longer to compute?

Kjetil Halvorsen
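As a rough illustration of the order-statistics idea, the sketch below builds a distribution-free interval for the median from binomial probabilities. It assumes a continuous distribution (no ties at the median) and is not what boxplot() does; the function name median_ci_os and the sample are hypothetical.

```r
# Hypothetical helper: distribution-free CI for the median from order statistics.
median_ci_os <- function(x, conf.level = 0.95) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  alpha <- 1 - conf.level
  # Smallest k with P(B <= k) >= alpha/2 for B ~ Binomial(n, 1/2); then
  # P(B <= k - 1) < alpha/2, so rank k is a valid lower limit.
  l <- qbinom(alpha / 2, n, 0.5)
  stopifnot(l >= 1)                  # n too small for the requested level
  u <- n - l + 1                     # symmetric upper rank
  list(lower = x[l], upper = x[u], ranks = c(l, u),
       coverage = 1 - 2 * pbinom(l - 1, n, 0.5))
}

set.seed(2)
z <- rexp(25)                        # skewed sample, illustrative only
ci <- median_ci_os(z)
print(ci)
```

The achieved coverage is at least the nominal level rather than exactly equal to it, because only whole ranks are available. Note that this is an interval for a single median; it does not by itself give the "non-overlap means significant difference" reading claimed for the notches.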
>> I think John Tukey's idea was that this formula (or just the fact of
>> using median and quartiles) is still often approximately correct
>> for quite a few kinds of moderate contaminations...
>
> It may be approximately correct for the width of a CI (and when I checked
> it was only approximately correct for a normal), but I would seriously
> doubt if it were approximately correct for a significance level of 5%.
> Remember how fast the tails of the asymptotic normal distribution decay: a
> 20% error turns 5% into 2%.
>
> BTW, if there is a precise reference for this it would be good to add it
> to boxplot.stats.Rd, as the confidence limits are unexplained there.

The factor 1.58 for H-spr/\sqrt{n} comes from the product of three approximations, going from a 95% confidence interval for a difference in means to one for a difference in medians, using the H-spr (= IQR) instead of the standard deviation:

  1. H-spr/1.349 \approx \sigma for a normal distribution.
  2. \sqrt{\pi/2} \sigma/\sqrt{n} \approx standard error of a median.
  3. 1.7 is the average of 1.96 and 1.39 = 1.96/\sqrt{2}, the factors for the standard error of the difference between two means in the cases where one variance is tiny and where both are equal.

Putting these together, 1.7 \times \sqrt{\pi/2} / 1.349 \approx 1.58.

I believe this is explained in

@Article{McGill-etal:78,
  author  = "R. McGill and J. W. Tukey and W. Larsen",
  year    = "1978",
  title   = "Variations of Box Plots",
  journal = TAS,
  volume  = "32",
  pages   = "12--16",
}

--
Michael Friendly     Email: friendly at yorku.ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA
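The arithmetic behind the 1.58 can be checked in a few lines. This is only a sketch of the three approximations listed above, under the normal-theory assumptions stated there; the variable names are illustrative.

```r
# Each step assumes a normal distribution, as in the derivation above.
sigma_per_hspread <- 1 / (2 * qnorm(0.75))   # 1/1.349: sigma per unit of H-spr
se_median_ratio   <- sqrt(pi / 2)            # asymptotic SE(median) / SE(mean)
z <- qnorm(0.975)                            # 1.96
# Average of 1.96 (one variance tiny) and 1.96/sqrt(2) (equal variances),
# rounded to one decimal place as in the derivation.
multiplier <- round(mean(c(z, z / sqrt(2))), 1)

notch_factor <- multiplier * se_median_ratio * sigma_per_hspread
print(round(notch_factor, 2))
```

Since every factor here leans on normality, the sketch also makes plain why the 5 percent claim need not hold for other distributions.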