Evgeniy Kachalin
2005-Dec-01 16:04 UTC
[R] Impaired boxplot functionality - mean instead of median
Hello to all users and wizards. I regularly use the 'boxplot' function or its analogue, 'bwplot' from the 'lattice' library. But as far as I understand, they are totally flawed in functionality: they lack the ability to select what is drawn 'in the middle' (median or mean), what the box means (standard error, 90% or something else), and what the whiskers mean (100%, 99% or something else). Is there any way to achieve this? Or is there any other good data visualization function for comparing means of various data groups? Ideally I would like to have a slightly more customisable function for doing that, for example 'boxplot(a~b, data=d, mid="mean")'.
--
Evgeniy, ICQ 38317310.
Martin Maechler
2005-Dec-01 16:16 UTC
[R] Impaired boxplot functionality - mean instead of median
Boxplots were invented by John W. Tukey and I think should be counted among the top "small but smart" achievements from the 20th century. Very wisely he did *not* use mean and standard deviations.

Even though it's possible to draw boxplots that are not boxplots (and people only recently explained how to do this with R on this mailing list), I'm arguing very strongly against this.

If I see a boxplot - I'd want it to be a boxplot and not have the silly (please excuse) 10%--------90% whiskers which declare 20% of the points as outliers {in the boxplot sense}.

If you want the mean +/- sd plot, do *not* misuse boxplots for them, please!

Martin Maechler, ETH Zurich

>>>>> "Evgeniy" == Evgeniy Kachalin <ka4alin at yandex.ru>
>>>>> on Thu, 01 Dec 2005 19:04:47 +0300 writes:

    Evgeniy> I am regularly using 'boxplot' function or its analogue - 'bwplot' from
    Evgeniy> the 'lattice' library.

[there's the lattice *package* !]
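To put a rough number on the whisker argument, a small base-R check may help. It assumes normally distributed data (an assumption made here for illustration, not something stated in the thread) and the default boxplot() whisker rule of 1.5 box lengths beyond the quartiles; all variable names are illustrative.

```r
# Expected share of points flagged as outliers, ASSUMING normal data.

# Tukey-style fences, as used by boxplot() by default (range = 1.5)
q1  <- qnorm(0.25)
q3  <- qnorm(0.75)
iqr <- q3 - q1
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
frac_tukey  <- pnorm(lower_fence) + (1 - pnorm(upper_fence))

# Whiskers drawn at the 10% and 90% quantiles instead
frac_10_90 <- 0.10 + 0.10

round(c(tukey = frac_tukey, q10_q90 = frac_10_90), 4)
```

Under these assumptions the default rule flags well under 1% of the points, whereas 10%/90% whiskers leave 20% of the points outside by construction.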
Jean-Christophe BOUETTE
2005-Dec-01 16:16 UTC
[R] Impaired boxplot functionality - mean instead of median
I'm no wizard but looking at ?boxplot I think you should try ?bxp. HTH, Jean-Christophe.
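As a rough sketch of what ?bxp refers to: boxplot() can return its computed statistics without drawing, and bxp() draws from such a list. The built-in warpbreaks data set is used here purely as a stand-in for the poster's data.

```r
# Compute the boxplot statistics without drawing anything
bp <- boxplot(breaks ~ wool, data = warpbreaks, plot = FALSE)

# Components of the returned list. Each column of bp$stats holds, per group:
# lower whisker, lower hinge, median, upper hinge, upper whisker
names(bp)
bp$stats

# bxp() draws a boxplot from such a list
bxp(bp, xlab = "wool", ylab = "breaks")
```

Because the drawing step is separate from the computation, the list can be inspected (or, with care, altered) before anything is plotted.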
Marc Schwartz (via MN)
2005-Dec-01 19:59 UTC
[R] Impaired boxplot functionality - mean instead of median
> Marc Schwartz (via MN) wrote:
> > On Thu, 2005-12-01 at 19:40 +0300, Evgeniy Kachalin wrote:
> >> So I analyze genetics data. I have some factor (gene variant, c(1,2,3))
> >> and the quantitative variable corresponding to that factor. How do I
> >> visualize this situation? Compare mean of samples corresponding to
> >> factor values?
> >>
> >> Should boxplot support 'mean-in-the-middle', it would fit my needs
> >> ideally. How do I plot mean +/- SD plot?
> >>
> >> Also there is a way to rewrite boxplot.stats and replace "fivenum" there
> >> for self-made function. Then I would need to write self-made
> >> boxplot.formula (or boxplot.default?) function. And all this stuff would
> >> not be configurable. I'm still novice in R, so I need simple way to
> >> pre-visualize my data and estimate approximate result.
> >
> > If you want means and SDs, you might want to look at:
> >
> > 1. plotCI() and plotmeans() in the gplots package
>
> So plotmeans is incapable of: boxplot(numerical~fact1+fact2). Is there
> any way further?

I think that somehow we are talking past each other here.

plotmeans() does what it is designed to do, which is to simplify the process of plotting group-wise point estimates and user defined error bars/intervals around the point estimates.

In your case, these intervals would be standard deviations around each of the group means as you have indicated.

Review the examples in ?plotmeans.

As Martin and others have pointed out, you need to remove boxplots from the equation here, as they were not designed to plot means and standard deviations.

HTH, Marc Schwartz
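For readers who want to see what a "group means with SD bars" plot amounts to, here is a minimal base-R sketch. It does not use plotmeans() or plotCI(); it uses only tapply(), plot() and arrows(), with the built-in warpbreaks data standing in for the poster's data. The variable names are illustrative.

```r
# Group means and standard deviations (warpbreaks is a built-in data set)
grp_mean <- tapply(warpbreaks$breaks, warpbreaks$tension, mean)
grp_sd   <- tapply(warpbreaks$breaks, warpbreaks$tension, sd)
x <- seq_along(grp_mean)

# Plot the means, leaving room on the y axis for the error bars
plot(x, grp_mean, pch = 19, xaxt = "n",
     xlim = c(0.5, length(x) + 0.5),
     ylim = range(grp_mean - grp_sd, grp_mean + grp_sd),
     xlab = "tension", ylab = "breaks (mean +/- SD)")
axis(1, at = x, labels = names(grp_mean))

# Error bars: vertical segments with flat caps at both ends
arrows(x, grp_mean - grp_sd, x, grp_mean + grp_sd,
       angle = 90, code = 3, length = 0.05)
```

The packaged functions mentioned in the thread wrap this kind of drawing and add conveniences such as labels; the sketch only shows that no boxplot machinery is involved.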
Wiener, Matthew
2005-Dec-01 20:58 UTC
[R] Impaired boxplot functionality - mean instead of median
interaction(A, B) will create a single factor made up of the combinations of the two factors A and B. Perhaps that would let you use plotmeans.

Hope this helps, Matt Wiener

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Evgeniy Kachalin
Sent: Thursday, December 01, 2005 3:37 PM
To: r-help at stat.math.ethz.ch
Subject: Re: [R] Impaired boxplot functionality - mean instead of median

Again, what I'm talking about: plotmeans is incapable of analyzing the formula. For example, I have two factors: A - a, b, c, and B - d, e, f.

If I plot boxplot(num~A+B), what do I get? Nine boxes: ad, ae, af, bd, be, bf, cd, ce, cf. If I plot plotmeans(num~A+B), what do I get? Nothing, because plotmeans cannot combine two factors in their various combinations. Is there a simple way to do it?

Anyway... That's the wrong way; all that is necessary is to have a boxplot with the mean instead of the median. Is there a simple way to do it?

Statistical software like Statistica 7.0 offers any possible combination of what "Boxplot" could mean. Is it possible to have only one modification to R's boxplot?

Thank you for kind answers. Also please tell me where I should send replies: to the list address or to those who answer me directly.

--
Evgeniy
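The interaction() suggestion can be sketched as follows. The built-in warpbreaks data (factors wool and tension) stands in for the poster's A and B, and the column name grp is made up for this example. The plotmeans() call is guarded because gplots is an add-on package that may not be installed; note that plotmeans() draws confidence-interval bars by default rather than SD bars.

```r
# Combine two factors into one; warpbreaks stands in for the poster's data
wb <- warpbreaks
wb$grp <- interaction(wb$wool, wb$tension)

# One level per wool/tension combination
levels(wb$grp)

# Group means over the combined factor
grp_means <- tapply(wb$breaks, wb$grp, mean)
grp_means

# With a single grouping factor, a one-factor formula can be used
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::plotmeans(breaks ~ grp, data = wb)
}
```

The combined factor has as many levels as there are combinations of the two original factors, which is the same grouping that a two-factor boxplot formula produces.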
Marc Schwartz (via MN)
2005-Dec-01 21:30 UTC
[R] Impaired boxplot functionality - mean instead of median
On Thu, 2005-12-01 at 23:27 +0300, Evgeniy Kachalin wrote:
> Again, what I'm talking about: plotmeans is incapable of analyzing the
> formula. For example, I have two factors: A - a, b, c, and B - d, e, f.
>
> If I plot: boxplot(num~A+B) what do I get? Nine boxes: ad, ae, af, bd,
> be, bf, cd, ce, cf. If I plot: plotmeans(num~A+B) - what do I get?
> Nothing. Because plotmeans cannot combine two factors in various
> combination. Is there a simple way to do it?
>
> Anyway... That's wrong way, all what is necessary is to have a boxplot
> with mean instead of median. Is there simple way to do it?

If we take SDs out of the picture for the moment, we can do something like this:

  # Do the boxplot as you want using the formula
  boxplot(breaks ~ wool + tension, data = warpbreaks)

  # Get the means using tapply() with an interaction of the
  # factor levels for each group
  means <- with(warpbreaks,
                tapply(breaks, list(interaction(wool, tension)),
                       mean, na.rm = TRUE))

  # Now add the means to the boxplot, where the
  # x axis values are 1:number of groups by default
  points(seq_along(means), means, pch = 19)

> Statistical software like Statistica 7.0 offers any possible combination
> of what "Boxplot" could mean. Is it possible to have only one
> modification to R's boxplot?
>
> Thank you for kind answers.
> Also please tell me, where should I send replies: to conference address
> or to those who answer me directly.

Generally best to "reply to all", which gets the message back to the thread participants quickly as well as the list archive for use by others during searches.

HTH, Marc Schwartz
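Overlaying means on a boxplot relies on the means being in the same order as the boxes. A small check along these lines may be reassuring; it is a sketch using the same built-in warpbreaks data, and it computes the means with split() and sapply() (a variation chosen here so that the result is a plain named vector).

```r
# Group labels as boxplot() orders them, without drawing
bp <- boxplot(breaks ~ wool + tension, data = warpbreaks, plot = FALSE)

# Group means over the same wool/tension combinations
means <- sapply(split(warpbreaks$breaks,
                      interaction(warpbreaks$wool, warpbreaks$tension)),
                mean)

# The overlay is only valid if both use the same group order
stopifnot(identical(bp$names, names(means)))

# Draw the boxes and add the means as filled points
boxplot(breaks ~ wool + tension, data = warpbreaks)
points(seq_along(means), means, pch = 19)
```

The plot remains an ordinary boxplot with the median line intact; the means are an addition, not a replacement.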
Austin, Matt
2005-Dec-02 00:32 UTC
[R] Impaired boxplot functionality - mean instead of median
Check your syntax on the bwplot call.

  library(Hmisc)
  library(lattice)

  fa <- data.frame(doz = sample(500:2000, size = 500), fabp2 = rep(1:20, 25))

  # grouping factor on the left-hand side, numeric variable on the right
  bwplot(factor(fabp2) ~ doz, data = fa, panel = panel.bpplot)

  # group means and SDs, then a dot plot of mean +/- SD
  fa.sum <- summarize(fa$doz, list(fabp2 = fa$fabp2), smean.sd, stat.name = "doz")
  Dotplot(factor(fabp2) ~ Cbind(doz, doz - SD, doz + SD), data = fa.sum)

You can ignore the warning, I'm sure Dr. Harrell has already fixed that issue.

--Matt

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Evgeniy Kachalin
> Sent: Thursday, December 01, 2005 2:43 PM
> To: Frank E Harrell Jr
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Impaired boxplot functionality - mean instead of median
>
> Frank E Harrell Jr wrote:
> > library(Hmisc)
> > library(lattice)
> > ?panel.bpplot
> >
> > bwplot(...., panel=panel.bpplot)
> >
> > By default, panel.bpplot shows the mean (dot) and median (line) plus
> > several quantiles. To bother Martin in a friendly way, I think that
> > means can be useful additions - not that they are so useful by
> > themselves, but that when they differ a lot from the median,
> > non-statisticians gain further information about asymmetry. Also, even
> > though the simple box plot is elegant, I sometimes think it has a high
> > ink to information ratio. I have gained a lot from seeing outer
> > quantiles on the plot, and I don't like to show outer points for fear of
> > someone labeling them outliers. For describing raw data distributions,
> > I never find standard deviations useful, however.
>
> > fa
>     doz fabp2
> 1   900     2
> 4  1500     2
> 6  1000     2
> 8   750     3
> 10  750     1
> 11 1750     2
> 12  500     3
> ....
>
> > bwplot(doz~factor(fabp2),data=fa,panel=panel.bpplot)
> Error in sort(x, partial = unique(c(lo, hi))) :
>         unsupported options for partial sorting
>
> That's NOT simple way.
> I need just one change.
> Is there any good way?
> $-(
>
> --
> Evgeniy
Martin Maechler
2005-Dec-02 07:36 UTC
[R] Impaired boxplot functionality - mean instead of median
{diverted back to R-help}

There are several R packages that provide plots of "mean +/- SD" (or "mean +/- 2*SD", which covers approximately 95% of the observations in the case of normally distributed data) or so-called "error bars".

E.g. function plotCI() in package 'gplots' and errbar() in package 'Hmisc' or 'sfsmisc'.

I'm very convinced that boxplots shouldn't be (mis!)used for drawing those (and they are not by the above functions).

Regards, Martin

>>>>> "Evgeniy" == Evgeniy Kachalin <ka4alin at yandex.ru>
>>>>> on Thu, 01 Dec 2005 19:39:18 +0300 writes:

    Evgeniy> So I analyze genetics data. I have some factor
    Evgeniy> (gene variant, c(1,2,3)) and the quantitative
    Evgeniy> variable corresponding to that factor. How do I
    Evgeniy> visualize this situation? Compare mean of samples
    Evgeniy> corresponding to factor values?

    Evgeniy> Should boxplot support 'mean-in-the-middle', it
    Evgeniy> would fit my needs ideally. How do I plot mean +/-
    Evgeniy> SD plot?

    Evgeniy> Also there is a way to rewrite boxplot.stats and
    Evgeniy> replace "fivenum" there for self-made
    Evgeniy> function. Then I would need to write self-made
    Evgeniy> boxplot.formula (or boxplot.default?) function. And
    Evgeniy> all this stuff would not be configurable. I'm still
    Evgeniy> novice in R, so I need simple way to pre-visualize
    Evgeniy> my data and estimate approximate result.

yes, there are ways, but no, I pretty strongly oppose the idea to misuse the boxplot graphics for depicting very different identities.
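The "mean +/- 2*SD" figure can be checked numerically. The sketch below assumes a normal distribution for the coverage calculation and uses the built-in warpbreaks data for the per-group numbers; it also shows the standard error next to the SD, since the two answer different questions (spread of the observations versus uncertainty of the mean). Variable names are illustrative.

```r
# Share of a normal distribution lying within 2 SDs of its mean
coverage_2sd <- pnorm(2) - pnorm(-2)
coverage_2sd   # about 0.954

# Per-group SD (spread of observations) and SE (uncertainty of the mean)
grp_n  <- tapply(warpbreaks$breaks, warpbreaks$tension, length)
grp_sd <- tapply(warpbreaks$breaks, warpbreaks$tension, sd)
grp_se <- grp_sd / sqrt(grp_n)
cbind(n = grp_n, sd = grp_sd, se = grp_se)
```

Which of the two goes on an error-bar plot should be stated on the axis or in the caption, since bars based on SD and on SE look alike but differ in length by a factor of sqrt(n).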
Petr Pikal
2005-Dec-02 08:37 UTC
[R] Impaired boxplot functionality - mean instead of median
Hi

I totally agree with Martin, because when I see a boxplot I immediately expect the median in the middle and all other parts defined accordingly.

It is possible to use

  bp <- boxplot(..., plot = FALSE)

and then to change the median values in bp to means, the IQRs to SDs, and everything to anything else, but this immediately raises the issue of "Lies, damned lies and statistics".

Just my 2 cents.

Petr

Petr Pikal
petr.pikal at precheza.cz
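For completeness, the mechanics of that approach could look roughly like the sketch below: compute the statistics with plot = FALSE, overwrite the middle row with group means, and draw with bxp(). It uses the built-in warpbreaks data as a stand-in, the names bp_mean and grp_means are made up here, and the objections raised in this thread still apply, which is why the title states what the centre line shows.

```r
# Standard boxplot statistics, computed without drawing
bp <- boxplot(breaks ~ wool + tension, data = warpbreaks, plot = FALSE)

# Group means in the same group order as the boxes
grp_means <- sapply(split(warpbreaks$breaks,
                          interaction(warpbreaks$wool, warpbreaks$tension)),
                    mean)
stopifnot(identical(bp$names, names(grp_means)))

# Row 3 of bp$stats is the median; replace it with the mean in a copy
bp_mean <- bp
bp_mean$stats[3, ] <- grp_means

# Draw from the modified list, labelling the change explicitly
bxp(bp_mean, main = "Centre line shows the MEAN, not the median",
    xlab = "wool.tension", ylab = "breaks")
```

Only the centre line is changed here; the box and whiskers are still the hinges and Tukey whiskers, so the result mixes two kinds of summary and needs a clear label.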