Hi there,

I have a set of data that looks like this:

As1988 <- c(1254.0, 22.0, 4.2, 1081.0, 35.0, 6.0, 1772.0, 192.0, 7.6)

The mean of this (as calculated by R) is: 485.9778
The median of this (as calculated by R) is: 35

If I then make a beanplot(As1988), I find that the beanline (average) is now 77.68561 while the beanline (median) is 35.39739 (using the locator function to check the graph and log axis).

While I can understand the small discrepancy in the median (mouse hovering over the line), I am at a loss to explain the huge difference between the means. Is this a flaw in the package or is there something I am missing?

My gut feeling is that the log scales are affecting the calculations somehow.

Sincerely,
Michael Hopgood
Hi,

The log of the mean is not the same as the mean of the logs; that's a no-brainer. I guess you are using beanplot() from the package of the same name. Calling

    beanplot(As1988, log="")

gives the correct plot.

Next time, could you provide a minimal code example we can run ourselves? If we don't know what functions you're using, it's quite impossible to say where the problem lies.

Cheers
Joris

On Tue, Nov 24, 2009 at 12:52 PM, Michael Hopgood <michael.hopgood at mrm.se> wrote:
> [...]
> If I then make a beanplot(As1988), I find that the beanline (average) is now
> 77.68561 while the beanline (median) is 35.39739 (using the locator function
> to check the graph and log axis).
> [...]
> My gut feeling is that the log scales are affecting the calculations
> somehow.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Hi Michael,

Looking at the help for beanplot(), note that the 'log' option defaults to 'auto', which means the function will automatically log-transform data like yours. This also implies that the mean it shows is the geometric mean, not the arithmetic mean. As you note, the transformation doesn't affect the median.

If you don't want this behavior, I think setting log="" will do the trick.

Hope this helps.

Tom Wainwright

On 11/24/2009 03:52 AM, Michael Hopgood wrote:
> [...]
> While I can understand the small discrepancy of the median(mouse hovering
> over the line), I am at a loss to explain the huge difference between the
> means. Is this a flaw in the package or is there something I am missing?
>
> My gut feeling is that the log scales are affecting the calculations
> somehow.

-- 
Tom Wainwright
NOAA Northwest Fisheries Science Center
Newport, Oregon

The contents of this message are mine personally and do not necessarily reflect any position of the Government or the National Oceanic and Atmospheric Administration.
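The difference between the two kinds of mean can be checked in base R without the beanplot package. The following is only a sketch using the data from the original post; the variable names (arith_mean, geo_mean, med_raw, med_log) are made up for illustration. A value read off the plot with locator() would only approximate the geometric mean computed here.

```r
# Data from the original post
As1988 <- c(1254.0, 22.0, 4.2, 1081.0, 35.0, 6.0, 1772.0, 192.0, 7.6)

# Arithmetic mean: what mean() reports
arith_mean <- mean(As1988)

# Geometric mean: average on the log scale, then back-transform
geo_mean <- exp(mean(log(As1988)))

# The median is unaffected because log() is monotone
# (n is odd here, so the median is an actual data value)
med_raw <- median(As1988)
med_log <- exp(median(log(As1988)))

print(c(arithmetic = arith_mean, geometric = geo_mean,
        median = med_raw, median_via_log = med_log))
```

For skewed, strictly positive data like these, the geometric mean is well below the arithmetic mean, which matches the direction of the discrepancy reported above.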
On 11/26/2009 02:25 AM, Michael Hopgood wrote:
> Hi Tom,
>
> Thank you for the friendly and informative answer. It does explain a lot of
> things, actually. As with any good answer, it inevitably leads to other
> questions. In the first place, I need the arithmetic mean. It's what we
> base our calculations on...
>
> My code is currently this:
>
> Metall<-c("Cu","Cu","Cu","Cu","Cu","Cu","Cu","Cu","Cu","Cr","Cr","Cr","Cr","Cr","Cr","Cr","Cr","Cr","As","As","As","As","As","As","As","As","As","Pb","Pb","Pb","Pb","Pb","Pb","Pb","Pb","Pb","Zn","Zn","Zn","Zn","Zn","Zn","Zn","Zn","Zn")
> Halt<-c(85,13,13,340,18,13,88,24,12,216,33,21,454,20,18,88,30,21,1254,22,4.2,1081,35,6,1772,192,7.6,43,20,12,3107,21,12,30,24,19,1109,57,46,269,68,50,585,131,52)
> beanplot(Halt~Metall, log = "y", yaxt = "n", ylab="Halt (mg/kg)",cex.lab=1.2)
> axis(2,c(1,10,100,1000,10000))
> polygon(c(0.2966510,0.2966510,1.4832033,1.4832033,3.6160162,3.6160162,4.4921444,4.4921444,5.6968371,5.6968371),c(2.763021e-01,10,10,80,80,40,40,250,250,2.763021e-01),col="#66FF0090", border="#66FF0090")
> text(5.58,10,"<KM", cex=1.2, font=2)
> polygon(c(0.2966510,0.29665101,1.4832033,1.4832033),c(10,25,25,10),col="#FFFF0090",border="#FFFF0090")
> polygon(c(1.4832033,1.4832033,2.5027348,2.5027348,3.6160162,3.6160162,4.4921444,4.4921444,5.6968371,5.6968371,4.4921444,4.4921444,3.6160162,3.6160162,1.4832033),c(80,150,150,200,200,400,400,500,500,250,250,40,40,80,80),col="#FFFF0090",border="#FFFF0090")
> text(5.54,350,"<MKM", cex=1.2, font=2)
> polygon(c(0.2966510,0.2966510,5.6968371,5.6968371,4.4921444,4.4921444,3.6160162,3.6160162,2.5027348,2.5027348,1.4832033,1.4832033),c(25,30085.997183,30085.997183,500,500,400,400,200,200,150,150,25),col="#FF000090",border="#FF000090")
> text(5.54,2500,">MKM", cex=1.2, font=2)
>
> The polygons convey information on whether each sample is higher than the
> soil guideline value. If I take away the log scale, the vast difference in
> values obscures the polygons... Ideally I'd like the average beanline to be
> the arithmetic mean or to be gone altogether. Can't seem to make beanplot do
> this...
>
> Sincerely,
> Michael Hopgood

Hi Michael,

I don't know beanplot() well enough to know if it can be forced to do exactly what you want. You should be able to use the "what" argument to suppress the mean lines; then you could add your own average lines using lines() or segments().

You could of course also modify the code to suit your needs (mybean <- beanplot; fix(mybean)), but it looks pretty complicated.

Tom Wainwright
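A rough sketch of Tom's suggestion follows: switch off beanplot's own average lines with "what", then draw the arithmetic group means with segments(). It rests on several assumptions that are not confirmed in this thread: that the four entries of "what" stand for the overall average line, the beans, the per-bean average, and the individual beanlines (as described in the beanplot help page); that the beans are drawn at x = 1, 2, ... in sorted group order; and that a half-width of 0.4 looks reasonable. The name grp_means is made up for illustration.

```r
# Data from Michael's message; rep() is shorthand for the 9 labels per metal
Metall <- rep(c("Cu", "Cr", "As", "Pb", "Zn"), each = 9)
Halt <- c(85, 13, 13, 340, 18, 13, 88, 24, 12,
          216, 33, 21, 454, 20, 18, 88, 30, 21,
          1254, 22, 4.2, 1081, 35, 6, 1772, 192, 7.6,
          43, 20, 12, 3107, 21, 12, 30, 24, 19,
          1109, 57, 46, 269, 68, 50, 585, 131, 52)

# Arithmetic mean per metal; tapply() returns groups in sorted order
grp_means <- tapply(Halt, Metall, mean)
print(grp_means)

# Plot only if the beanplot package is installed
if (requireNamespace("beanplot", quietly = TRUE)) {
  # Assumed meaning of what: c(overall line, beans, bean average, beanlines);
  # the two zeros drop the averages that beanplot computes on the log scale
  beanplot::beanplot(Halt ~ Metall, log = "y", what = c(0, 1, 0, 1),
                     ylab = "Halt (mg/kg)", cex.lab = 1.2)
  # Assumption: bean i is centred at x = i; 0.4 is an arbitrary half-width
  x <- seq_along(grp_means)
  segments(x - 0.4, grp_means, x + 0.4, grp_means, lwd = 2)
}
```

Because segments() draws in data coordinates, the means are given on the original scale even though the y axis is logarithmic. The guideline polygons from the original code could be added afterwards in the same way as before.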