Rich Shepard
2024-Jan-22 15:45 UTC
[R] Use of geometric mean for geochemical concentrations
A statistical question, not specific to R. I'm asking for a pointer to a source of definitive descriptions of what types of data are best summarized by the arithmetic, geometric, and harmonic means. As an aquatic ecologist I see regulators apply the geometric mean to geochemical concentrations rather than using the arithmetic mean. I want to know whether the geometric mean of a set of chemical concentrations (e.g., in mg/L) is an appropriate representation of the expected value. If not, I want to explain this to non-technical decision-makers; if so, I want to understand why my assumption is wrong. TIA, Rich
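The sketch below shows why the answer depends on the distribution of the data. It assumes, and the thread does not state this, that the concentrations are lognormal, a common model for positive, right-skewed environmental data. If log(Y) ~ Normal(mu, sigma^2), then:

- the expected value is exp(mu + sigma^2/2),
- the median is exp(mu),
- the harmonic mean is exp(mu - sigma^2/2).

So under this model the population geometric mean equals the median, not the expected value. The simulated sample uses arbitrary values mu = 0 and sigma = 1 and contains no real measurements. Each sample mean settles on its own target. The R equivalents of the three sample means would be mean(y), exp(mean(log(y))) and 1/mean(1/y).

```python
import math
import random
import statistics

# Hypothetical lognormal model: log(concentration) ~ Normal(MU, SIGMA**2).
# MU and SIGMA are arbitrary illustration values, not taken from any real data set.
MU, SIGMA = 0.0, 1.0
N = 100_000

random.seed(1)  # fixed seed so the run is reproducible
y = [random.lognormvariate(MU, SIGMA) for _ in range(N)]

# Sample versions of the three means.
am = statistics.fmean(y)                          # arithmetic mean
gm = statistics.geometric_mean(y)                 # geometric mean
hm = 1.0 / statistics.fmean(1.0 / v for v in y)   # harmonic mean

# Population values implied by the lognormal model.
expected_value = math.exp(MU + SIGMA**2 / 2)  # E[Y]
median = math.exp(MU)                          # also the population geometric mean
harmonic = math.exp(MU - SIGMA**2 / 2)         # population harmonic mean

print(f"arithmetic mean {am:.3f}  vs  E[Y]                {expected_value:.3f}")
print(f"geometric mean  {gm:.3f}  vs  median              {median:.3f}")
print(f"harmonic mean   {hm:.3f}  vs  exp(mu - sigma^2/2) {harmonic:.3f}")
```

Under this model the ratio of arithmetic to geometric mean is exp(sigma^2/2), about 1.65 for sigma = 1. The gap grows with the spread of the log-concentrations, so the choice of mean matters most for highly variable constituents.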
Bert Gunter
2024-Jan-22 15:48 UTC
[R] Use of geometric mean for geochemical concentrations
better posted on r-sig-ecology? -- or maybe even stack exchange? Cheers, Bert
In spite of being off-topic: I think it is a good question, and not really only about geochemistry but about statistics in applied sciences (and engineering, for that matter). Good applied statisticians in the 1980s and 1990s would, I'm sure, all have known the answer: using the geometric mean instead of the arithmetic mean is basically *equivalent* to first log-transforming the data and then working with the transformed data. That holds not just for computing an average but for more relevant modelling, inference, etc.

John W. Tukey (and several other greats of the time) had the log transform among the "first aid transformations". If the data for a continuous variable must all be positive, it is also typically the case that the distribution is considerably skewed to the right. In such a case, behave like a good human who sees another human in health distress: apply First Aid, i.e., do the things you learned to do quickly without too much thought, because things must happen fast, to hopefully save the other's life. Here: log-transform all such variables without further ado, and only afterwards start your (exploratory and further) data analysis.

Now, mean(log(y)) = log(geometricmean(y)), where mean() is the arithmetic mean as in R {mathematically; on the computer you need all.equal(), not '==' !!}. I.e., according to Tukey and all the other experienced applied statisticians of the past, the geometric mean is the "best thing" to do for such positive, right-skewed data. It is best in the same sense that the log transform is the best "a priori" transformation for such data. The geometric mean even has one advantage: when log-transforming you need to fiddle with zeroes, whereas the geometric mean already works for zeroes (it is simply 0 if any value is 0).
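Here is a minimal Python sketch of the identity mean(log(y)) = log(geometricmean(y)). It uses math.isclose() as a stand-in for R's all.equal(), i.e., a floating-point tolerance check rather than ==. The helper geometric_mean_product is hypothetical, written for this illustration from the definition (the n-th root of the product). The concentrations are made-up illustration values.

The second half shows the zero case: the product definition returns 0, whereas the log transform is undefined at 0.

```python
import math
import statistics


def geometric_mean_product(y):
    """Hypothetical helper: geometric mean from its definition, the n-th root of the product.

    Fine for a handful of values; for long vectors the product can overflow
    or underflow, which is why software usually averages logs instead.
    """
    return math.prod(y) ** (1.0 / len(y))


# Hypothetical positive concentrations (mg/L), for illustration only.
y = [0.5, 1.2, 3.0, 8.4, 25.0]

mean_of_logs = statistics.fmean(math.log(v) for v in y)
log_of_gm = math.log(geometric_mean_product(y))

# Mathematically equal; numerically compare with a tolerance, not ==
# (math.isclose plays the role of R's all.equal here).
print(math.isclose(mean_of_logs, log_of_gm))

# With a zero present, the product definition still returns a value (0.0) ...
y_with_zero = [0.0, 1.2, 3.0, 8.4]
print(geometric_mean_product(y_with_zero))

# ... but the log transform itself is undefined at zero.
try:
    math.log(0.0)
except ValueError as err:
    print("log(0) fails:", err)
```

In R itself, log(0) is -Inf, so exp(mean(log(y))) also returns 0 when y contains a zero. Python's math.log raises an error instead, which is why the sketch computes the zero case from the product definition.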
Martin
Rich Shepard
2024-Jan-24 17:24 UTC
[R] Use of geometric mean for geochemical concentrations [RESOLVED]
Many of you provided excellent comments, and so did a couple of folks on StackExchange. Rather than responding to individual posts, I've waited until the thread petered out to provide an overall response.

I've two points to make: one on mean calculations and the second on the context I didn't sufficiently provide when I posted my question.

Responses confirmed that the appropriate model for calculating means depends on the data set and the question(s) the data are to answer. So the summary answer to my question (as stated) is: it depends. :-) Thank you.

What prompted my thread-starting message is that I work in the realm of environmental regulatory compliance, including the Clean Water Act and the Endangered Species Act. One state environmental regulator provides coverage for state-wide point-source storm water discharges from smaller industrial activities under a General permit. The permit's monitoring requirements are 4 samples per year, one each quarter, for a small set of water chemical and physical constituents (really!). The reporting requirements are to use the geometric mean to summarize the four data points. I have my clients calculate an arithmetic mean in addition. (For the record, if you have an Agriculture Department General Storm Water Discharge Permit for a point source such as a livestock feed lot, you need only a single sample, after the rains start, to comply with the permit. Feh!)
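This small sketch illustrates the four-samples-a-year reporting case. It uses hypothetical quarterly values, not client data. One quarter (labelled as a storm spike purely for illustration) is much higher than the rest, and the sketch shows how far the two summaries can diverge.

By the AM-GM inequality, the geometric mean of positive values is never larger than the arithmetic mean. The two are equal only when all four values are equal. So for variable data, a geometric-mean-only report always gives the lower of the two numbers.

```python
import statistics

# Hypothetical quarterly results for one constituent (mg/L); Q3 mimics a storm spike.
quarterly = {"Q1": 0.8, "Q2": 1.1, "Q3": 12.0, "Q4": 0.9}
values = list(quarterly.values())

am = statistics.fmean(values)            # arithmetic mean, reported in addition
gm = statistics.geometric_mean(values)   # geometric mean, as the permit requires

print(f"arithmetic mean: {am:.2f} mg/L")
print(f"geometric mean:  {gm:.2f} mg/L")
print(f"ratio AM/GM:     {am / gm:.2f}")
```

With these made-up numbers, the arithmetic mean (3.70 mg/L) is a little more than twice the geometric mean (about 1.76 mg/L). The single high quarter pulls the arithmetic mean up far more than the geometric mean.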
Germane to Bert's comments about all the wrong ways to treat non-detected/censored water chemical analyses: I discovered Dennis Helsel through his 2005 article in Environmental Science & Technology (Oct. 16th). I bought his book when it was published in 2012 and have used survival analysis on censored data ever since. (I also presented a Continuing Legal Education talk in 2016 and received a nice thank-you email from a state district judge who attended.)

I greatly appreciate all your comments and apologize for not better explaining the context of my question when I posted my first message. Regards, Rich