>>>>> Rich Shepard
>>>>>     on Mon, 22 Jan 2024 07:45:31 -0800 (PST) writes:

    > A statistical question, not specific to R. I'm asking for
    > a pointer for a source of definitive descriptions of what
    > types of data are best summarized by the arithmetic,
    > geometric, and harmonic means.

Although this is off-topic:

I think it is a good question, not really only about
geo-chemistry, but about statistics in applied sciences (and
engineering for that matter).

Something I am sure good applied statisticians in the 1980s and
1990s would all know the answer to:

Using the geometric mean instead of the arithmetic mean
is basically *equivalent* to first log-transforming the data
and then working with the transformed data:
not just for computing an average, but for more relevant modelling,
inference, etc.

John W Tukey (and several other of the grands of the time)
had the log transform among the "First aid transformations":

If the data for a continuous variable must all be positive, it is
also typically the case that the distribution is considerably
skewed to the right.
In such a case, behave as a good human who sees another human in
health distress: apply First Aid -- do the things you learned to
do quickly without too much thought, because things must happen
fast -- to hopefully save the other's life.

Here: Do log-transform all such variables without further ado,
and only afterwards start your (exploratory and more) data analysis.

Now,   mean(log(y)) = log(geometricmean(y)),
where mean() is the arithmetic mean as in R
{mathematically; on the computer you need all.equal(), not '==' !!}

I.e., according to Tukey and all the other experienced applied
statisticians of the past, the geometric mean is the "best thing"
to do for such positive right-skewed data in the same sense
that the log transform is the best "a priori" transformation for
such data -- with the one advantage even that you need to fiddle
with zeroes when log-transforming, whereas the geometric mean
works already for zeroes.

Martin

    > As an aquatic ecologist I see regulators apply the
    > geometric mean to geochemical concentrations rather than
    > using the arithmetic mean. I want to know whether the
    > geometric mean of a set of chemical concentrations (e.g.,
    > in mg/L) is an appropriate representation of the expected
    > value. If not, I want to explain this to non-technical
    > decision-makers; if so, I want to understand why my
    > assumption is wrong.

    > TIA,

    > Rich

    > ______________________________________________
    > R-help at r-project.org mailing list -- To UNSUBSCRIBE and
    > more, see https://stat.ethz.ch/mailman/listinfo/r-help
    > PLEASE do read the posting guide
    > http://www.R-project.org/posting-guide.html and provide
    > commented, minimal, self-contained, reproducible code.
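
The identity stated in Martin's message, mean(log(y)) = log(geometricmean(y)), can be checked numerically. The following is only a minimal sketch: the values in y are made up for illustration, and geometric_mean() is a hypothetical helper defined here, not a base R function.

```r
# Hypothetical helper (not part of base R): geometric mean computed via logs.
geometric_mean <- function(y) exp(mean(log(y)))

# Made-up, strictly positive, right-skewed values, for illustration only.
y <- c(0.2, 0.5, 0.9, 1.4, 3.1, 12.7)

lhs <- mean(log(y))                     # arithmetic mean of the logs
rhs <- log(prod(y)^(1 / length(y)))     # log of the product-based geometric mean

# Compare with a numerical tolerance, as suggested in the message, not with '=='.
isTRUE(all.equal(lhs, rhs))

# For positive values that are not all equal, the geometric mean is
# smaller than the arithmetic mean (AM-GM inequality).
geometric_mean(y) < mean(y)
```

Both expressions should evaluate to TRUE for these values. The second comparison bears on the original question: for right-skewed positive data the geometric mean is systematically lower than the arithmetic mean, so the two summaries are not interchangeable.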
On Mon, 22 Jan 2024, Martin Maechler wrote:

> I think it is a good question, not really only about geo-chemistry, but
> about statistics in applied sciences (and engineering for that matter).

> John W Tukey (and several other of the grands of the time) had the log
> transform among the "First aid transformations":
>
> If the data for a continuous variable must all be positive it is also
> typically the case that the distribution is considerably skewed to the
> right. In such a case behave as a good human who sees another human in
> health distress: apply First Aid -- do the things you learned to do
> quickly without too much thought, because things must happen fast ---to
> hopefully save the other's life.

Martin,

Thanks very much. I will look further into this because toxic metals and
organic compounds in geochemical collections almost always have censored lab
results (below method detection limits) that range from about 15% to 80% or
more, and there almost always are very high extreme values.

I'll learn to understand what benefits log transforms have over
compositional data analyses.

Best regards,

Rich
Dear Martin,

Helpful general advice, although it's perhaps worth mentioning that the
geometric mean, defined e.g. naively as prod(x)^(1/length(x)), is
necessarily 0 if there are any 0 values in x. That is, the geometric mean
"works" in this case but isn't really informative.

Best,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/

On 2024-01-22 12:18 p.m., Martin Maechler wrote:
> [...]
> I.e., according to Tukey and all the other experienced applied
> statisticians of the past, the geometric mean is the "best thing"
> to do for such positive right-skewed data in the same sense
> that the log-transform is the best "a priori" transformation for
> such data -- with the one advantage even that you need to fiddle
> with zeroes when log-transforming, whereas the geometric mean
> works already for zeroes.
>
> Martin
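
John's point about zeros can be seen in a few lines of R. This is only an illustrative sketch: geo_naive() is a hypothetical name for the naive definition quoted above, and the values in x are made up.

```r
# Naive definition quoted in the thread; geo_naive is a hypothetical name.
geo_naive <- function(x) prod(x)^(1 / length(x))

# Made-up concentrations containing a single zero, for illustration only.
x <- c(0, 0.4, 1.2, 7.5)

# A single zero makes the product zero, so the result is 0
# no matter how large the other values are.
geo_naive(x)
geo_naive(c(0, 1000, 1000, 1000))

# The log-based form gives the same answer: log(0) is -Inf in R,
# so mean(log(x)) is -Inf and exp(-Inf) is 0.
exp(mean(log(x)))
```

All three calls return 0, so the computation runs without error but the result no longer reflects the non-zero values in any way.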