I am curious to know what algorithm R's mean function uses. Is there some reference to the numerical properties of this algorithm? I found the following C code in summary.c:do_summary(): case REALSXP: PROTECT(ans = allocVector(REALSXP, 1)); for (i = 0; i < n; i++) s += REAL(x)[i]; s /= n; if(R_FINITE((double)s)) { for (i = 0; i < n; i++) t += (REAL(x)[i] - s); s += t/n; } REAL(ans)[0] = s; break; It seems to do a straight up mean: for (i = 0; i < n; i++) s += REAL(x)[i]; s /= n; Then it adds what i assume is a numerical correction which seems to be the mean difference from the mean of the data: for (i = 0; i < n; i++) t += (REAL(x)[i] - s); s += t/n; I haven't been able to track this algorithm down anywhere (mean is not a great search term). Any help would be much appreciated, Zach Harrington
This was also asked on StackOverflow: http://stackoverflow.com/q/17866149/271616. Here is the answer I posted: This appears to be the updating method of West, 1979 [1] and it was implemented in R-2.3.0 in response to PR#1228 [2]. I'm not positive this is the correct algorithm, since it was suggested by Martin Maechler but implemented by Brian Ripley. I couldn't find a reference in the source code or version control logs that listed the actual algorithm used. It was implemented in cov.c in revision 37389 and in summary.c in revision 37393. [1] http://dl.acm.org/citation.cfm?doid=359146.359153 [2] https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=1228 Best, -- Joshua Ulrich | about.me/joshuaulrich FOSS Trading | www.fosstrading.com On Thu, Jul 25, 2013 at 2:44 PM, Zach Harrington <zach.harrington at gmail.com> wrote:> I am curious to know what algorithm R's mean function uses. Is there some > reference to the numerical properties of this algorithm? > > I found the following C code in summary.c:do_summary(): > case REALSXP: > PROTECT(ans = allocVector(REALSXP, 1)); > for (i = 0; i < n; i++) s += REAL(x)[i]; > s /= n; > if(R_FINITE((double)s)) { > for (i = 0; i < n; i++) t += (REAL(x)[i] - s); > s += t/n; > } > REAL(ans)[0] = s; > break; > > It seems to do a straight up mean: > for (i = 0; i < n; i++) s += REAL(x)[i]; > s /= n; > > Then it adds what i assume is a numerical correction which seems to be the > mean difference from the mean of the data: > for (i = 0; i < n; i++) t += (REAL(x)[i] - s); > s += t/n; > > I haven't been able to track this algorithm down anywhere (mean is not a > great search term). > > Any help would be much appreciated, > > Zach Harrington > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel
Prof J C Nash (U30A)
2013-Jul-27 14:35 UTC
[Rd] What algorithm is R using to calculate mean?
There is quite a literature on related methods for variance. If anyone is interested, I did some work (and even published the code in the magazine Interface Age in 1981) on some of these. I could probably put together scans of relevant materials, some of which are not easily available. It would make a good Master's or senior undergrad project, and could be (possibly has been) extended to compute covariance matrices properly for very large data sets and streaming data. JN On 13-07-27 06:00 AM, r-devel-request at r-project.org wrote:> Message: 6 > Date: Fri, 26 Jul 2013 12:58:28 +0000 > From: Ravi Varadhan<ravi.varadhan at jhu.edu> > To: Joshua Ulrich<josh.m.ulrich at gmail.com>, Zach Harrington > <zach.harrington at gmail.com> > Cc:"r-devel at r-project.org List" <r-devel at r-project.org> > Subject: Re: [Rd] What algorithm is R using to calculate mean? > Message-ID: > <2F9EA67EF9AE1C48A147CB41BE2E15C3491530 at DOM-EB-MAIL1.win.ad.jhu.edu> > Content-Type: text/plain; charset="us-ascii" > > This uses the idea of Kahan's summation, if I am not mistaken. > > http://en.wikipedia.org/wiki/Kahan_summation_algorithm > > Ravi > __________________________