I am willing to add these functions into the gregmisc package to complement
the already-existing 'running' function.
-Greg
> -----Original Message-----
> From: David Brahm [mailto:brahm@alum.mit.edu]
> Sent: Monday, September 16, 2002 11:10 AM
> To: r-devel@stat.math.ethz.ch
> Cc: maechler@stat.math.ethz.ch; Ray.Brownrigg@mcs.vuw.ac.nz
> Subject: Running Median and Mean
>
>
> R gurus,
>
> On Aug 20, 2002, I asked in R-help about calculating a
> running 5-day median on
> a large matrix. Thanks to Martin Maechler
> <maechler@stat.math.ethz.ch> and Ray
> Brownrigg <Ray.Brownrigg@mcs.vuw.ac.nz> for responding.
>
> I ended up writing C code (and an R interface) to do it,
> which is about 1000x
> faster than the naive method! (72s became .09s on a 223 x
> 520 matrix). I
> added a running mean function too, which is very simple and
> is about 15x
> faster than apply(..., filter, ...). I've packaged the code as
> <http://cran.r-project.org/src/contrib/Devel/runStat_1.1.tar.gz>
>
> The key insight was that it *is* in fact a good idea to
> calculate the ranks of
> the 5 data points, because then it is easy to calculate how
> those ranks
> *change* as you step through the column. For example,
> suppose the last few
> points in a column look like:
> data= 20 25 18 14 78 55 29
> rank= 3 2 1 5 4
> where I have ranked the 5 points prior to the last one. The
> median is 25 (the
> point with rank 3). Stepping to the left:
> data= 20 25 18 14 78 55 29
> rank= 3 4 2 1 5
> For the 4 points that overlap, their ranks remain the same,
> except they are
> incremented by one for points >=20, and decremented by one
> for points >55. The
> rank of the new point (20) is just 5 minus the number of
> increments done.
>
> The code is copied below for your amusement. Suggestions
> welcome. Note I have
> not dealt with NA's, even values of N, row-wise medians, or
> non-lagged medians
> (i.e. I put the median of points 1:5 into position 6, not
> position 5). It
> would be great if somebody (Martin?) wanted to incorporate
> these functions into
> an existing package or even "base"; otherwise I'll post the
> final product on
> CRAN/src/contrib.
> -- David Brahm (brahm@alum.mit.edu)
>
> ############### C Code: ##################
>
> #include <R.h>
>
> void runMedian(double *m, int *d, int *N) {
> int i, j, k, a, R=(*N+1)/2, *r=(int*) R_alloc(*N, sizeof(int));
> double old, new, *x;
>
> for (j=1; j <= d[1]; j++) { /* Loop
> over columns */
> x = m + j*d[0] - 1 - *N; /* x[*N]
> = m[nrow(m),j] */
>
> for (i=0; i < *N; i++) { /* Calculate
> initial ranks "r" */
> r[i] = 1; /* r
> higher if larger or later */
> for (k=0; k < i; k++) if (x[i] >= x[k]) r[i]++;
> for (k=i+1; k < *N; k++) if (x[i] > x[k]) r[i]++;
> if (r[i]==R) x[*N] = x[i]; /* If rank=R,
> this is the median */
> }
>
> for (i=d[0]-1; i > *N; i--) { /* Now
> x[*N] = m[i,j] */
> x--; old=x[*N]; new=x[0]; a=*N; /* a =
> rank of new guy */
> for (k=*N-1; k > 0; k--) { /* Recalculate each
> element's rank */
> r[k] = r[k-1]; /* Shift previous
> iteration's ranks */
> if (x[k] >= new) {r[k]++; a--;} /* Are we adding a
> lower number? */
> if (x[k] > old) {r[k]--;} /* Are we dropping a
> higher number? */
> if (r[k]==R) x[*N] = x[k]; /* If rank=R, this
> is the median */
> }
> r[0] = a;
> if (a==R) x[*N] = new; /* Is the new
> guy the median? */
> }
> }
> }
>
>
> void runMean(double *m, int *d, int *N) {
> int i, j, k;
> double new, *x;
>
> for (j=1; j <= d[1]; j++) { /* Loop
> over columns */
> x = m + j*d[0] - 1 - *N;
> for (i=d[0]; i > *N; i--) { /*
> x[*N] = m[i,j] */
> for (k=0, new=0; k < *N; k++) new += x[k];
> x[*N] = new / *N; x--;
> }
> }
> }
>
> ############### R Code: ##################
>
> runMedian <- function(m, N=5) {
> if (!(N %% 2)) stop("N should be odd.")
> d <- od <- dim(m)
> if (!(nd <- length(d))) d <- c(length(m), 1)
> # Vector -> matrix
> if (nd > 2) d <- c(d[1], prod(d[-1]))
> # Array -> matrix
> if (d[1] <= N) {m[] <- NA; return(m)}
> m <- array(.C("runMedian", m=as.real(m), as.integer(d),
> as.integer(N))$m, d)
> m[1:N, ] <- NA
> if (nd==2) m else if (!nd) m[ ,1] else array(m, od) #
> Matrix/Vector/Array
> }
>
> runMean <- function(m, N=21) {
> d <- od <- dim(m)
> if (!(nd <- length(d))) d <- c(length(m), 1)
> # Vector -> matrix
> if (nd > 2) d <- c(d[1], prod(d[-1]))
> # Array -> matrix
> if (d[1] <= N) {m[] <- NA; return(m)}
> m <- array(.C("runMean", m=as.real(m), as.integer(d),
> as.integer(N))$m, d)
> m[1:N, ] <- NA
> if (nd==2) m else if (!nd) m[ ,1] else array(m, od) #
> Matrix/Vector/Array
> }
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
> -.-.-.-.-.-.-.-.-
> r-devel mailing list -- Read
> http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To:
> r-devel-request@stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
> _._._._._._._._._
>
LEGAL NOTICE
Unless expressly stated otherwise, this message is confidential and may be
privileged. It is intended for the addressee(s) only. Access to this E-mail by
anyone else is unauthorized. If you are not an addressee, any disclosure or
copying of the contents of this E-mail or any action taken (or not taken) in
reliance on it is unauthorized and may be unlawful. If you are not an addressee,
please inform the sender immediately.
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To:
r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._