Vadim Ogranovich
2005-Apr-02 01:07 UTC
[R] Survey of "moving window" statistical functions - still looking f or fast mad function
Hi, First, let me thank Jaroslaw for making this survey. I find it quite illuminating. Now the questions: * the #1 solution below (based on cumsum) is numerically unstable. Specifically if you do the runmean on a positive vector you can easily get negative numbers due to rounding errors. Does anyone see a modification which is free of this deficiency? * is it possible to optimize the algorithm of the filter function, solution #2 below, for the case of the rep(1/k,k) kernel? Thanks, Vadim [R] Survey of "moving window" statistical functions - still looking f or fast mad function * This message: [ Message body <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#start> ] [ More options <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#options2> ] * Related messages: [ Next message <http://tolstoy.newcastle.edu.au/R/help/04/10/5162.html> ] [ Previous message <http://tolstoy.newcastle.edu.au/R/help/04/10/5160.html> ] [ Next in thread <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html> ] [ Replies <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#replies> ] From: Tuszynski, Jaroslaw W. <JAROSLAW.W.TUSZYNSKI_at_saic.com <mailto:JAROSLAW.W.TUSZYNSKI_at_saic.com?Subject=Re:%20%5BR%5D%20Survey% 20of%20"moving%20window"%20statistical%20functions%20-%20still %20lookingf%20or%20fast%20mad%20function> > Date: Sat 09 Oct 2004 - 06:30:32 EST Hi, Lately I run into a problem that my code R code is spending hours performing simple moving window statistical operations. As a result I did searched archives for alternative (faster) ways of performing: mean, max, median and mad operation over moving window (size 81) on a vector with about 30K points. And performed some timing for several ways that were suggested, and few ways I come up with. The purpose of this email is to share some of my findings and ask for more suggestions (especially about moving mad function). Sum over moving window can be done using many different ways. Here are some sorted from the fastest to the slowest: 1. runmean = function(x, k) { n = length(x) y = x[ k:n ] - x[ c(1,1:(n-k)) ] # this is a difference from the previous cell y[1] sum(x[1:k]); # find the first sum y = cumsum(y) # apply precomputed differences return(y/k) # return mean not sum } 2. filter(x, rep(1/k,k), sides=2, circular=T) - (stats package) 3. kernapply(x, kernel("daniell", m), circular=T) 4. apply(embed(x,k), 1, mean) 5. mywinfun <- function(x, k, FUN=mean, ...) { # suggested in news group n <- length(x) A <- rep(x, length=k*(n+1)) dim(A) <- c(n+1, k) sapply(split(A, row(A)), FUN, ...)[1:(n-k+1)] } 6. rollFun(x, k, FUN=mean) - (fSeries package) 7. rollMean(x, k) - (fSeries package) 8. SimpleMeanLoop = function(x, k) { n = length(x) # simple-minded loop used as a baseline y = rep(0, n) k = k%/%2; for (i in (1+k):(n-k)) y[i] = mean(x[(i-k):(i+k)]) } 9. running(x, fun=mean, width=k) - (gtools package) Some of above functions return results that are the same length as x and some return arrays with length n-k+1. The relative speeds (on Windows machine) were as follow: 0.01, 0.09, 1.2, 8.1, 11.2, 13.4, 27.3, 63, 345. As one can see there are about 5 orders of magnitude between the fastest and the slowest. Maximum over moving window can be done as follow, in order of speed 1. runmax = function(x, k) { n = length(x) y = rep(0, n) m = k%/%2; a = 0; for (i in (1+m):(n-m)) { if (a==y[i-1]) y[i] max(x[(i-m):(i+m)]) # calculate max of the window else y[i] max(y[i-1], x[i+m]); # max of the window is =y[i-1] a = x[i-m] # point that will be removed from the window } return(y) } 2. apply(embed(x,k), 1, max) 3. SimpleMaxLoop(x, k) - similar to SimpleMeanLoop above 4. mywinfun(x, k, FUN=max) - see above 5. rollFun(x, k, FUN=max) - fSeries package 6. rollMax(x, k) - fSeries package 7. running(x, fun=max, width=k) - gtools package The relative speeds were: <0.01, 3, 3.4, 5.3, 7.5, 7.7, 15.3 Median over moving window can be done as follows: 1. runmed(x, k) - from stats package 2. SimpleMedLoop(x, k) - similar to SimpleMeanLoop above 3. apply(embed(x,k), 1, median) 4. mywinfun(x, k, FUN=median) - see above 5. rollFun (x, k, FUN=median) - fSeries package 6. running(x, fun=max, width=k) - gtools package Speeds: <0.01, 3.4, 9, 15, 29, 165 Mad over moving window can be done as follows: 1. runmad = function(x, k) { n = length(x) A = embed(x,k) A = abs(A - rep(apply(A, 1, median), k)) dim(A) = c(n-k+1, k) apply(A, 1, median) } 2. apply(embed(x,k), 1, mad) 3. mywinfun(x, k, FUN=mad) - see above 4. SimpleMadLoop(x, k) - similar to SimpleMeanLoop above 5. rollFun(x, k, FUN=mad) - fSeries package 6. running(x, fun=mad, width=k) - gtools package Speeds: 11, 18, 25, 50, 50, 400 Some thoughts about those results: * All functions from Stats package (runmed, filter, kernapply) are fast and hard to improve on. * In case of Mean and Max a simple un-optimized R codes are much faster than specialized functions build for the same purpose. * apply(embed(x,k), 1, fun) - seem to be always faster than any functions from fSeries package or "mywinfun" * "running" function from gtools package is horribly slow compared to anything else * "mywinfun" proposed as a faster version of "apply(embed(x,k), 1, fun)" seems to be always slower Finally a question: I still need to get moving windows mad function faster my "runmad" function is not that much faster than apply/embed combo, and that I used before, and this is where my code spends most of its time. I need something like "runmed" but for a mad function. Any suggestions? Jarek =====================================\==== Jarek Tuszynski, PhD. o / \ Science Applications International Corporation <\__,| (703) 676-4192 "> \ Jaroslaw.W.Tuszynski@saic.com ` \ [[alternative HTML version deleted]] ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Sat Oct 09 06:43:05 2004 * This message: [ Message body <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#start> ] * Next message: Emili Tortosa-Ausina: "Re: [R] RWinEdt" <http://tolstoy.newcastle.edu.au/R/help/04/10/5162.html> * Previous message: Brian S Cade: "Re: [R] reading Systat into R" <http://tolstoy.newcastle.edu.au/R/help/04/10/5160.html> * Next in thread: Prof Brian Ripley: "Re: [R] Survey of "moving window" statistical functions - still looking f or fast mad function" <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html> * Reply: Prof Brian Ripley: "Re: [R] Survey of "moving window" statistical functions - still looking f or fast mad function" <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html> * Reply: Prof Brian Ripley: "Re: [R] Survey of "moving window" statistical functions - still looking f or fast mad function" <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html> * Contemporary messages sorted: [ By Date <http://tolstoy.newcastle.edu.au/R/help/04/10/date.html#5161> ] [ By Thread <http://tolstoy.newcastle.edu.au/R/help/04/10/index.html#5161> ] [ By Subject <http://tolstoy.newcastle.edu.au/R/help/04/10/subject.html#5161> ] [ By Author <http://tolstoy.newcastle.edu.au/R/help/04/10/author.html#5161> ] [ By messages with attachments <http://tolstoy.newcastle.edu.au/R/help/04/10/attachment.html> ] This archive was generated by hypermail 2.1.8 <http://www.hypermail.org/> : Fri 18 Mar 2005 - 03:03:35 EST [[alternative HTML version deleted]]
Gabor Grothendieck
2005-Apr-02 01:40 UTC
[R] Survey of "moving window" statistical functions - still looking f or fast mad function
Jaroslaw's article was great. In fact, it was used as the basis for rapply and some optimized special cases that will be included in the R 2.1.0 version of zoo (which have been coded but not yet released). Regarding numerically stable summation, check out the idea behind the following which I coincidentally am also considering for the zoo implementation: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/393090 On Apr 1, 2005 8:07 PM, Vadim Ogranovich <vograno at evafunds.com> wrote:> Hi, > > First, let me thank Jaroslaw for making this survey. I find it quite > illuminating. > > Now the questions: > > * the #1 solution below (based on cumsum) is numerically unstable. > Specifically if you do the runmean on a positive vector you can easily > get negative numbers due to rounding errors. Does anyone see a > modification which is free of this deficiency? > * is it possible to optimize the algorithm of the filter function, > solution #2 below, for the case of the rep(1/k,k) kernel? > > Thanks, > Vadim > > [R] Survey of "moving window" statistical functions - still looking f or > fast mad function > > * This message: [ Message body > <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#start> ] [ More > options > <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#options2> ] > * Related messages: [ Next message > <http://tolstoy.newcastle.edu.au/R/help/04/10/5162.html> ] [ Previous > message <http://tolstoy.newcastle.edu.au/R/help/04/10/5160.html> ] [ > Next in thread <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html> > ] [ Replies > <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#replies> ] > > From: Tuszynski, Jaroslaw W. <JAROSLAW.W.TUSZYNSKI_at_saic.com > <mailto:JAROSLAW.W.TUSZYNSKI_at_saic.com?Subject=Re:%20%5BR%5D%20Survey% > 20of%20"moving%20window"%20statistical%20functions%20-%20still > %20lookingf%20or%20fast%20mad%20function> > > Date: Sat 09 Oct 2004 - 06:30:32 EST > > Hi, > > Lately I run into a problem that my code R code is spending hours > performing simple moving window statistical operations. As a result I > did searched archives for alternative (faster) ways of performing: mean, > max, median and mad operation over moving window (size 81) on a vector > with about 30K points. And performed some timing for several ways that > were suggested, and few ways I come up with. The purpose of this email > is to share some of my findings and ask for more suggestions (especially > about moving mad function). > > Sum over moving window can be done using many different ways. Here are > some sorted from the fastest to the slowest: > > 1. runmean = function(x, k) { n = length(x) y = x[ k:n ] - x[ > c(1,1:(n-k)) ] # this is a difference from the previous cell y[1] > sum(x[1:k]); # find the first sum y = cumsum(y) # apply precomputed > differences return(y/k) # return mean not sum } > 2. filter(x, rep(1/k,k), sides=2, circular=T) - (stats package) > 3. kernapply(x, kernel("daniell", m), circular=T) > 4. apply(embed(x,k), 1, mean) > 5. mywinfun <- function(x, k, FUN=mean, ...) { # suggested in news > group n <- length(x) A <- rep(x, length=k*(n+1)) dim(A) <- c(n+1, k) > sapply(split(A, row(A)), FUN, ...)[1:(n-k+1)] } > 6. rollFun(x, k, FUN=mean) - (fSeries package) > 7. rollMean(x, k) - (fSeries package) > 8. SimpleMeanLoop = function(x, k) { n = length(x) # simple-minded > loop used as a baseline y = rep(0, n) k = k%/%2; for (i in (1+k):(n-k)) > y[i] = mean(x[(i-k):(i+k)]) } > 9. running(x, fun=mean, width=k) - (gtools package) > > Some of above functions return results that are the same length as x and > some return arrays with length n-k+1. The relative speeds (on Windows > machine) were as follow: 0.01, 0.09, 1.2, 8.1, 11.2, 13.4, 27.3, 63, > 345. As one can see there are about 5 orders of magnitude between the > fastest and the slowest. > > Maximum over moving window can be done as follow, in order of speed > > 1. runmax = function(x, k) { n = length(x) y = rep(0, n) m = k%/%2; > a = 0; for (i in (1+m):(n-m)) { if (a==y[i-1]) y[i] > max(x[(i-m):(i+m)]) # calculate max of the window else y[i] > max(y[i-1], x[i+m]); # max of the window is =y[i-1] a = x[i-m] # point > that will be removed from the window } return(y) } > 2. apply(embed(x,k), 1, max) > 3. SimpleMaxLoop(x, k) - similar to SimpleMeanLoop above > 4. mywinfun(x, k, FUN=max) - see above > 5. rollFun(x, k, FUN=max) - fSeries package > 6. rollMax(x, k) - fSeries package > 7. running(x, fun=max, width=k) - gtools package The relative > speeds were: <0.01, 3, 3.4, 5.3, 7.5, 7.7, 15.3 > > Median over moving window can be done as follows: > > 1. runmed(x, k) - from stats package > 2. SimpleMedLoop(x, k) - similar to SimpleMeanLoop above > 3. apply(embed(x,k), 1, median) > 4. mywinfun(x, k, FUN=median) - see above > 5. rollFun (x, k, FUN=median) - fSeries package > 6. running(x, fun=max, width=k) - gtools package Speeds: <0.01, > 3.4, 9, 15, 29, 165 > > Mad over moving window can be done as follows: > > 1. runmad = function(x, k) { n = length(x) A = embed(x,k) A = abs(A > - rep(apply(A, 1, median), k)) dim(A) = c(n-k+1, k) apply(A, 1, median) > } > 2. apply(embed(x,k), 1, mad) > 3. mywinfun(x, k, FUN=mad) - see above > 4. SimpleMadLoop(x, k) - similar to SimpleMeanLoop above > 5. rollFun(x, k, FUN=mad) - fSeries package > 6. running(x, fun=mad, width=k) - gtools package Speeds: 11, 18, > 25, 50, 50, 400 > > Some thoughts about those results: > > * All functions from Stats package (runmed, filter, kernapply) are > fast and hard to improve on. > * In case of Mean and Max a simple un-optimized R codes are much > faster than specialized functions build for the same purpose. > * apply(embed(x,k), 1, fun) - seem to be always faster than any > functions from fSeries package or "mywinfun" > * "running" function from gtools package is horribly slow compared > to anything else > * "mywinfun" proposed as a faster version of "apply(embed(x,k), 1, > fun)" seems to be always slower > > Finally a question: I still need to get moving windows mad > function faster my "runmad" function is not that much faster than > apply/embed combo, and that I used before, and this is where my code > spends most of its time. I need something like "runmed" but for a mad > function. Any suggestions? > > Jarek > > =====================================\===> Jarek Tuszynski, PhD. o / \ > Science Applications International Corporation <\__,| > (703) 676-4192 "> \ > Jaroslaw.W.Tuszynski at saic.com ` \ > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting > guide! http://www.R-project.org/posting-guide.html Received on Sat Oct > 09 06:43:05 2004 > > * This message: [ Message body > <http://tolstoy.newcastle.edu.au/R/help/04/10/5161.html#start> ] > * Next message: Emili Tortosa-Ausina: "Re: [R] RWinEdt" > <http://tolstoy.newcastle.edu.au/R/help/04/10/5162.html> > * Previous message: Brian S Cade: "Re: [R] reading Systat into R" > <http://tolstoy.newcastle.edu.au/R/help/04/10/5160.html> > * Next in thread: Prof Brian Ripley: "Re: [R] Survey of "moving > window" statistical functions - still looking f or fast mad function" > <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html> > * Reply: Prof Brian Ripley: "Re: [R] Survey of "moving window" > statistical functions - still looking f or fast mad function" > <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html> > * Reply: Prof Brian Ripley: "Re: [R] Survey of "moving window" > statistical functions - still looking f or fast mad function" > <http://tolstoy.newcastle.edu.au/R/help/04/10/5167.html> > > * Contemporary messages sorted: [ By Date > <http://tolstoy.newcastle.edu.au/R/help/04/10/date.html#5161> ] [ By > Thread <http://tolstoy.newcastle.edu.au/R/help/04/10/index.html#5161> ] > [ By Subject > <http://tolstoy.newcastle.edu.au/R/help/04/10/subject.html#5161> ] [ By > Author <http://tolstoy.newcastle.edu.au/R/help/04/10/author.html#5161> > ] [ By messages with attachments > <http://tolstoy.newcastle.edu.au/R/help/04/10/attachment.html> ] > > This archive was generated by hypermail 2.1.8 > <http://www.hypermail.org/> : Fri 18 Mar 2005 - 03:03:35 EST > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html >
Seemingly Similar Threads
- Survey of "moving window" statistical functions - still looking f or fast mad function
- How to compute Rolling analysis of Standard Deviation using ZOO package?
- passing character vectors to FORTRAN
- Help with gam
- apply a function to a rolling subset of a vector