Ralf B
2010-May-11 07:17 UTC
[R] Smoothing Techniques - short stepwise functions with spikes
R Friends,

I have data from which I would like to learn a more general (smoothed) trend by applying data smoothing methods. Data points follow a positive stepwise function.

|                     x              x
|             xxxxxxxx xxxxxxxx
|    x   x
|xxxx xxx xxxx
|                              xxxxxxxxxxxxxxxxx
|
|                                               xxxxxxx xxxx
|__________________________________________________________

Data points from each step should not interact with any other step. The outliers I want to remove are spikes, as shown in the diagram. These spikes do not have more than one or two points. I consider larger groups relevant and want to keep them in. I sometimes have fewer than 5 points for each step, and up to 50 at most.

Given these conditions, would you suggest using one of the moving averages (e.g. SMA, EMA, DEMA, ...) or the locally linear regression (lowess) method? Are there any other options? Does anybody know a good site that gives an overview of all methods without going too much into mathematical details, focusing instead on the requirements and underlying assumptions of each method? Is there perhaps even a package that runs and visualizes a comparison on the data, similar to packages like 'party'? (With 1000s of active packages, one can always hope for that.)

Thanks in advance!
Ralf
Tal Galili
2010-May-11 10:19 UTC
[R] Smoothing Techniques - short stepwise functions with spikes
Hi Ralf,

I can't offer you many resources, but the few I came across are:

1) loess (or the older version: lowess)
2) smooth
3) rollapply (from the zoo package)

I used a combination of 1 and 3 when creating an R implementation of a (simplistic) quantile loess; you might find the code useful:
http://www.r-statistics.com/2010/04/quantile-loess-combining-a-moving-quantile-window-with-loess-r-function/

Best,
Tal
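A moving-window smoother of the kind listed above can be tried quickly with a running median. The sketch below uses base R's runmed rather than rollapply, so it is a swapped-in stand-in and not the quantile-loess code from the linked post. The toy vector and the window width of 5 are assumptions, chosen so that a run of one or two points can never be the majority inside a window.

```r
# Toy step data (made up for illustration): levels 2, 5 and 1,
# with a one-point spike (7) and a two-point spike (9, 9).
x <- c(rep(2, 6), 7, rep(2, 5),
       rep(5, 8), 9, 9, rep(5, 6),
       rep(1, 7))

# Running median over 5 points. A run of 1 or 2 points is always outvoted
# by its neighbours, while a run of 3 or more keeps its own level.
smoothed <- as.numeric(runmed(x, k = 5))

print(rbind(x, smoothed))
```

With a window of 5, any genuine step shorter than 3 points would be removed as well, so the width has to be chosen against the shortest step that should survive.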
Gabor Grothendieck
2010-May-11 10:46 UTC
[R] Smoothing Techniques - short stepwise functions with spikes
This removes runs of length 1 and 2. It replaces the values in any such run with NA and then uses na.locf from the zoo package to fill those NAs by carrying forward the last occurrence of a non-NA. In this example the run consisting of a single 2, the run consisting of two 3's and the run consisting of a single 4 are removed:

> library(zoo) # na.locf
> x <- rep(c(1,2,1,3,1,4,3), c(4,1,5,2,6,1,5)); x
 [1] 1 1 1 1 2 1 1 1 1 1 3 3 1 1 1 1 1 1 4 3 3 3 3 3
> r <- rle(x)
> r$values <- na.locf(ifelse(r$lengths <= 2, NA, r$values))
> inverse.rle(r)
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3
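The run-length approach above can be wrapped in a small function. The following is only a sketch: the name remove_short_runs and the max_len argument are hypothetical, and it uses base R indexing in place of na.locf so that it runs without zoo. It also covers one case the example does not show, a series that starts with a short run. With na.locf's default na.rm = TRUE a leading NA is dropped, which would leave r$values shorter than r$lengths; here a leading short run borrows the value of the first kept run instead.

```r
# Hypothetical helper, not from the thread: drop runs of length <= max_len
# and replace them with the value of the preceding kept run.
remove_short_runs <- function(x, max_len = 2) {
  r <- rle(x)
  keep <- r$lengths > max_len
  if (!any(keep)) return(x)  # no run is long enough to trust; leave x as is

  # For every run, the index of the most recent kept run (0 if none yet).
  last_kept <- cummax(ifelse(keep, seq_along(keep), 0L))
  # Leading short runs have no predecessor: use the first kept run.
  last_kept[last_kept == 0] <- which(keep)[1]

  r$values <- r$values[last_kept]
  inverse.rle(r)
}

x <- rep(c(1, 2, 1, 3, 1, 4, 3), c(4, 1, 5, 2, 6, 1, 5))
cleaned <- remove_short_runs(x)
print(cleaned)
```

Note that rle compares values for exact equality, so this only works when the points within a step are identical; noisy steps would need rounding or binning first.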
Liaw, Andy
2010-May-11 12:10 UTC
[R] Smoothing Techniques - short stepwise functions with spikes
I'm surprised no one except Ralf mentioned tree-based methods. Basic regression trees are fitting exactly the type of functions (piecewise constant) that Ralf is asking about. So, either tree() or rpart() or whatever is in party should fit the bill.

Another possibility is wavelets with the Haar basis.

(These will all preserve the piecewise constant nature of the problem, while general smoothing procedures such as local regression and splines assume there are no jumps in the underlying smooth function.)

Andy
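As a rough illustration of the regression-tree suggestion, the sketch below fits rpart() to a made-up step series with position as the only predictor. The data, the column names pos and y, and the control settings are all assumptions; in particular minbucket = 3 is chosen so that no leaf can hold fewer than three points, which keeps a one- or two-point spike from becoming a step of its own.

```r
library(rpart)  # recommended package shipped with standard R installations

# Toy data (made up): three steps of 20 points, plus two single-point spikes.
d <- data.frame(pos = 1:60, y = rep(c(2, 5, 1), each = 20))
d$y[c(8, 33)] <- c(6, 9)

# minbucket = 3 forbids leaves with fewer than 3 points; cp stops splits
# that improve the fit by less than 1% of the total variation.
fit <- rpart(y ~ pos, data = d,
             control = rpart.control(minsplit = 6, minbucket = 3, cp = 0.01))

d$step <- predict(fit)  # piecewise-constant fitted values (leaf means)
print(unique(d$step))
```

Each fitted level is the mean of its leaf, so a spike that stays inside a step still pulls that level slightly towards it; removing the spikes first, or using a median within each leaf, would avoid that.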