Richard Vlasimsky
2010-Nov-08 20:16 UTC
[R] A more efficient way to roll values in an irregular time series dataset?
Does anyone recommend a more efficient way to "roll" values in a time series dataset?

I merged a bunch of different time series datasets (tens of thousands of them) whose observation dates and sampling intervals differ. Some time series observations are reported at the beginning of the month, some at the end, some on Mondays, some on Wednesdays, some annually, etc.

In the process of merging all of the irregular time series (by date observed), a significant number of NAs appear in the dataset where I really want the last reported value 'rolled' forward.

To use a concrete example, a time series that has reported values at the beginning of every month shows NAs for every day except the date it was reported (in this case, the first of the month). I want the value to roll forward so that NAs after the first of the month are replaced with the last reported value.

I wrote the following for loop to accomplish the task on the object 'dataset'; however, it is far too slow to process tens of thousands of different time series with 15,000 observations each. At the rate it is going, it would take weeks to complete.

    for (j in 1:length(names(dataset)))
    {
        last <- NA
        for (i in 1:length(row.names(dataset)))
        {
            # replace an NA with the last reported value; otherwise remember the new value
            if (is.na(dataset[i, j])) dataset[i, j] <- last else last <- dataset[i, j]
        }
    }

One would think an operation as simple as this could run much faster. My sense is that using the "apply" function is the way to go; however, I just can't get my head around a function that would reference the last reported value.

Any guidance is appreciated.

-Richard
Dennis Murphy
2010-Nov-08 20:23 UTC
[R] A more efficient way to roll values in an irregular time series dataset?
Hi:

Look into the zoo package and its rollapply() function. The package is designed to handle irregular and multiple series.

HTH,
Dennis
Gabor Grothendieck
2010-Nov-08 20:24 UTC
[R] A more efficient way to roll values in an irregular time series dataset?
Don't know if it's fast enough for you, but in zoo you can merge and carry the last occurrence forward like this:

    library(zoo)

    # suppose z1, z2, z3 are zoo series
    na.locf(merge(z1, z2, z3))  # as many as you like

or

    L <- list(z1, z2, z3)
    na.locf(do.call("merge", L))

which produces a multivariate series, one series per column, with NAs filled in.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
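To make the merge-then-fill suggestion concrete, here is a minimal sketch on toy data. The two series, their dates and their values are hypothetical and only stand in for the poster's real data; na.rm = FALSE is an added choice here (not part of the reply above) so that rows before a series' first observation are kept as NA rather than dropped.

```r
library(zoo)

# Hypothetical toy data: one series reported monthly, one on other dates
z1 <- zoo(c(10, 20), as.Date(c("2010-01-01", "2010-02-01")))
z2 <- zoo(c(1, 2, 3), as.Date(c("2010-01-04", "2010-01-11", "2010-02-01")))

# merge() aligns the series on the union of their dates,
# leaving NA wherever a series has no observation on that date
merged <- merge(z1, z2)

# na.locf() carries the last observation forward within each column;
# na.rm = FALSE keeps the leading NA in z2 instead of removing it
filled <- na.locf(merged, na.rm = FALSE)

print(filled)
# column z1 becomes 10, 10, 10, 20; column z2 becomes NA, 1, 2, 3
```

For many series held in a list, the do.call("merge", L) form from the reply above could be substituted for the explicit merge(z1, z2) call; whether this is fast enough at the scale described in the question would need to be measured.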