Sam Albers
2012-Mar-19 20:10 UTC
[R] Lag based on Date objects with non-consecutive values
Hello all,

I need to figure out a way to lag a variable by a number of days without using the zoo package. I need to use a remote R connection that doesn't have the zoo package installed, and there is no willingness to install it. So that is, I want a function where I can specify the number of days to lag a variable against a Date formatted column. That is relatively easy to do. The problem arises when I don't have consecutive dates. I can't seem to figure out a way to insert an NA when there is a non-consecutive date. So for example:

## A dataframe with non-consecutive dates
set.seed(32)
df1 <- data.frame(
  Date = seq(as.Date("1967-06-05", "%Y-%m-%d"), by = "day", length.out = 5),
  Dis1 = rnorm(5, 1, 10)
)
df2 <- data.frame(
  Date = seq(as.Date("1967-06-15", "%Y-%m-%d"), by = "day", length.out = 5),
  Dis1 = rnorm(5, 1, 10)
)

df <- rbind(df1, df2); df

## A function to lag the variable by a specified number of days
## (it shifts by rows, so it ignores gaps between dates)
lag.day <- function (lag.by, data) {
  c(rep(NA, lag.by), head(data$Dis1, -lag.by))
}

## Using the function
df$lag1 <- lag.day(lag.by = 1, data = df); df
## returns this data frame

         Date      Dis1      lag1
1  1967-06-05  1.146405        NA
2  1967-06-06  9.732887  1.146405
3  1967-06-07 -9.279462  9.732887
4  1967-06-08  7.856646 -9.279462
5  1967-06-09  5.494370  7.856646
6  1967-06-15  5.070176  5.494370
7  1967-06-16  3.847314  5.070176
8  1967-06-17 -5.243094  3.847314
9  1967-06-18  9.396560 -5.243094
10 1967-06-19  4.112792  9.396560

## When really what I would like is something like this:

         Date      Dis1      lag1
1  1967-06-05  1.146405        NA
2  1967-06-06  9.732887  1.146405
3  1967-06-07 -9.279462  9.732887
4  1967-06-08  7.856646 -9.279462
5  1967-06-09  5.494370  7.856646
6  1967-06-15  5.070176        NA
7  1967-06-16  3.847314  5.070176
8  1967-06-17 -5.243094  3.847314
9  1967-06-18  9.396560 -5.243094
10 1967-06-19  4.112792  9.396560

So can anyone recommend a way (either using my function or any other approach) that I might be able to consistently lag values based on a lag.by value and consecutive dates?

Thanks so much in advance!

Sam
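One possible base-R sketch for the question above, not taken from the thread: instead of shifting by rows, look up the value recorded exactly lag.by days earlier with match(), which returns NA when that date is absent. The helper name lag_by_date, its value/date arguments, and the small example data frame d are hypothetical. The sketch assumes whole-day Date values with at most one row per date (match() returns the first hit if a date is duplicated).

```r
# Hypothetical helper (not from the thread): return the value recorded exactly
# lag.by days before each row's date, or NA when no row has that date.
lag_by_date <- function(data, lag.by, value = "Dis1", date = "Date") {
  days <- as.numeric(data[[date]])           # Dates as whole-day numbers
  data[[value]][match(days - lag.by, days)]  # NA index -> NA value
}

# Illustrative data with the same date gap as the example above,
# but simple values 1..10 instead of random draws.
d <- data.frame(
  Date = c(seq(as.Date("1967-06-05"), by = "day", length.out = 5),
           seq(as.Date("1967-06-15"), by = "day", length.out = 5)),
  Dis1 = as.numeric(1:10)
)
d$lag1 <- lag_by_date(d, lag.by = 1)
d$lag2 <- lag_by_date(d, lag.by = 2)
print(d)
# lag1: NA 1 2 3 4 NA 6 7 8 9
# lag2: NA NA 1 2 3 NA NA 6 7 8
```

Because the lookup is by date rather than by row position, the rows do not need to be sorted, and the same call works for any lag.by.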
Sam Albers
2012-Mar-20 00:03 UTC
[R] Lag based on Date objects with non-consecutive values
Hello R-ers,

I just wanted to update this post. I've made some progress on this but am still not quite where I need to be. I feel like I am close so I just wanted to share my work so far.

I've now gotten this far but have realized that my approach is flawed, because if I increase the lag.by value to anything greater than 1, an NA is no longer entered into the correct position. So here is my updated effort:

lag.by <- function (data, lag.by) {
  tmp <- data.frame(
    ## Difference in days between dates
    diff = c(diff(data$Date), NA),
    lag.tmp = c(rep(NA, lag.by), head(data$Dis1, -lag.by))
  )
  ## Diff calculates difference to next row so all the difference
  ## values need to be lagged
  ifelse(c(rep(NA, lag.by), head(tmp$diff, -lag.by)) <= 1, tmp$lag.tmp, NA)
}

df$lag <- lag.by(df, lag.by = 1)
df$lag2 <- lag.by(df, lag.by = 2); df

         Date      Dis1       lag      lag2
1  1967-06-05  1.146405        NA        NA
2  1967-06-06  9.732887  1.146405        NA
3  1967-06-07 -9.279462  9.732887  1.146405
4  1967-06-08  7.856646 -9.279462  9.732887
5  1967-06-09  5.494370  7.856646 -9.279462
6  1967-06-15  5.070176        NA  7.856646   <- Need this to be an NA
7  1967-06-16  3.847314  5.070176        NA
8  1967-06-17 -5.243094  3.847314  5.070176
9  1967-06-18  9.396560 -5.243094  3.847314
10 1967-06-19  4.112792  9.396560 -5.243094

So, I should have NAs in the lag2 column at rows 6 and 7. The check only looks at a single one-day step, shifted by lag.by rows, rather than at the whole span between a row and the row lag.by positions before it, which is why row 6 slips through. Any help or thoughts would be much appreciated here.

Thanks in advance!

Sam
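The row-shifting idea in the updated function can be kept if the gaps are first filled with explicit NA rows, so that one row always corresponds to one day. The following is a minimal sketch of that padding approach, not something proposed in the thread: the helper name lag_by_padding and the example data frame d are hypothetical, and the sketch assumes whole-day Date values with one row per date.

```r
# Hypothetical helper (not from the thread): pad the series to a complete daily
# sequence, shift by rows, then keep only the dates that were originally present.
lag_by_padding <- function(data, lag.by) {
  full <- data.frame(Date = seq(min(data$Date), max(data$Date), by = "day"))
  # Left join onto the full calendar; missing days get Dis1 = NA.
  # merge() sorts the result by Date.
  padded <- merge(full, data[, c("Date", "Dis1")], by = "Date", all.x = TRUE)
  n <- nrow(padded)
  # Row shift is now a day shift; the guards handle lag.by >= n.
  shifted <- c(rep(NA, min(lag.by, n)), head(padded$Dis1, max(n - lag.by, 0)))
  # Map back to the original rows, in their original order.
  shifted[match(as.numeric(data$Date), as.numeric(padded$Date))]
}

# Illustrative data with the same date gap as in the post,
# but simple values 1..10 instead of random draws.
d <- data.frame(
  Date = c(seq(as.Date("1967-06-05"), by = "day", length.out = 5),
           seq(as.Date("1967-06-15"), by = "day", length.out = 5)),
  Dis1 = as.numeric(1:10)
)
d$lag1 <- lag_by_padding(d, lag.by = 1)
d$lag2 <- lag_by_padding(d, lag.by = 2)
print(d)
# lag1: NA 1 2 3 4 NA 6 7 8 9
# lag2: NA NA 1 2 3 NA NA 6 7 8
```

With the padded series, rows 6 and 7 of lag2 both come out as NA, because the two days before 1967-06-15 and 1967-06-16 fall inside the gap. The cost is a temporary data frame spanning the full date range, which may matter for long, sparse series.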