Dimitri Liakhovitski
2011-Aug-04 15:24 UTC
[R] Efficient way of creating a shifted (lagged) variable?
Hello!

I have a data set:

    set.seed(123)
    y<-data.frame(week=seq(as.Date("2010-01-03"), as.Date("2011-01-31"),by="week"))
    y$var1<-c(1,2,3,round(rnorm(54),1))
    y$var2<-c(10,20,30,round(rnorm(54),1))

All I need is to create lagged variables for var1 and var2. I looked around a bit and found several ways of doing it. They all seem quite complicated - while in SPSS it's just a few letters (like LAG()). Below is what I've written. It works - but maybe there is a very simple way of doing it in R that I could not find?
I need the same for "lead" (opposite of lag).
Any hint is greatly appreciated!

    ### The function I created:
    mylag <- function(x,max.lag=1){   # x has to be a 1-column data frame
      temp<-as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)]
      for(i in 1:length(temp)){
        names(temp)[i]<-paste(names(x),".lag",i,sep="")
      }
      return(temp)
    }

    ### Running mylag to get my result:
    myvars<-c("var1","var2")
    for(i in myvars) {
      y<-cbind(y,mylag(y[i],max.lag=2))   # max.lag goes to mylag, not cbind
    }
    (y)

--
Dimitri Liakhovitski
marketfusionanalytics.com
R. Michael Weylandt
2011-Aug-04 15:45 UTC
[R] Efficient way of creating a shifted (lagged) variable?
If you start looking at the time series classes (xts, zoo, etc.) they have very quick and flexible lag functions built in.

Might this be a slightly more efficient solution for the homebrew implementation?

    OurLag <- function(y, k=1) {
        # first column is the time index, the rest are the data to shift
        t = y[,1,drop=F]; d = y[,-1,drop=F];
        if (is.matrix(y)) {rn = rownames(y); cn = colnames(y)} else {n = names(y)}
        if (k >= 1) {
            # lag: pad k rows of NA on top, then trim to the original length
            d = rbind(matrix(NA,ncol=ncol(d), nrow = k), d)
            d = d[1:nrow(t),]
        } else {
            # lead: drop the first -k rows, then pad -k rows of NA at the bottom
            d = rbind(d[(1-k):nrow(d),],matrix(NA,ncol=ncol(d), nrow = -k))
        }
        ans = data.frame(t,d)
        if (is.matrix(y)) {rownames(ans) = rn; colnames(ans) = cn} else {names(ans) = n}
        return(ans)
    }

Set k<0 for lead instead of lag.

Michael Weylandt
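For a single vector, the same idea (positive k for lag, negative k for lead, NA padding) can be sketched in a few lines of base R by shifting an index rather than binding rows. This is only an illustration, not code from the thread: the name `shift_vec` is hypothetical, and it assumes a plain vector rather than a data frame or time series object.

```r
# Hypothetical helper (not from the thread): shift a vector by k positions.
# k > 0 gives a lag, k < 0 gives a lead; positions with no source value become NA.
shift_vec <- function(x, k = 1) {
  idx <- seq_along(x) - k                   # position each element is pulled from
  idx[idx < 1 | idx > length(x)] <- NA      # out-of-range positions have no source
  x[idx]                                    # indexing with NA yields NA
}

lagged <- shift_vec(1:5, 1)    # NA 1 2 3 4
led    <- shift_vec(1:5, -2)   # 3 4 5 NA NA
print(lagged)
print(led)
```

Because the shift is done by indexing, the result keeps the class of the input (for example Date), and a shift larger than the vector length simply gives all NA.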
Dimitri Liakhovitski
2011-Aug-04 18:46 UTC
[R] Efficient way of creating a shifted (lagged) variable?
Thanks a lot, guys! It's really helpful. But - to be objective - it's still quite a few lines longer than in SPSS.
Dimitri

On Thu, Aug 4, 2011 at 2:36 PM, Daniel Nordlund <djnordlund at frontier.com> wrote:
> Dimitri,
>
> I would first look into the zoo package as has already been suggested. However, if you haven't already got your solution then here are a couple of functions that might help you get started. I won't vouch for efficiency.
>
> lag.fun <- function(df, x, max.lag=1) {
>   for(i in x) {
>     for(j in 1:max.lag){
>       lagx <- paste(i,'.lag',j,sep='')
>       df[,lagx] <- c(rep(NA,j),df[1:(nrow(df)-j),i])
>     }
>   }
>   df
> }
>
> lead.fun <- function(df, x, max.lead=1) {
>   for(i in x) {
>     for(j in 1:max.lead){
>       leadx <- paste(i,'.lead',j,sep='')
>       df[,leadx] <- c(df[(j+1):(nrow(df)),i],rep(NA,j))
>     }
>   }
>   df
> }
>
> y <- lag.fun(y,myvars,2)
> y <- lead.fun(y,myvars,2)
>
> Hope this is helpful,
>
> Dan
>
> Daniel Nordlund
> Bothell, WA USA

--
Dimitri Liakhovitski
marketfusionanalytics.com
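To make the column naming and NA padding of the quoted approach concrete, here is a small worked example on a five-row data frame. It is a sketch only: `add_shifted`, `toy`, and `max.shift` are hypothetical names, the lag and lead columns are built in one loop with head() and tail(), and it assumes max.shift is no larger than the number of rows.

```r
# Hypothetical helper (not from the thread): add <var>.lagJ and <var>.leadJ
# columns for each variable in vars, padding with NA.
# Assumes max.shift <= nrow(df).
add_shifted <- function(df, vars, max.shift = 1) {
  n <- nrow(df)
  for (v in vars) {
    x <- df[[v]]
    for (j in seq_len(max.shift)) {
      df[[paste0(v, ".lag", j)]]  <- c(rep(NA, j), head(x, n - j))  # NAs first
      df[[paste0(v, ".lead", j)]] <- c(tail(x, n - j), rep(NA, j))  # NAs last
    }
  }
  df
}

toy <- data.frame(week = 1:5, var1 = c(10, 20, 30, 40, 50))
out <- add_shifted(toy, "var1", max.shift = 2)
print(out)
```

Here var1.lag1 comes out as NA, 10, 20, 30, 40 and var1.lead2 as 30, 40, 50, NA, NA, so each shifted column lines up row by row with the original week column.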