Dimitri Liakhovitski
2011-Aug-04 15:24 UTC
[R] Efficient way of creating a shifted (lagged) variable?
Hello!
I have a data set:
set.seed(123)
y<-data.frame(week=seq(as.Date("2010-01-03"),
as.Date("2011-01-31"),by="week"))
y$var1<-c(1,2,3,round(rnorm(54),1))
y$var2<-c(10,20,30,round(rnorm(54),1))
# All I need is to create lagged variables for var1 and var2. I looked
around a bit and found several ways of doing it. They all seem quite
complicated - while in SPSS it's just a few letters (like LAG()). Here
is what I've written but I wonder. It works - but maybe there is a
very simple way of doing it in R that I could not find?
I need the same for "lead" (opposite of lag).
Any hint is greatly appreciated!
### The function I created:
mylag <- function(x,max.lag=1){ # x has to be a 1-column data frame
temp<-as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)]
for(i in 1:length(temp)){
names(temp)[i]<-paste(names(x),".lag",i,sep="")
}
return(temp)
}
### Running mylag to get my result:
myvars<-c("var1","var2")
for(i in myvars) {
y<-cbind(y,mylag(y[i],max.lag=2))
}
(y)
--
Dimitri Liakhovitski
marketfusionanalytics.com
R. Michael Weylandt
2011-Aug-04 15:45 UTC
[R] Efficient way of creating a shifted (lagged) variable?
If you start looking at the time series classes (xts, zoo, etc) they have
very quick and flexible lag functions built in.
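As a rough illustration of the time-series route, here is a minimal sketch that assumes the zoo package is installed (the block does nothing if it is not). The series z and the names z_lag1 and z_lead1 are made up for the example. zoo's lag method follows the stats::lag sign convention, so a negative k looks back and a positive k looks ahead.

```r
# Sketch only: assumes the zoo package is available; skipped otherwise.
if (requireNamespace("zoo", quietly = TRUE)) {
  weeks <- seq(as.Date("2010-01-03"), by = "week", length.out = 5)
  z <- zoo::zoo(c(1, 2, 3, 4, 5), order.by = weeks)

  # k = -1: value from the previous week (a lag); na.pad keeps the full index
  z_lag1 <- stats::lag(z, k = -1, na.pad = TRUE)
  # k = 1: value from the following week (a lead)
  z_lead1 <- stats::lag(z, k = 1, na.pad = TRUE)

  # merge() aligns the series on their dates
  print(merge(z, z_lag1, z_lead1))
}
```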
Might this be a slightly more efficient solution for the homebrew
implementation?
OurLag <- function(y, k=1) {
t = y[,1,drop=F]; d = y[,-1,drop=F];
if (is.matrix(y)) {rn = rownames(y); cn = colnames(y)} else {n = names(y)}
if (k >= 1) {
d = rbind(matrix(NA,ncol=ncol(d), nrow = k), d)
d = d[1:nrow(t),]
} else {d = rbind(d[1:(nrow(d)+k),],matrix(NA,ncol=ncol(d), nrow = -k))}
ans = data.frame(t,d)
if (is.matrix(y)) {rownames(ans) = rn; colnames(ans) = cn} else
{names(ans) = n}
return(ans)
}
Set k<0 for lead instead of lag.
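The same sign convention (k > 0 for lag, k < 0 for lead) can be sketched for a plain vector in base R. shift_vec is a hypothetical helper written for this illustration, not something from the thread or from any package.

```r
# Hypothetical helper: shift a vector by k positions, padding with NA.
# k > 0 gives a lag (values move down), k < 0 gives a lead (values move up).
shift_vec <- function(x, k = 1) {
  n <- length(x)
  stopifnot(abs(k) <= n)
  if (k > 0) {
    c(rep(NA, k), head(x, n - k))
  } else if (k < 0) {
    c(tail(x, n + k), rep(NA, -k))
  } else {
    x
  }
}

x <- c(10, 20, 30, 40)
shift_vec(x, 1)    # NA 10 20 30
shift_vec(x, -1)   # 20 30 40 NA
```

Because it works on a single vector, it can be applied column by column, e.g. y$var1.lag1 <- shift_vec(y$var1, 1).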
Michael Weylandt
Dimitri Liakhovitski
2011-Aug-04 18:46 UTC
[R] Efficient way of creating a shifted (lagged) variable?
Thanks a lot, guys! It's really helpful.
But - to be objective - it's still quite a few lines longer than in SPSS.
Dimitri
On Thu, Aug 4, 2011 at 2:36 PM, Daniel Nordlund <djnordlund at frontier.com> wrote:
> Dimitri,
>
> I would first look into the zoo package as has already been suggested.
> However, if you haven't already got your solution then here are a couple
> of functions that might help you get started. I won't vouch for efficiency.
>
> lag.fun <- function(df, x, max.lag=1) {
>   for(i in x) {
>     for(j in 1:max.lag){
>       lagx <- paste(i,'.lag',j,sep='')
>       df[,lagx] <- c(rep(NA,j),df[1:(nrow(df)-j),i])
>     }
>   }
>   df
> }
>
> lead.fun <- function(df, x, max.lead=1) {
>   for(i in x) {
>     for(j in 1:max.lead){
>       leadx <- paste(i,'.lead',j,sep='')
>       df[,leadx] <- c(df[(j+1):(nrow(df)),i],rep(NA,j))
>     }
>   }
>   df
> }
>
> y <- lag.fun(y,myvars,2)
> y <- lead.fun(y,myvars,2)
>
> Hope this is helpful,
>
> Dan
>
> Daniel Nordlund
> Bothell, WA USA
--
Dimitri Liakhovitski
marketfusionanalytics.com
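For comparison with lag.fun and lead.fun, here is a sketch of a single base-R function that covers both directions. add_shifted and its ks argument are hypothetical names chosen for this illustration; the ".lag"/".lead" column naming simply mirrors the functions quoted above.

```r
# Hypothetical helper: append shifted copies of the columns named in vars.
# Each k in ks is one shift: k > 0 is a lag, k < 0 is a lead.
add_shifted <- function(df, vars, ks = 1) {
  n <- nrow(df)
  for (v in vars) {
    x <- df[[v]]
    for (k in ks) {
      if (k > 0) {
        shifted <- c(rep(NA, k), head(x, n - k))
      } else if (k < 0) {
        shifted <- c(tail(x, n + k), rep(NA, -k))
      } else {
        shifted <- x
      }
      suffix <- if (k >= 0) paste0(".lag", k) else paste0(".lead", -k)
      df[[paste0(v, suffix)]] <- shifted
    }
  }
  df
}

# Small made-up data frame, just to show the call
d <- data.frame(t = 1:4, a = c(10, 20, 30, 40))
out <- add_shifted(d, "a", ks = c(1, -1))
print(out)
```

On the data from the original post, the equivalent call would be y <- add_shifted(y, c("var1","var2"), ks = c(1, 2, -1, -2)).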