I wish to simulate the following stochastic process, for i = 1...N individuals and t=1...T periods: y_{i,t} = y_0 + lambda Ey_{t-1} + epsilon_{i,t} where Ey_{t-1} is the average of y over the N individuals computed at time t-1. My solution (below) works but is incredibly slow. Is there a faster but still clear and readable alternative? Thanks a lot. Matteo rm(list=ls()) library(plyr) y0 = 0 lambda = 0.1 N = 20 T = 100 m_e = 0 sd_e = 1 # construct the data frame and initialize y D = data.frame( id = rep(1:N,T), t = rep(1:T, each = N), y = rep(y0,N*T) ) # update y for(t in 2:T){ ybar.L1 = mean(D[D$t==t-1,"y"]) for(i in 1:N){ epsilon = rnorm(1,mean=m_e,sd=sd_e) D[D$id==i & D$t==t,]$y = lambda*y0+(1-lambda)*ybar.L1+epsilon } } ybar <- ddply(D,~t,summarise,mean=mean(y)) plot(ybar, col = "blue", type = "l") [[alternative HTML version deleted]]
Matteo, I tried your example code using R 3.1.1 on an iMac (24-inch, Early 2009), 3.06 GHz Intel Core 2 Duo, 8 GB 1333 MHz DDR3, NVIDIA GeForce GT 130 512 MB running Mac OS X 10.10 (Yosemite). After entering your code, the elapsed time from the time I hit return to when the graphics appeared was about 2 seconds ? is this about what you are seeing? Regards, Tom On Thu, Nov 6, 2014 at 7:47 AM, Matteo Richiardi <matteo.richiardi at gmail.com> wrote:> I wish to simulate the following stochastic process, for i = 1...N > individuals and t=1...T periods: > > y_{i,t} = y_0 + lambda Ey_{t-1} + epsilon_{i,t} > > where Ey_{t-1} is the average of y over the N individuals computed at time > t-1. > > My solution (below) works but is incredibly slow. Is there a faster but > still clear and readable alternative? > > Thanks a lot. Matteo > > rm(list=ls()) > library(plyr) > y0 = 0 > lambda = 0.1 > N = 20 > T = 100 > m_e = 0 > sd_e = 1 > > # construct the data frame and initialize y > D = data.frame( > id = rep(1:N,T), > t = rep(1:T, each = N), > y = rep(y0,N*T) > ) > > # update y > for(t in 2:T){ > ybar.L1 = mean(D[D$t==t-1,"y"]) > for(i in 1:N){ > epsilon = rnorm(1,mean=m_e,sd=sd_e) > D[D$id==i & D$t==t,]$y = lambda*y0+(1-lambda)*ybar.L1+epsilon > } > } > > ybar <- ddply(D,~t,summarise,mean=mean(y)) > > plot(ybar, col = "blue", type = "l") > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Matteo, Ah ? OK, N=20, I did not catch that. You have nested for loops, which R is known to be exceedingly slow at handling ? if you can reorganize the code to eliminate the loops, your performance will increase significantly. Tom On Thu, Nov 6, 2014 at 7:47 AM, Matteo Richiardi <matteo.richiardi at gmail.com> wrote:> I wish to simulate the following stochastic process, for i = 1...N > individuals and t=1...T periods: > > y_{i,t} = y_0 + lambda Ey_{t-1} + epsilon_{i,t} > > where Ey_{t-1} is the average of y over the N individuals computed at time > t-1. > > My solution (below) works but is incredibly slow. Is there a faster but > still clear and readable alternative? > > Thanks a lot. Matteo > > rm(list=ls()) > library(plyr) > y0 = 0 > lambda = 0.1 > N = 20 > T = 100 > m_e = 0 > sd_e = 1 > > # construct the data frame and initialize y > D = data.frame( > id = rep(1:N,T), > t = rep(1:T, each = N), > y = rep(y0,N*T) > ) > > # update y > for(t in 2:T){ > ybar.L1 = mean(D[D$t==t-1,"y"]) > for(i in 1:N){ > epsilon = rnorm(1,mean=m_e,sd=sd_e) > D[D$id==i & D$t==t,]$y = lambda*y0+(1-lambda)*ybar.L1+epsilon > } > } > > ybar <- ddply(D,~t,summarise,mean=mean(y)) > > plot(ybar, col = "blue", type = "l") > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
I find that representing the simulated data as a T row by N column matrix allows for a clearer and faster simulation function. E.g., compare the output of the following two functions, the first of which uses your code and the second a matrix representation (which I convert to a data.frame at the end so I can compare outputs easily). I timed both of them for T=10^3 times and N=50 individuals; both gave the same results and f1 was 10000 times faster than f0: > set.seed(1); t0 <- system.time(s0 <- f0(N=50,T=1000)) > set.seed(1); t1 <- system.time(s1 <- f1(N=50,T=1000)) > rbind(t0, t1) user.self sys.self elapsed user.child sys.child t0 436.87 0.11 438.48 NA NA t1 0.04 0.00 0.04 NA NA > all.equal(s0, s1) [1] TRUE The functions are: f0 <- function(N = 20, T = 100, lambda = 0.1, m_e = 0, sd_e = 1, y0 = 0) { # construct the data frame and initialize y D <- data.frame( id = rep(1:N,T), t = rep(1:T, each = N), y = rep(y0,N*T) ) # update y for(t in 2:T){ ybar.L1 = mean(D[D$t==t-1,"y"]) for(i in 1:N){ epsilon = rnorm(1,mean=m_e,sd=sd_e) D[D$id==i & D$t==t,]$y = lambda*y0+(1-lambda)*ybar.L1+epsilon } } D } f1 <- function(N = 20, T = 100, lambda = 0.1, m_e = 0, sd_e = 1, y0 = 0) { # same process simulated using a matrix representation # The T rows are times, the N columns are individuals M <- matrix(y0, nrow=T, ncol=N) if (T > 1) for(t in 2:T) { ybar.L1 <- mean(M[t-1L,]) epsilon <- rnorm(N, mean=m_e, sd=sd_e) M[t,] <- lambda * y0 + (1-lambda)*ybar.L1 + epsilon } # convert to the data.frame representation that f0 uses tM <- t(M) data.frame(id = as.vector(row(tM)), t = as.vector(col(tM)), y as.vector(tM)) } Bill Dunlap TIBCO Software wdunlap tibco.com On Thu, Nov 6, 2014 at 6:47 AM, Matteo Richiardi <matteo.richiardi at gmail.com> wrote:> I wish to simulate the following stochastic process, for i = 1...N > individuals and t=1...T periods: > > y_{i,t} = y_0 + lambda Ey_{t-1} + epsilon_{i,t} > > where Ey_{t-1} is the average of y over the N individuals computed at time > t-1. > > My solution (below) works but is incredibly slow. Is there a faster but > still clear and readable alternative? > > Thanks a lot. Matteo > > rm(list=ls()) > library(plyr) > y0 = 0 > lambda = 0.1 > N = 20 > T = 100 > m_e = 0 > sd_e = 1 > > # construct the data frame and initialize y > D = data.frame( > id = rep(1:N,T), > t = rep(1:T, each = N), > y = rep(y0,N*T) > ) > > # update y > for(t in 2:T){ > ybar.L1 = mean(D[D$t==t-1,"y"]) > for(i in 1:N){ > epsilon = rnorm(1,mean=m_e,sd=sd_e) > D[D$id==i & D$t==t,]$y = lambda*y0+(1-lambda)*ybar.L1+epsilon > } > } > > ybar <- ddply(D,~t,summarise,mean=mean(y)) > > plot(ybar, col = "blue", type = "l") > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]