Paul Menzel
2011-Jul-27 23:36 UTC
[R] Is R the right choice for simulating first passage times of random walks?
Dear R folks, I need to simulate first passage times for iterated partial sums. The related papers are for example [1][2]. As a start I want to simulate how long a simple random walk stays negative, which should result that it behaves like n^(-?). My code looks like this. -------- 8< -------- code -------- >8 -------- n = 100000 # number of simulations length = 100000 # length of iterated sum z = rep(0, times=length + 1) for (i in 1:n) { x = c(0, sign(rnorm(length))) s = cumsum(x) for (i in 1:length) { if (s[i] < 0 && s[i + 1] >= 0) { z[i] = z[i] + 1 } } } plot(1:length(z), z/n) curve(x**(-0.5), add = TRUE) -------- 8< -------- code -------- >8 -------- This code already runs for over half an hour on my system?. Reading about the for loop [3] it says to try to avoid loops and I probably should use a matrix where every row is a sample. Now my first problem is that there is no matrix equivalent for `cumsum()`. Can I use matrices to avoid the for loop? My second question is, is R the right choice for such simulations? It would be great when R can also give me a confidence interval(?) and also try to fit a curve through the result and give me the rule of correspondence(?) [4]. Do you have pointers for those? I glanced at simFrame [5] and read `?simulate` but could not understand it right away and think that this might be overkill. Do you have any suggestions? Thanks, Paul ? AMD Athlon(tm) X2 Dual Core Processor BE-2350, 2,1 GHz [1] http://www-stat.stanford.edu/~amir/preprints/irw.ps [2] http://arxiv.org/abs/0911.5456 [3] http://cran.r-project.org/doc/manuals/R-intro.html#Repetitive-execution [4] https://secure.wikimedia.org/wikipedia/en/wiki/Function_(mathematics) [5] http://finzi.psych.upenn.edu/R/library/simFrame/html/runSimulation.html -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110728/693bcad9/attachment.bin>
R. Michael Weylandt <michael.weylandt@gmail.com>
2011-Jul-27 23:59 UTC
[R] Is R the right choice for simulating first passage times of random walks?
Some more skilled folks can help with the curve fitting, but the general answer is yes -- R will handle this quite ably. Consider the following code: <<-------------------------------------->> n = 1e5 length = 1e5 R = matrix(sample(c(-1,1),length*n,replace=T),nrow=n) R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make your life INFINITELY better R = cbind(rep(0,n),R) ## Now each row is a random walk as you desired. <<---------------------------------------->> There are actually even faster ways to do what you are asking for, but this introduces you to some useful R architecture, above all the apply function. To see how long the longest stretch of negatives in each row is, the following is a little sneaky but works pretty well: countNegative = apply(R,1,function(x){which.max(table(cumsum(x>=0))}) then you can study these random numbers to do whatever with them. The gist of this code is that it counts how many positive number have been seen up to each point: for any stretch this doesn't increase, you must be negative, so this identifies the longest such stretch on each row and records the length. (It may be off by one so check it on a smaller R matrix. An empirical confidence interval might be given by this mu = mean(countNegative) sig = sd(countNegative)/sqrt(length(countNegative)) So all together: <<-------------------------------------->> n = 1e3 length = 1e3 R = matrix(sample(c(-1,1),length*n,replace=T),nrow=n) R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make your life INFINITELY better R = cbind(rep(0,n),R) ## Now each row is a random walk as you desired. fTemp <- function(x) { return(max(table(cumsum(x>=0)))) } countNegative = apply(R,1,fTemp) mu = mean(countNegative) sig = sd(countNegative)/sqrt(length(countNegative)) <<---------------------------------------->> This runs pretty fast on my laptop, but you'll need to look into the memory.limit() function if you want to increase the simulation parameters. There are much faster ways to handle the simulation as well, but this should get you off to a nice start with R. Hope this helps, Michael On Wed, Jul 27, 2011 at 7:36 PM, Paul Menzel < paulepanter@users.sourceforge.net> wrote:> Dear R folks, > > > I need to simulate first passage times for iterated partial sums. The > related papers are for example [1][2]. > > As a start I want to simulate how long a simple random walk stays > negative, which should result that it behaves like n^(-½). My code looks > like this. > > -------- 8< -------- code -------- >8 -------- > n = 100000 # number of simulations > > length = 100000 # length of iterated sum > z = rep(0, times=length + 1) > > for (i in 1:n) { > x = c(0, sign(rnorm(length))) > s = cumsum(x) > for (i in 1:length) { > if (s[i] < 0 && s[i + 1] >= 0) { > z[i] = z[i] + 1 > } > } > } > plot(1:length(z), z/n) > curve(x**(-0.5), add = TRUE) > -------- 8< -------- code -------- >8 -------- > > This code already runs for over half an hour on my system¹. > > Reading about the for loop [3] it says to try to avoid loops and I > probably should use a matrix where every row is a sample. > > Now my first problem is that there is no matrix equivalent for > `cumsum()`. Can I use matrices to avoid the for loop? > > My second question is, is R the right choice for such simulations? It > would be great when R can also give me a confidence interval(?) and also > try to fit a curve through the result and give me the rule of > correspondence(?) [4]. Do you have pointers for those? > > I glanced at simFrame [5] and read `?simulate` but could not understand > it right away and think that this might be overkill. > > Do you have any suggestions? > > > Thanks, > > Paul > > > ¹ AMD Athlon(tm) X2 Dual Core Processor BE-2350, 2,1 GHz > > > [1] http://www-stat.stanford.edu/~amir/preprints/irw.ps > [2] http://arxiv.org/abs/0911.5456 > [3] > http://cran.r-project.org/doc/manuals/R-intro.html#Repetitive-execution > [4] https://secure.wikimedia.org/wikipedia/en/wiki/Function_(mathematics) > [5] > http://finzi.psych.upenn.edu/R/library/simFrame/html/runSimulation.html > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >[[alternative HTML version deleted]]
Paul Menzel
2011-Jul-28 00:00 UTC
[R] Is R the right choice for simulating first passage times of random walks?
Dear R folks, Am Donnerstag, den 28.07.2011, 01:36 +0200 schrieb Paul Menzel:> I need to simulate first passage times for iterated partial sums. The > related papers are for example [1][2]. > > As a start I want to simulate how long a simple random walk stays > negative, which should result that it behaves like n^(-?). My code looks > like this. > > -------- 8< -------- code -------- >8 -------- > n = 100000 # number of simulations > > length = 100000 # length of iterated sum > z = rep(0, times=length + 1) > > for (i in 1:n) { > x = c(0, sign(rnorm(length))) > s = cumsum(x) > for (i in 1:length) { > if (s[i] < 0 && s[i + 1] >= 0) { > z[i] = z[i] + 1 > } > } > } > plot(1:length(z), z/n) > curve(x**(-0.5), add = TRUE) > -------- 8< -------- code -------- >8 --------Of course the program above is not complete, because it only checks for the first passage from negativ to positive. `if (s[2] < 0) {}` should be added before the for loop.> This code already runs for over half an hour on my system?. > > Reading about the for loop [3] it says to try to avoid loops and I > probably should use a matrix where every row is a sample. > > Now my first problem is that there is no matrix equivalent for > `cumsum()`. Can I use matrices to avoid the for loop?I mean the inner for loop. Additionally I wonder if `cumsum` is really faster or if I should sum the elements by myself and check after every step if the walk gets non-negative/0. With a length of 1000000 this should save some cycles. On the other hand adding numbers should be really fast and adding checks in between could potentially be slower.> My second question is, is R the right choice for such simulations? It > would be great when R can also give me a confidence interval(?) and also > try to fit a curve through the result and give me the rule of > correspondence(?) [4]. Do you have pointers for those? > > I glanced at simFrame [5] and read `?simulate` but could not understand > it right away and think that this might be overkill. > > Do you have any suggestions?Thanks, Paul> ? AMD Athlon(tm) X2 Dual Core Processor BE-2350, 2,1 GHz > > > [1] http://www-stat.stanford.edu/~amir/preprints/irw.ps > [2] http://arxiv.org/abs/0911.5456 > [3] http://cran.r-project.org/doc/manuals/R-intro.html#Repetitive-execution > [4] https://secure.wikimedia.org/wikipedia/en/wiki/Function_(mathematics) > [5] http://finzi.psych.upenn.edu/R/library/simFrame/html/runSimulation.html-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110728/d8a1313d/attachment.bin>
William Dunlap
2011-Jul-28 01:56 UTC
[R] Is R the right choice for simulating first passage times of random walks?
You can replace the inner loop for (i in 1:length) { if (s[i] < 0 && s[i + 1] >= 0) { z[i] = z[i] + 1 } } with the faster z <- z + (diff(s>=0)==1) (Using the latter forces you fix up the length of z to be one less than you declared -- your loop never touched the last entry in it.) My code relies on the factor that the logicals FALSE and TRUE are converted to integer 0 and 1, respectively, when you do arithmetic on them. E.g., here is your version written as a function, f1, and 2 equivalent ones, f2 and f3, without the inner loop: f1 <- function(n = 100000, # number of simulations length = 100000) # length of iterated sum { z = rep(0, times=length) # orig had length+1 but last entry was never used for (i in 1:n) { x = c(0, sign(rnorm(length))) s = cumsum(x) for (i in 1:length) { if (s[i] < 0 && s[i + 1] >= 0) { z[i] = z[i] + 1 } } } z } f2 <- function(n = 100000, # number of simulations length = 100000) # length of iterated sum { z = rep(0, times=length) l1 <- length+1 for (i in 1:n) { x = c(0, sign(rnorm(length))) s = cumsum(x) z <- z + ( s[-l1]<0 & s[-1]>=0 ) } z } f3 <- function(n = 100000, # number of simulations length = 100000) # length of iterated sum { z = rep(0, times=length) for (i in 1:n) { x = c(0, sign(rnorm(length))) s = cumsum(x) z <- z + (diff(s>=0)==1) } z } and here are some timing and correctness tests: > set.seed(1) ; system.time( z1 <- f1(1000, 1e5) ) user system elapsed 278.10 0.25 271.44 > set.seed(1) ; system.time( z2 <- f2(1000, 1e5) ) user system elapsed 37.29 0.84 38.02 > set.seed(1) ; system.time( z3 <- f3(1000, 1e5) ) user system elapsed 40.11 0.80 40.17 > identical(z1, z2) && identical(z1, z3) [1] TRUE You might try using sample(c(-1,1), size=length, replace=TRUE) instead of sign(rnorm(length)). I don't know if there would be any speed difference, but it frees you from the worry that rnorm() might return an exact 0.0. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com> -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Paul Menzel > Sent: Wednesday, July 27, 2011 4:36 PM > To: r-help at r-project.org > Subject: [R] Is R the right choice for simulating first passage times of random walks? > > Dear R folks, > > > I need to simulate first passage times for iterated partial sums. The > related papers are for example [1][2]. > > As a start I want to simulate how long a simple random walk stays > negative, which should result that it behaves like n^(-?). My code looks > like this. > > -------- 8< -------- code -------- >8 -------- > n = 100000 # number of simulations > > length = 100000 # length of iterated sum > z = rep(0, times=length + 1) > > for (i in 1:n) { > x = c(0, sign(rnorm(length))) > s = cumsum(x) > for (i in 1:length) { > if (s[i] < 0 && s[i + 1] >= 0) { > z[i] = z[i] + 1 > } > } > } > plot(1:length(z), z/n) > curve(x**(-0.5), add = TRUE) > -------- 8< -------- code -------- >8 -------- > > This code already runs for over half an hour on my system?. > > Reading about the for loop [3] it says to try to avoid loops and I > probably should use a matrix where every row is a sample. > > Now my first problem is that there is no matrix equivalent for > `cumsum()`. Can I use matrices to avoid the for loop? > > My second question is, is R the right choice for such simulations? It > would be great when R can also give me a confidence interval(?) and also > try to fit a curve through the result and give me the rule of > correspondence(?) [4]. Do you have pointers for those? > > I glanced at simFrame [5] and read `?simulate` but could not understand > it right away and think that this might be overkill. > > Do you have any suggestions? > > > Thanks, > > Paul > > > ? AMD Athlon(tm) X2 Dual Core Processor BE-2350, 2,1 GHz > > > [1] http://www-stat.stanford.edu/~amir/preprints/irw.ps > [2] http://arxiv.org/abs/0911.5456 > [3] http://cran.r-project.org/doc/manuals/R-intro.html#Repetitive-execution > [4] https://secure.wikimedia.org/wikipedia/en/wiki/Function_(mathematics) > [5] http://finzi.psych.upenn.edu/R/library/simFrame/html/runSimulation.html