Paul Menzel
2011-Jul-27 23:36 UTC
[R] Is R the right choice for simulating first passage times of random walks?
Dear R folks,
I need to simulate first passage times for iterated partial sums. The
related papers are for example [1][2].
As a start I want to simulate how long a simple random walk stays
negative, which should result that it behaves like n^(-?). My code looks
like this.
-------- 8< -------- code -------- >8 --------
n = 100000 # number of simulations
length = 100000 # length of iterated sum
z = rep(0, times=length + 1)
for (i in 1:n) {
x = c(0, sign(rnorm(length)))
s = cumsum(x)
for (i in 1:length) {
if (s[i] < 0 && s[i + 1] >= 0) {
z[i] = z[i] + 1
}
}
}
plot(1:length(z), z/n)
curve(x**(-0.5), add = TRUE)
-------- 8< -------- code -------- >8 --------
This code already runs for over half an hour on my system?.
Reading about the for loop [3] it says to try to avoid loops and I
probably should use a matrix where every row is a sample.
Now my first problem is that there is no matrix equivalent for
`cumsum()`. Can I use matrices to avoid the for loop?
My second question is, is R the right choice for such simulations? It
would be great when R can also give me a confidence interval(?) and also
try to fit a curve through the result and give me the rule of
correspondence(?) [4]. Do you have pointers for those?
I glanced at simFrame [5] and read `?simulate` but could not understand
it right away and think that this might be overkill.
Do you have any suggestions?
Thanks,
Paul
? AMD Athlon(tm) X2 Dual Core Processor BE-2350, 2,1 GHz
[1] http://www-stat.stanford.edu/~amir/preprints/irw.ps
[2] http://arxiv.org/abs/0911.5456
[3] http://cran.r-project.org/doc/manuals/R-intro.html#Repetitive-execution
[4] https://secure.wikimedia.org/wikipedia/en/wiki/Function_(mathematics)
[5] http://finzi.psych.upenn.edu/R/library/simFrame/html/runSimulation.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20110728/693bcad9/attachment.bin>
R. Michael Weylandt <michael.weylandt@gmail.com>
2011-Jul-27 23:59 UTC
[R] Is R the right choice for simulating first passage times of random walks?
Some more skilled folks can help with the curve fitting, but the general
answer is yes -- R will handle this quite ably.
Consider the following code:
<<-------------------------------------->>
n = 1e5
length = 1e5
R = matrix(sample(c(-1,1),length*n,replace=T),nrow=n)
R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make
your life INFINITELY better
R = cbind(rep(0,n),R) ## Now each row is a random walk as you desired.
<<---------------------------------------->>
There are actually even faster ways to do what you are asking for, but this
introduces you to some useful R architecture, above all the apply function.
To see how long the longest stretch of negatives in each row is, the
following is a little sneaky but works pretty well:
countNegative = apply(R,1,function(x){which.max(table(cumsum(x>=0))})
then you can study these random numbers to do whatever with them.
The gist of this code is that it counts how many positive number have been
seen up to each point: for any stretch this doesn't increase, you must be
negative, so this identifies the longest such stretch on each row and
records the length. (It may be off by one so check it on a smaller R matrix.
An empirical confidence interval might be given by this
mu = mean(countNegative)
sig = sd(countNegative)/sqrt(length(countNegative))
So all together:
<<-------------------------------------->>
n = 1e3
length = 1e3
R = matrix(sample(c(-1,1),length*n,replace=T),nrow=n)
R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make
your life INFINITELY better
R = cbind(rep(0,n),R) ## Now each row is a random walk as you desired.
fTemp <- function(x) {
return(max(table(cumsum(x>=0))))
}
countNegative = apply(R,1,fTemp)
mu = mean(countNegative)
sig = sd(countNegative)/sqrt(length(countNegative))
<<---------------------------------------->>
This runs pretty fast on my laptop, but you'll need to look into the
memory.limit() function if you want to increase the simulation parameters.
There are much faster ways to handle the simulation as well, but this should
get you off to a nice start with R.
Hope this helps,
Michael
On Wed, Jul 27, 2011 at 7:36 PM, Paul Menzel <
paulepanter@users.sourceforge.net> wrote:
> Dear R folks,
>
>
> I need to simulate first passage times for iterated partial sums. The
> related papers are for example [1][2].
>
> As a start I want to simulate how long a simple random walk stays
> negative, which should result that it behaves like n^(-½). My code looks
> like this.
>
> -------- 8< -------- code -------- >8 --------
> n = 100000 # number of simulations
>
> length = 100000 # length of iterated sum
> z = rep(0, times=length + 1)
>
> for (i in 1:n) {
> x = c(0, sign(rnorm(length)))
> s = cumsum(x)
> for (i in 1:length) {
> if (s[i] < 0 && s[i + 1] >= 0) {
> z[i] = z[i] + 1
> }
> }
> }
> plot(1:length(z), z/n)
> curve(x**(-0.5), add = TRUE)
> -------- 8< -------- code -------- >8 --------
>
> This code already runs for over half an hour on my system¹.
>
> Reading about the for loop [3] it says to try to avoid loops and I
> probably should use a matrix where every row is a sample.
>
> Now my first problem is that there is no matrix equivalent for
> `cumsum()`. Can I use matrices to avoid the for loop?
>
> My second question is, is R the right choice for such simulations? It
> would be great when R can also give me a confidence interval(?) and also
> try to fit a curve through the result and give me the rule of
> correspondence(?) [4]. Do you have pointers for those?
>
> I glanced at simFrame [5] and read `?simulate` but could not understand
> it right away and think that this might be overkill.
>
> Do you have any suggestions?
>
>
> Thanks,
>
> Paul
>
>
> ¹ AMD Athlon(tm) X2 Dual Core Processor BE-2350, 2,1 GHz
>
>
> [1] http://www-stat.stanford.edu/~amir/preprints/irw.ps
> [2] http://arxiv.org/abs/0911.5456
> [3]
> http://cran.r-project.org/doc/manuals/R-intro.html#Repetitive-execution
> [4] https://secure.wikimedia.org/wikipedia/en/wiki/Function_(mathematics)
> [5]
> http://finzi.psych.upenn.edu/R/library/simFrame/html/runSimulation.html
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
[[alternative HTML version deleted]]
Paul Menzel
2011-Jul-28 00:00 UTC
[R] Is R the right choice for simulating first passage times of random walks?
Dear R folks, Am Donnerstag, den 28.07.2011, 01:36 +0200 schrieb Paul Menzel:> I need to simulate first passage times for iterated partial sums. The > related papers are for example [1][2]. > > As a start I want to simulate how long a simple random walk stays > negative, which should result that it behaves like n^(-?). My code looks > like this. > > -------- 8< -------- code -------- >8 -------- > n = 100000 # number of simulations > > length = 100000 # length of iterated sum > z = rep(0, times=length + 1) > > for (i in 1:n) { > x = c(0, sign(rnorm(length))) > s = cumsum(x) > for (i in 1:length) { > if (s[i] < 0 && s[i + 1] >= 0) { > z[i] = z[i] + 1 > } > } > } > plot(1:length(z), z/n) > curve(x**(-0.5), add = TRUE) > -------- 8< -------- code -------- >8 --------Of course the program above is not complete, because it only checks for the first passage from negativ to positive. `if (s[2] < 0) {}` should be added before the for loop.> This code already runs for over half an hour on my system?. > > Reading about the for loop [3] it says to try to avoid loops and I > probably should use a matrix where every row is a sample. > > Now my first problem is that there is no matrix equivalent for > `cumsum()`. Can I use matrices to avoid the for loop?I mean the inner for loop. Additionally I wonder if `cumsum` is really faster or if I should sum the elements by myself and check after every step if the walk gets non-negative/0. With a length of 1000000 this should save some cycles. On the other hand adding numbers should be really fast and adding checks in between could potentially be slower.> My second question is, is R the right choice for such simulations? It > would be great when R can also give me a confidence interval(?) and also > try to fit a curve through the result and give me the rule of > correspondence(?) [4]. Do you have pointers for those? > > I glanced at simFrame [5] and read `?simulate` but could not understand > it right away and think that this might be overkill. > > Do you have any suggestions?Thanks, Paul> ? AMD Athlon(tm) X2 Dual Core Processor BE-2350, 2,1 GHz > > > [1] http://www-stat.stanford.edu/~amir/preprints/irw.ps > [2] http://arxiv.org/abs/0911.5456 > [3] http://cran.r-project.org/doc/manuals/R-intro.html#Repetitive-execution > [4] https://secure.wikimedia.org/wikipedia/en/wiki/Function_(mathematics) > [5] http://finzi.psych.upenn.edu/R/library/simFrame/html/runSimulation.html-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110728/d8a1313d/attachment.bin>
William Dunlap
2011-Jul-28 01:56 UTC
[R] Is R the right choice for simulating first passage times of random walks?
You can replace the inner loop
for (i in 1:length) {
if (s[i] < 0 && s[i + 1] >= 0) {
z[i] = z[i] + 1
}
}
with the faster
z <- z + (diff(s>=0)==1)
(Using the latter forces you fix up the length of
z to be one less than you declared -- your loop never
touched the last entry in it.)
My code relies on the factor that the logicals FALSE
and TRUE are converted to integer 0 and 1, respectively,
when you do arithmetic on them.
E.g., here is your version written as a function, f1, and
2 equivalent ones, f2 and f3, without the inner loop:
f1 <- function(n = 100000, # number of simulations
length = 100000) # length of iterated sum
{
z = rep(0, times=length) # orig had length+1 but last entry was never used
for (i in 1:n) {
x = c(0, sign(rnorm(length)))
s = cumsum(x)
for (i in 1:length) {
if (s[i] < 0 && s[i + 1] >= 0) {
z[i] = z[i] + 1
}
}
}
z
}
f2 <- function(n = 100000, # number of simulations
length = 100000) # length of iterated sum
{
z = rep(0, times=length)
l1 <- length+1
for (i in 1:n) {
x = c(0, sign(rnorm(length)))
s = cumsum(x)
z <- z + ( s[-l1]<0 & s[-1]>=0 )
}
z
}
f3 <- function(n = 100000, # number of simulations
length = 100000) # length of iterated sum
{
z = rep(0, times=length)
for (i in 1:n) {
x = c(0, sign(rnorm(length)))
s = cumsum(x)
z <- z + (diff(s>=0)==1)
}
z
}
and here are some timing and correctness tests:
> set.seed(1) ; system.time( z1 <- f1(1000, 1e5) )
user system elapsed
278.10 0.25 271.44
> set.seed(1) ; system.time( z2 <- f2(1000, 1e5) )
user system elapsed
37.29 0.84 38.02
> set.seed(1) ; system.time( z3 <- f3(1000, 1e5) )
user system elapsed
40.11 0.80 40.17
> identical(z1, z2) && identical(z1, z3)
[1] TRUE
You might try using sample(c(-1,1), size=length, replace=TRUE)
instead of sign(rnorm(length)). I don't know if there would
be any speed difference, but it frees you from the worry that
rnorm() might return an exact 0.0.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at
r-project.org] On Behalf Of Paul Menzel
> Sent: Wednesday, July 27, 2011 4:36 PM
> To: r-help at r-project.org
> Subject: [R] Is R the right choice for simulating first passage times of
random walks?
>
> Dear R folks,
>
>
> I need to simulate first passage times for iterated partial sums. The
> related papers are for example [1][2].
>
> As a start I want to simulate how long a simple random walk stays
> negative, which should result that it behaves like n^(-?). My code looks
> like this.
>
> -------- 8< -------- code -------- >8 --------
> n = 100000 # number of simulations
>
> length = 100000 # length of iterated sum
> z = rep(0, times=length + 1)
>
> for (i in 1:n) {
> x = c(0, sign(rnorm(length)))
> s = cumsum(x)
> for (i in 1:length) {
> if (s[i] < 0 && s[i + 1] >= 0) {
> z[i] = z[i] + 1
> }
> }
> }
> plot(1:length(z), z/n)
> curve(x**(-0.5), add = TRUE)
> -------- 8< -------- code -------- >8 --------
>
> This code already runs for over half an hour on my system?.
>
> Reading about the for loop [3] it says to try to avoid loops and I
> probably should use a matrix where every row is a sample.
>
> Now my first problem is that there is no matrix equivalent for
> `cumsum()`. Can I use matrices to avoid the for loop?
>
> My second question is, is R the right choice for such simulations? It
> would be great when R can also give me a confidence interval(?) and also
> try to fit a curve through the result and give me the rule of
> correspondence(?) [4]. Do you have pointers for those?
>
> I glanced at simFrame [5] and read `?simulate` but could not understand
> it right away and think that this might be overkill.
>
> Do you have any suggestions?
>
>
> Thanks,
>
> Paul
>
>
> ? AMD Athlon(tm) X2 Dual Core Processor BE-2350, 2,1 GHz
>
>
> [1] http://www-stat.stanford.edu/~amir/preprints/irw.ps
> [2] http://arxiv.org/abs/0911.5456
> [3] http://cran.r-project.org/doc/manuals/R-intro.html#Repetitive-execution
> [4] https://secure.wikimedia.org/wikipedia/en/wiki/Function_(mathematics)
> [5] http://finzi.psych.upenn.edu/R/library/simFrame/html/runSimulation.html