Giovanni Petris
2009-Jul-29  18:32 UTC
[R] Systematic resampling (in sequential Monte Carlo)
Dear all, Here is a little coding problem. It falls in the category of "how can I do this efficiently?" rather than "how can I do this?" (I know how to do it inefficiently). So, if you want to take the challenge, keep reading, otherwise just skip to the next post - I won't be offended by that ;-) I am coding a particle filter and I want to use systematic resampling to "regenerate" particles. Basically, what this means is the following. 1. Start with a vector of "labels", x, say, of length N and a vector of probabilities, w, say - w[i] is the probability attached to x[i]. 2. Draw N equally spaced random number in (0,1) as u = (1 : N + runif(1)) / N. 3. Define a new sample of "labels", y, say, by inverting the cdf defined by the probabilities w. That is, set y[i] = x[k] if cumsum(w)[k-1] <= u[i] < cumsum(w)[k], i=1, ..., N. The following is a piece of R code that implements the procedure.> ### Systematic resampling > set.seed(2) > x <- LETTERS[1 : 6] # labels > w <- rexp(6) > w <- w / sum(w) # probabilities > W <- c(0, cumsum(w)) # cdf > u <- (1 : 6 + runif(1)) / 6 > indNew <- sapply(seq_along(u),+ function(i) (sum(W[i] <= u & u < W[i + 1])))> (y <- rep(x, indNew))[1] "A" "B" "D" "D" "F"> ## graphically... > plot(stepfun(seq_along(u), W), xaxt = "n") > axis(1, at = seq_along(u), x) > abline(h = u, lty = "dotted")The question is the following. I believe the way I compute "newInd" is of order N^2. Is there a smart way of doing it which is of order N? (I know there is one, I just cannot find it...) Thanks in advance, Giovanni
Dear all, This is the sequel to a question of mine a couple of days ago about how to implement systematic resampling from a discrete distribution efficiently. (See below the original question and description of the problem - the problem, in a nutshell, is how to draw N points from a discrete distribution on N points). I did my homework and found an implementation of systematic resampling which is O(N) - my initial one was O(N^2). The code is given below. However, compared with multinomial resampling, the difference in execution time is huge. Is there anybody out there who sees a way of improving the efficiency of my code?? Is coding it in C my only option? Thank you in advance for your suggestions and insight, Giovanni ######### R example #########> ### Timing systematic resampling > set.seed(2) > N <- 1e6 > x <- LETTERS[1 : N] # labels > w <- rexp(N) > w <- w / sum(w) # probabilities > W <- cumsum(w) # cdf > system.time({+ u <- (0 : (N - 1) + runif(1)) / N + indexNew <- rep(0, N) + h <- 1 + k <- 1 + for (i in 1 : N) + { + j <- h + while (u[h] < W[k] && h <= N) h <- h + 1 + indexNew[i] <- h - j + k <- k + 1 + } + (y <- rep(x, indexNew)) + }) user system elapsed 12.791 0.073 12.887> ### ...while using multinomial resampling: > system.time(+ index <- sample(N, N, replace = TRUE, prob = w) + ) user system elapsed 0.344 0.020 0.366> Date: Wed, 29 Jul 2009 13:32:14 -0500 (CDT) > From: Giovanni Petris <GPetris at uark.edu> > Sender: r-help-bounces at r-project.org > Precedence: list > > > > Dear all, > > Here is a little coding problem. It falls in the category of "how can I do > this efficiently?" rather than "how can I do this?" (I know how to do it > inefficiently). So, if you want to take the challenge, keep reading, otherwise > just skip to the next post - I won't be offended by that ;-) > > I am coding a particle filter and I want to use systematic resampling to > "regenerate" particles. Basically, what this means is the following. > > 1. Start with a vector of "labels", x, say, of length N and a vector of > probabilities, w, say - w[i] is the probability attached to x[i]. > > 2. Draw N equally spaced random number in (0,1) as u = (1 : N + runif(1)) / N. > > 3. Define a new sample of "labels", y, say, by inverting the cdf defined by > the probabilities w. That is, set > y[i] = x[k] if cumsum(w)[k-1] <= u[i] < cumsum(w)[k], i=1, ..., N. > > The following is a piece of R code that implements the procedure. > > > ### Systematic resampling > > set.seed(2) > > x <- LETTERS[1 : 6] # labels > > w <- rexp(6) > > w <- w / sum(w) # probabilities > > W <- c(0, cumsum(w)) # cdf > > u <- (1 : 6 + runif(1)) / 6 > > indNew <- sapply(seq_along(u), > + function(i) (sum(W[i] <= u & u < W[i + 1]))) > > (y <- rep(x, indNew)) > [1] "A" "B" "D" "D" "F" > > ## graphically... > > plot(stepfun(seq_along(u), W), xaxt = "n") > > axis(1, at = seq_along(u), x) > > abline(h = u, lty = "dotted") > > The question is the following. I believe the way I compute "newInd" is of > order N^2. Is there a smart way of doing it which is of order N? (I know there > is one, I just cannot find it...) > > Thanks in advance, > Giovanni > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >
Possibly Parallel Threads
- New package RcppSMC 0.1.0 for Sequential Monte Carlo and Particle Filters
- New package RcppSMC 0.1.0 for Sequential Monte Carlo and Particle Filters
- R package: Mchtest - Monte Carlo hypothesis testing allowing Sequential Stopping
- R package: Mchtest - Monte Carlo hypothesis testing allowing Sequential Stopping
- VaR-Monte carlo Simulation, Historic simulation, Variance-Covariance Simulation