thr3ads.net - R help - [R] Efficient sampling from a discrete distribution in R [Sep 2007]

If this information is useful, please help other people find it:
Share via:

Issac Trotts

2007-Sep-04 03:48 UTC

[R] Efficient sampling from a discrete distribution in R

Hello r-help,

As far as I've seen, there is no function in R dedicated to sampling
from a discrete distribution with a specified mass function.  The
standard library doesn't come with anything called rdiscrete or rpmf,
and I can't find any such thing on the cheat sheet or in the
Probability Distributions chapter of _An Introduction to R_.  Googling
also didn't bring back anything.  So, here's my first attempt at a
solution.  I'm hoping someone here knows of a more efficient way.

# Sample from a discrete distribution with given probability mass function
rdiscrete = function(size, pmf) {
  stopifnot(length(pmf) > 1)
  cmf = cumsum(pmf)
  icmf = function(p) {
    min(which(p < cmf))
  }
  ps = runif(size)
  sapply(ps, icmf)
}

test.rdiscrete = function(N = 10000) {
  err.tol = 6.0 / sqrt(N)
  xs = rdiscrete(N, c(0.5, 0.5))
  err = abs(sum(xs == 1) / N - 0.5)
  stopifnot(err < err.tol)
  list(e = err, xs = xs)
}

Thanks,
Issac

Prof Brian Ripley

2007-Sep-04 05:18 UTC

head link

[R] Efficient sampling from a discrete distribution in R

On Mon, 3 Sep 2007, Issac Trotts wrote:
> Hello r-help,
>
> As far as I've seen, there is no function in R dedicated to sampling
> from a discrete distribution with a specified mass function.  The
> standard library doesn't come with anything called rdiscrete or rpmf,
> and I can't find any such thing on the cheat sheet or in the
> Probability Distributions chapter of _An Introduction to R_.  Googling
> also didn't bring back anything.  So, here's my first attempt at a
> solution.  I'm hoping someone here knows of a more efficient way.
It's called sample().

There are much more efficient algorithms than the one you used, and 
sample() sometimes uses one of them (Walker's alias method): see any good 
book on simulation (including my 'Stochastic Simulation, 1987).
> # Sample from a discrete distribution with given probability mass function
> rdiscrete = function(size, pmf) {
>  stopifnot(length(pmf) > 1)
>  cmf = cumsum(pmf)
>  icmf = function(p) {
>    min(which(p < cmf))
>  }
>  ps = runif(size)
>  sapply(ps, icmf)
> }
>
> test.rdiscrete = function(N = 10000) {
>  err.tol = 6.0 / sqrt(N)
>  xs = rdiscrete(N, c(0.5, 0.5))
>  err = abs(sum(xs == 1) / N - 0.5)
>  stopifnot(err < err.tol)
>  list(e = err, xs = xs)
> }
>
> Thanks,
> Issac
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Issac Trotts

2007-Sep-04 07:40 UTC

head link

[R] Efficient sampling from a discrete distribution in R

On 9/3/07, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> Yes, vectorize, and don't grow vectors. See 'S Programming'
(details in
> the R FAQ).
It's easy to avoid growing the samples vector but it doesn't seem to
make much difference.  Vectorizing would be probably help, but in this
algorithm each step depends on the one before it.  Do you know of an
alternative to the Chinese restaurant process that is vectorizable?
> We've no idea about you (the R posting guide does ask for a signature
> block), but if this is homework some people are going to be annoyed.
No, it's for my job at Google, so maybe I should use my work email
address to avoid confusion.  Sorry if the questions are so basic that
they sound like homework, but you and others on the list know things
that aren't easy to find in the documentation.

-- 
Issac Trotts
Web Operations Chair
Silicon Valley Web Builder
http://svwebbuilder.com

Maybe Matching Threads

Search for more possibly parallel threads

R help - Sep 2007 - Efficient sampling from a discrete distribution in R

[R] Efficient sampling from a discrete distribution in R

[R] Efficient sampling from a discrete distribution in R

[R] Efficient sampling from a discrete distribution in R

Maybe Matching Threads