Theo Borm
2007-Mar-30 14:10 UTC
[R] faster computation of cumulative multinomial distribution
Dear list members, I have a series of /unequal/ probabilities [p1,p2,...,pk], describing mutually exclusive events, and a "remainder" class with a probability p0=1-p1-p2-....-pk, and need to calculate, for a given number of trials t>=k, the combined probability that each of the classes 1...k contains at least 1 "event" (the remainder class may be empty). To me this reaks of a sum of multinomial distributions, and indeed, I can readily calculate the correct answer for small figures t,k using a small R program. However, in my typical experiment, k ranges from ~20-60 and t from ~40-100, and having to calculate these for about 6e9 experiments, a quick calculation on the back of a napkin shows me that I will not be able to complete these calculations before the expected end of the universe. I already figured out that in this particular case I may use reciprocal probabilities, and if I do this I get an equation with "only" 2^k terms, which would shorten my computations to a few decades. Isn't there a faster (numerical approximation?) way to do this? R has dbinom /and/ pbinom functions, but unfortunately only a dmultinom and no pmultinom function... perhaps because there is no (known) faster way? with kind regards, Theo Borm
Alberto Monteiro
2007-Mar-30 22:00 UTC
[R] faster computation of cumulative multinomial distribution
Theo Borm wrote:> > I have a series of /unequal/ probabilities [p1,p2,...,pk], describing > mutually exclusive events, and a "remainder" class with a probability > p0=1-p1-p2-....-pk, and need to calculate, for a given number of trials > t>=k, the combined probability that each of the classes 1...k > contains at least 1 "event" (the remainder class may be empty). > > To me this reaks of a sum of multinomial distributions, and indeed, I > can readily calculate the correct answer for small figures t,k using > a small R program. >The multinomial distribution is a sum of independent cathegorial distributions, so there's no such thing as a sum of multinomial distributions - unless the sum of multinomial distributions is also a multinomial distribution. Yikes. I think I am saying too much. Just look here: http://en.wikipedia.org/wiki/Multinomial_distribution> However, in my typical experiment, k ranges from ~20-60 and t from > ~40-100, and having to calculate these for about 6e9 experiments, a > quick calculation on the back of a napkin shows me that I will not be > able to complete these calculations before the expected end > of the universe. >But what do you want to calculate? The probabilities?> I already figured out that in this particular case I may use reciprocal > probabilities, and if I do this I get an equation with "only" 2^k > terms, which would shorten my computations to a few decades. > > Isn't there a faster (numerical approximation?) way to do this? >I don't know what you want to do, but maybe this is the case where a Monte Carlo Simulation would give a fast and accurate result. Let's say you want to estimate some function of the outcome of this multinomial. Just to fix ideas: # p - the probabilities p <- c(1, 2, 3, 2, 4, 4, 2, 1, 2, 3, 1, 2, 3, 4, 3, 2, 1, 2) # normalize so that sum(p) = 1 p <- p / sum(p) # get k k <- length(p) # number of trials, say 100 t <- 100 # x is the simulation for each trial. x[1] is the number of # times we got the thing with probability p[1], etc # # x <- rmultinom(1, k, p) # simulates 1 experiment # # f is the function that you will use to "evaluate" x # f <- function(x) { return(sum(x * 1.05^-(0:(length(x)-1)))) # anything can come here } # # A Monte Carlo simulation will generate "a lot" of xs and # get a distribution of f(x) # # Simulate 10000 xs x <- rmultinom(10000, k, p) # takes almost zero time # Evaluate 10000 f(x). apply(matrix, 1 (rows) or 2 (coluns), function) f.dist <- apply(x, 2, f) # analyse it hist(f.dist) # etc> R has dbinom /and/ pbinom functions, but unfortunately only a dmultinom > and no pmultinom function... perhaps because there is no (known) > faster way? >There's a rmultinom Alberto Monteiro
Possibly Parallel Threads
- Troubles with the function rmultinom.c of the R's Random Number Generator
- Error message for infinite probability parameters in rbinom() and rmultinom()
- Error message for infinite probability parameters in rbinom() and rmultinom()
- Error message for infinite probability parameters in rbinom() and rmultinom()
- rmultinom.c error probability not sum to 1