Dear newsgroup,
There was a recent post suggesting the incorporation of a standard
rmultinom(...). This seems like a good idea, but I wasn't sure about basing
this on tabulate( sample( ...)). Despite the attractive succinctness, this
could be very slow and use lots of memory if n or size is large. Instead,
I've tended to use a loop over the boxes of the multinomial, taking
successive binomial samples from the remaining available observations each
time.
To check, I did some timing comparisons. Unsurprisingly, the tabulate()
version is much slower for large "size" (number of observations per
box).
However, the successive binomial version is slower when observations are
sparse, presumably because it spends most of its time filling in zeros that
tabulate() doesn't worry about.
test> # Low dimension, many observations per box
test> length( ranvec)
[1] 7
test> system.time( sucbin.rmultinom( 1000, 1000, ranvec))
[1] 0.02 0.00 0.02 NA NA
test> system.time( tab.rmultinom( 1000, 1000, ranvec))
[1] 0.19 0.00 0.18 NA NA
test> # High dimension, few observations per box
test> length( ranvec.long)
[1] 1000
test> system.time( sucbin.rmultinom( 10000, 10, ranvec.long))
[1] 18.42 0.06 18.54 NA NA
test> system.time( tab.rmultinom( 10000, 10, ranvec.long))
[1] 0.97 0.10 1.06 NA NA
A good option for standard use might be a hybrid, with a "method"
parameter
saying which approach to use. Verrry roughly, this might default to "use
successive binomials if size > length( prob)". The optimal cutoff seems
to
have a more complex dependence on n, size, and p, but this default choice
should avoid the worst problems.
As a suggestion, I've included code for a hybrid. Underscore-haters may wish
to stop reading here :)
# first, two standard functions of mine: these could go inside or be coded
explicitly, of course
# Avoid unintended behaviour with : when 2nd arg is lower than 1st
assign( '%upto%', function( from, to) if( from <= to) from:to else
numeric(
0))
# Trim tail of vector
clip_ function( x, n=1) x[ 1 %upto% ( length( x) - n)]
rmultinom_ function( n, size, prob, method=if(k>size) "tabulate"
else
"sucbin") {
# returns a n X length( prob) matrix of integers, each row adding up to
size
k_ length( prob)
if( pmatch( method, c( 'sucbin', 'tabulate')) == 1) { #
successive
binomials
prob_ prob / sum( prob)
prob[-1]_ prob[-1] / (1-clip( cumsum( prob)))
x_ matrix( as.integer( 0), n, k)
nn_ rep( size, n)
for( i in 1 %upto% (k-1)) {
x[,i]_ as.integer( rbinom( n, size=nn, prob=prob[ i]))
nn_ nn - x[,i]
}
x[,k]_ nn
} else # tabulation
x_ matrix( tabulate( sample( k, n * size, repl = TRUE, prob) + k * 0
%upto% (n - 1), nbins = n * k),
nrow = n, ncol = k, byrow = TRUE) # NB using ":" instead of
"%upto%"
would fail for n=1
return( x)
}
cheers
Mark
*******************************
Mark Bravington
CSIRO (CMIS)
PO Box 1538
Castray Esplanade
Hobart
TAS 7001
phone (61) 3 6232 5118
fax (61) 3 6232 5012
Mark.Bravington at csiro.au
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._