thr3ads.net - R help - [R] how to draw random numbers from many categorical distributions quickly? [Dec 2011]

Sean Zhang

2011-Dec-15 06:06 UTC

[R] how to draw random numbers from many categorical distributions quickly?

Dear R helpers,

I have a question about drawing random numbers from many categorical
distributions.

Consider n individuals, each of whom follows a categorical distribution
defined over k categories. Here is a simple case with n=4 and k=3:

catDisMat <- rbind(c(0.1, 0.2, 0.7), c(0.2, 0.2, 0.6),
                   c(0.1, 0.2, 0.7), c(0.1, 0.2, 0.7))

# draw one category per row, using that row as the probability vector
outVec <- rep(NA, nrow(catDisMat))
for (i in 1:nrow(catDisMat)) {
  outVec[i] <- sample(1:3, 1, prob = catDisMat[i, ], replace = TRUE)
}

I can think of one way to potentially speed it up (in reality, my n is very
large, so speed matters). The approach above samples only one value per
call. Because c(0.1,0.2,0.7) appears three times, I could draw all three
values for it in a single call,
"sample(1:3, 3, prob=c(0.1,0.2,0.7), replace = TRUE)", and with some
manipulation put them back in the right positions. That should improve
speed a bit, but I wonder whether there is a better approach for speed?

Thanks in advance.

-Sean
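
The grouping idea described in the question, one sample() call per distinct probability vector, can be sketched roughly as follows. This is only an illustration: `sampleByGroup` is a hypothetical helper name that does not come from the thread, and it assumes two rows count as identical when their pasted text representations match (rows that differ only beyond the digits `paste` prints would be merged). Any gain depends on how few distinct rows the matrix has.

```r
# Hypothetical helper (not from the thread): draw one category per row,
# making one sample.int() call per distinct probability vector.
sampleByGroup <- function(probMat) {
  n <- nrow(probMat)
  k <- ncol(probMat)
  # Build a text key per row; identical rows get identical keys.
  key <- do.call(paste, c(as.data.frame(probMat), sep = "_"))
  out <- integer(n)
  for (idx in split(seq_len(n), key)) {
    # All rows in idx share one distribution, so draw them together
    # and write the draws back to their original positions.
    out[idx] <- sample.int(k, length(idx), replace = TRUE,
                           prob = probMat[idx[1], ])
  }
  out
}

catDisMat <- rbind(c(0.1, 0.2, 0.7), c(0.2, 0.2, 0.6),
                   c(0.1, 0.2, 0.7), c(0.1, 0.2, 0.7))
outVec <- sampleByGroup(catDisMat)
```

With the 4-row example the loop runs twice (two distinct rows) instead of four times. If nearly every row is distinct, this degenerates to the original loop plus the cost of building the keys.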

Nordlund, Dan (DSHS/RDA)

2011-Dec-15 08:34 UTC

Sean,

How about something like this:

outVec <- apply(catDisMat, 1,
                function(x) sample(1:3, 1, prob = x, replace = TRUE))


I created a catDisMat matrix with a million rows, and apply() crunched
through it in approximately 8-9 seconds on my 2.67 GHz 64-bit Windows 7 box
with 12 GB of RAM.  Your code above was substantially slower.

Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
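
Both the loop and the apply() version still call sample() once per row. One alternative that avoids per-row calls is inverse-CDF sampling: draw a single uniform number per row and count how many cumulative probabilities it reaches. The sketch below is not from the thread; `sampleRows` is a hypothetical name, and it assumes each row of the matrix is non-negative and sums to 1. Its draws will not reproduce the random stream of sample(), even with the same seed.

```r
# Hypothetical helper (not from the thread): one draw per row of probMat
# by inverse-CDF sampling. The loop runs over the k columns, not the n rows.
sampleRows <- function(probMat) {
  n <- nrow(probMat)
  k <- ncol(probMat)
  u <- runif(n)            # one uniform number per individual
  cumP <- probMat[, 1]     # cumulative probability through category 1
  out <- rep.int(1L, n)    # start every row in category 1
  for (j in seq_len(k - 1)) {
    # Advance to category j + 1 wherever u has reached the mass of 1..j.
    out <- out + (u >= cumP)
    cumP <- cumP + probMat[, j + 1]
  }
  out
}

catDisMat <- rbind(c(0.1, 0.2, 0.7), c(0.2, 0.2, 0.6),
                   c(0.1, 0.2, 0.7), c(0.1, 0.2, 0.7))
outVec <- sampleRows(catDisMat)
```

Only the first k - 1 cumulative sums are compared, so the last category absorbs any floating-point rounding in the row totals. Whether this beats apply() for a given n and k is best checked with system.time() on your own data.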
