thr3ads.net - R help - [R] A question about sampling [Feb 2011]

Patrick Boily

2011-Feb-02 20:02 UTC

[R] A question about sampling

Greetings,

I am attempting to do something with R that I think should be efficiently
do-able, but I haven't yet found success.

I have a vector of probability weights (for 17 categories), let's call it
things (it could look like the one below, for instance).
> things
0.026 0 0.233 0 0.131 0 0.415 0 0 0 0 0 0.192 0 0.067 0 0

I'd like a sample of size size.things (say, 47) of the 17 categories (with
replacement). And I'd like to produce a vector of length 17 which enumerates
the number of times each category has been selected. This is fairly
straightforward to do; for instance:
> things2<-table(factor(sample(1:17,size.things[1],replace=TRUE,prob=things),levels=1:17))
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
 1  0  9  0  4  0 18  0  0  0  0  0  5  0  4  0  0
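
For readers who want to run the single-vector case above, here is a minimal self-contained sketch. The weights and the sample size of 47 are taken from the post; the seed and the names draws, counts.table and counts.vec are only illustrative assumptions. It also shows tabulate() as one possible alternative to table(factor(...)) when a plain integer vector of counts is wanted.

```r
# Probability weights for the 17 categories (values from the post).
# sample() rescales the weights internally, so they need not sum to exactly 1.
things <- c(0.026, 0, 0.233, 0, 0.131, 0, 0.415, 0, 0, 0, 0, 0,
            0.192, 0, 0.067, 0, 0)
size.things <- 47

set.seed(1)  # arbitrary seed, used only to make the run reproducible

# Draw 47 category labels with replacement, weighted by things.
draws <- sample(1:17, size.things, replace = TRUE, prob = things)

# table() on a factor with all 17 levels keeps the zero-count categories.
counts.table <- table(factor(draws, levels = 1:17))

# tabulate() gives the same counts as a plain integer vector of length 17.
counts.vec <- tabulate(draws, nbins = 17)
```

Both results contain the same 17 counts; categories with weight 0 are never drawn, so their counts are always 0.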

What would I need to do if I had a matrix things (50000 x 17) of probability
weight vectors and a vector of sample sizes size.things (of length 50000), and I
wanted to simultaneously sample size.things[1] of the 17 categories with
probability weight vector things[1,], size.things[2] of the 17 categories with
probability weight vector things[2,], etc. A loop will do the trick, but it
takes a while and it seems to me that I could more efficiently use tapply
somehow. Or something that behaves like rowSums. I'm not familiar enough
with R to see an easy way out. Perhaps there isn't? Does anybody have an
idea?

Regards,

Patrick

Greg Snow

2011-Feb-02 22:38 UTC

[R] A question about sampling

The apply functions are really just hidden loops, and loops have been made
efficient enough that they are usually not much slower (and sometimes a bit
faster) than the apply's.

If you really want to use an apply function, then look at mapply (you might need
to convert the matrix to a list of rows), or you could use sapply on the vector
1:50000 and write a function that indexes into the matrix and vector.  But if you
understand the loop, then I would suggest using the loop.
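
The three options mentioned above (plain loop, sapply over row indices, mapply over a list of rows) could be sketched roughly as follows. This is an illustration under stated assumptions, not code from the thread: the inputs are randomly generated stand-ins, only 1000 rows are used instead of 50000 to keep the run short, and the helper count.row and all result names are hypothetical.

```r
set.seed(42)     # arbitrary seed, for reproducibility only
n.rows <- 1000   # assumption: the post has 50000 rows
n.cat  <- 17

# Hypothetical inputs: random nonnegative weights and random sample sizes.
things <- matrix(runif(n.rows * n.cat), nrow = n.rows, ncol = n.cat)
size.things <- sample(10:100, n.rows, replace = TRUE)

# Hypothetical helper: category counts for one (size, weight vector) pair.
count.row <- function(size, prob) {
  draws <- sample.int(length(prob), size, replace = TRUE, prob = prob)
  tabulate(draws, nbins = length(prob))
}

# Option 1: explicit loop filling a preallocated matrix.
counts.loop <- matrix(0L, nrow = n.rows, ncol = n.cat)
for (i in seq_len(n.rows)) {
  counts.loop[i, ] <- count.row(size.things[i], things[i, ])
}

# Option 2: sapply over the row indices, indexing into matrix and vector.
# sapply returns one column per row index, hence the transpose.
counts.sapply <- t(sapply(seq_len(n.rows), function(i) {
  count.row(size.things[i], things[i, ])
}))

# Option 3: mapply, after converting the matrix to a list of rows.
rows <- split(things, row(things))
counts.mapply <- t(mapply(count.row, size.things, rows))
```

Each result is an n.rows x 17 matrix whose i-th row sums to size.things[i]. The three matrices differ from one another only because each option makes its own random draws. Wrapping each option in system.time() is one way to check how much the loop actually costs on a given machine.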

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

