Hi All, I need to sample a vector ("old"), with replacement, up to the point where my vector of samples ("new") sums to a predefined value ("target"), shortening the last sample if necessary so that the total sum ("newsum") of the samples matches the predefined value. While I can easily do this with a "while" loop (see below for example code), because the length of both "old" and "new" may be > 20,000, a vectorized approach will save me lots of CPU time. Any suggestions would be greatly appreciated. Thanks, Dan # loop approach old=c(1:10) p=runif(1:10) target=20 newsum=0 new=NULL while (newsum<target) { i=sample(old, size=1, prob=p); new[length(new)+1]=i; newsum=sum(new) } new newsum target if(newsum>target){new[length(new)]=target-sum(new[-length(new)])} new newsum=sum(new); newsum target -- Daniel E. Bunker Associate Coordinator - BioMERGE Post-Doctoral Research Scientist Columbia University Department of Ecology, Evolution and Environmental Biology 1020 Schermerhorn Extension 1200 Amsterdam Avenue New York, NY 10027-5557 212-854-9881 212-854-8188 fax deb37ATcolumbiaDOTedu
On 07-Apr-05 Daniel E. Bunker wrote:> Hi All, > > I need to sample a vector ("old"), with replacement, up to the point > where my vector of samples ("new") sums to a predefined value > ("target"), shortening the last sample if necessary so that the total > sum ("newsum") of the samples matches the predefined value. > > While I can easily do this with a "while" loop (see below for example > code), because the length of both "old" and "new" may be > 20,000, a > vectorized approach will save me lots of CPU time. > > Any suggestions would be greatly appreciated. > > Thanks, DanHi Dan, You should be able to adapt the following vectorised approach to your specific needs: old<-0.001*(1:1000) new<-sample(old,10000,replace=TRUE,prob=p) target<-200 min(which(cumsum(new)>target)) ## [1] 385 This took only a fraction of a second on my medium-speed machine. If you get an "Inf" as result, then 'new' doesn't add up to 'target', so you have to extend it. Hoping this helps, Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 07-Apr-05 Time: 22:46:12 ------------------------------ XFMail ------------------------------
Hi, sample() takes a "replace" argument, so you can take large samples, with replacement, like this: (In the sample() call, the 50*target/mean(old) should make it sample 50 times more than likely. This means the while loop will probably get executed only once. This could be tuned easily, and there may be better ways of guessing how much to take). old <- c(1:2000) p <- runif(1:2000) target <- 4000 new <- 0 while ( sum(new) < target ) new <- sample(old, 50*target/mean(old), TRUE, p) i <- which(cumsum(new) >= target)[1] new <- new[1:i] new[i] <- new[i] - (sum(new)-target) Cheers, Rich On Apr 8, 2005 9:19 AM, Daniel E. Bunker <deb37 at columbia.edu> wrote:> Hi All, > > I need to sample a vector ("old"), with replacement, up to the point > where my vector of samples ("new") sums to a predefined value > ("target"), shortening the last sample if necessary so that the total > sum ("newsum") of the samples matches the predefined value. > > While I can easily do this with a "while" loop (see below for example > code), because the length of both "old" and "new" may be > 20,000, a > vectorized approach will save me lots of CPU time. > > Any suggestions would be greatly appreciated. > > Thanks, Dan > > # loop approach > old=c(1:10) > p=runif(1:10) > target=20 > > newsum=0 > new=NULL > while (newsum<target) { > i=sample(old, size=1, prob=p); > new[length(new)+1]=i; > newsum=sum(new) > } > new > newsum > target > if(newsum>target){new[length(new)]=target-sum(new[-length(new)])} > new > newsum=sum(new); newsum > target >-- Rich FitzJohn rich.fitzjohn <at> gmail.com | http://homepages.paradise.net.nz/richa183 You are in a maze of twisty little functions, all alike