This is in followup to a thread on R-help with subject "Sampling". I claim that R does the wrong thing by default when sampling with unequal probabilities without replacement - the selection probabilities are not proportional to 'prob', for any draw after the first: I suggest that R do what S-PLUS now does (though you're free to choose a better implementation). What S-PLUS now does is: If 'replace==TRUE' then sample with replacement. Otherwise, sample without replacement or with minimal replacement, according to the value of argument 'minimal'. The default is 'minimal = length(prob) > 1'. One can specify 'minimal = FALSE' for backward compatibility. In the case of sampling with minimal replacement, duplicates may occur whenever 'max(size*prob) > 1', and are guaranteed if 'max(size*prob) >= 2'. You can think of drawing 'trunc(size*prob)' observations deterministically, then drawing the remaining 'size - sum(trunc(size*prob))' observations without replacement, with an adjusted prob vector. The algorithm I used is relatively simple. It is one of the Brewer and Hanif algorithms (though I don't recall if they used the final random shuffle). Here's one description, or you may prefer the description in Pedro J. Saavedra (2005) Comparison of Two Weighting Schemes for Sampling with Minimal Replacement http://www.amstat.org/Sections/Srms/Proceedings/y2005/Files/JSM2005-000882.pdf In the case of minimal = TRUE (sampling with minimal replacement), with unequal probabilities: * scale prob to sum to 1, * randomly sort the observations along with prob * let cprob = cumsum(prob), * draw a systematic sample of size 'size' in (0,1): uniformVector <- (1:size - runif(1))/size * observation i is selected if cprob[i-1] < uniformVector[j] <= cprob[i] for any j In the case (size*max(prob) > 1), the number of times the observation is selected is the number of j's for which the inequalities hold. * the selected observations are randomly sorted again. Tim Hesterberg Disclaimer - these are my opinions, not Insightful's.