Hi,

I looked up the help file on sample(), but didn't find the info I was looking for.

When sample() is used to resample from a distribution, e.g., bootstrap, how does it do it? Does it use a uniform distribution, e.g., runif(), or something else? And, when the help file says: "sample(x) generates a random permutation of the elements of x (or 1:x)", would I be correct if I translate the statement as follows: it means that the order of the sequence, which was generated from a uniform distribution, would look like a random normal distribution?

Thanks,

Tom
When sampling with replacement (like ordinary bootstrap), each draw is done independently, and in each draw every point has equal probability of being drawn.

When sampling without replacement (random permutation), all possible sequences (permutations) have equal probability of occurring. E.g., if the data is 1:2, then (1, 2) has the same probability of occurring as (2, 1).

Andy
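Both statements can be checked empirically. The sketch below is only an illustration, not how sample() works internally. It assumes a small population 1:3, and the seed and number of draws are arbitrary choices. It counts how often each permutation appears without replacement, and how often each value appears with replacement.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable

n_draws <- 60000  # arbitrary; large enough for stable frequencies

# Without replacement: record each permutation of 1:3 as a string such as "2 3 1".
perms <- replicate(n_draws, paste(sample(3), collapse = " "))
perm_freq <- table(perms) / n_draws
print(round(perm_freq, 3))   # six permutations, each expected near 1/6

# With replacement: pool many independent draws from 1:3.
draws <- sample(3, n_draws, replace = TRUE)
draw_freq <- table(draws) / n_draws
print(round(draw_freq, 3))   # three values, each expected near 1/3
```

The observed frequencies will not be exactly 1/6 or 1/3, but they should get closer as n_draws grows.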
On Thu, 2006-10-19 at 12:07 -0500, tom soyer wrote:
> When sample() is used to resample from a distribution, e.g., bootstrap, how
> does it do it? Does it use a uniform distribution, e.g., runif(), or
> something else?

In the simplest case, where you have not specified a set of probability weights, sample() uses a uniform distribution, such that each element has an equal probability of being selected.

In the case of sampling WITHOUT replacement (the default), each element still in the vector has an equal probability of being selected. Once selected, that element is removed from the sampling space and the process is repeated with the remaining elements until the requested number of elements (by default, all of them) has been selected. So:

    > sample(10)
     [1]  3  8  5  9  7  1  4  2 10  6

yields a random permutation of 1:10.

In the case of 'replace = TRUE', which is sampling WITH replacement, after an element is selected it is retained in the sampling space, and thus can be selected multiple times. So:

    > sample(10, replace = TRUE)
     [1] 1 4 1 8 7 8 6 7 5 9

If you specify a set of probability weights for the sampling vector, then the probability of each element being selected is adjusted accordingly.

In the case of bootstrapping, sampling WITH replacement is used. You might find the following post helpful in this scenario:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/67421.html

If you want to investigate further, you can review the C source code for the relevant R functions in random.c in the R source tarball. The file will be in ../src/main.

HTH,

Marc Schwartz
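The "select, remove, repeat" description above can be written out by hand. This is a minimal sketch of the effect only, and the helper name shuffle_by_removal is hypothetical. It is not the algorithm in random.c. It assumes runif(1) never returns exactly 0, which R's documentation states for the default range.

```r
# Hypothetical helper: mimics the *effect* of sample(x) for a vector x,
# not the algorithm implemented in R's C sources.
shuffle_by_removal <- function(x) {
  pool <- seq_along(x)             # indices still available
  picked <- integer(length(x))
  for (k in seq_along(x)) {
    # uniform position in 1..length(pool); runif(1) lies strictly in (0, 1)
    j <- ceiling(runif(1) * length(pool))
    picked[k] <- pool[j]
    pool <- pool[-j]               # remove the chosen index from the pool
  }
  x[picked]
}

set.seed(42)  # arbitrary seed
perm <- shuffle_by_removal(1:10)
print(perm)
print(sort(perm))                  # 1:10 again: every element appears exactly once

# With replacement nothing is removed, so repeats are possible (not guaranteed).
boot <- sample(1:10, replace = TRUE)
print(boot)
```

Because the chosen index leaves the pool at each step, the result is always a rearrangement of the input. With replace = TRUE the pool never shrinks.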
On 19-Oct-06 tom soyer wrote:
> Hi,
>
> I looked up the help file on sample(), but didn't find the
> info I was looking for.
>
> When sample() is used to resample from a distribution, e.g.,
> bootstrap, how does it do it? Does it use an uniform distribution,
> e.g., runif(), or something else?

I don't know the details of the algorithm, but since sample() has flexible options it may be helpful to describe the effect of sample() in different cases.

1. sample(x,r) where x is a vector of length n

   In effect, the index values (1:n) of x are sampled without replacement (the default) with a uniform probability distribution over the available elements at all stages. Hence, i1 is sampled from (1:n) with probability 1/n for each possibility. Then i2 is sampled from the remainder with probability 1/(n-1) for each, and so on until r items (all distinct) have been sampled. If the resulting indices are {i1,i2,...,ir} then the result is x[i1],x[i2],...,x[ir].

   Thus, if some of the values in x[1],...,x[n] are equal, you can get 2 or more items in the sample which are equal even though the sampling is done without replacement (since it is the indices which are sampled).

   [NB I'm describing the *effect* here, not saying that this is how the algorithm operates]

2. sample(x, replace=TRUE)

   Similar to [1], except that the sampled index is returned to the pool and is available to be sampled again, so at each stage the probability of any value being chosen is 1/n.

3. sample(x, replace=TRUE, prob=p) where p is a vector of probability weights (which must not all be 0, and none negative).

   First, p is converted into a probability distribution (summing to 1), in effect by dividing by the sum. Then an index i1 is sampled from (1:n) with probability p[i] that i is chosen. This is repeated (with previously sampled i's still available) until r index values (by default r = n) have been sampled -- i1,...,ir. The result is x[i1],...,x[ir].

4. sample(x, prob=p) [without replacement]

   First p is scaled to sum to 1, then i1 is sampled as in [3]. The remaining p-values are rescaled so as to sum to 1, and i2 is sampled from the remaining i's; and so on.

These are the essential variants of the use of sample().

runif() can be used to sample i1 from (1:n) with equal probabilities by selecting i if n*runif(1) is <= i and > (i-1) for i = 1:n.

Similarly runif() can be used to sample i1 from (1:n) with probabilities p1,...,pn by selecting i if

   p[1] + ... + p[i-1] < runif(1) <= p[1] + ... + p[i]   [LHS=0 if i=1],

since the probability of this happening is p[i].

> And, when the help file
> says:"sample(x) generates a random permutation of the elements
> of x (or 1:x)",

Since the default value of r (size of sample) is the length of x, say n, sample(x) (see [1] above) will sample n elements without replacement from the n elements of x with uniform probabilities at each stage.

In effect, n elements i1,i2,...,in will be sampled without replacement from (1:n), giving a random permutation of (1:n), so the result x[i1],...,x[in] will be a random permutation of x[1],...,x[n] (though different random permutations may look identical if there are equal values in x[1],...,x[n]).

> would I be correct if I translate the statement
> as follows: it means that the order of sequence, which was
> generated from a uniform distribution, would look like a
> random normal distribution.

No. A normal distribution has nothing to do with it!

*Unless* the values x[1],...,x[n] already looked like values which had already been sampled from a normal distribution (but were, say, in increasing order of size). Then sample(x) would shuffle them into random order, so the result could then look like a real sample according to the order in which the data came in.

Hoping this helps!
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 19-Oct-06                                       Time: 19:34:13
------------------------------ XFMail ------------------------------
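The runif()-based selection rule and the rescaling step of case [4] can be sketched as below. Like Ted's description, this shows the effect only and makes no claim about how sample() is implemented. The helper names draw_index and weighted_sample_no_replace are hypothetical, and the weights in the usage lines are arbitrary.

```r
# Hypothetical helper: draw one index from 1:length(p) with weights p,
# using a single uniform variate and the cumulative sums of p.
draw_index <- function(p) {
  p <- p / sum(p)                  # rescale the weights to sum to 1
  u <- runif(1)
  # smallest i with u <= p[1] + ... + p[i]; the outer min() guards against
  # the last cumulative sum falling just below 1 through rounding
  min(which(u <= cumsum(p)), length(p))
}

# Hypothetical helper for case [4]: weighted sampling without replacement.
weighted_sample_no_replace <- function(x, size, p) {
  pool <- seq_along(x)             # indices still available
  picked <- integer(size)
  for (k in seq_len(size)) {
    j <- draw_index(p[pool])       # remaining weights are rescaled inside draw_index
    picked[k] <- pool[j]
    pool <- pool[-j]               # the chosen index cannot be drawn again
  }
  x[picked]
}

set.seed(123)  # arbitrary seed
p <- c(0.5, 0.3, 0.2)
idx <- replicate(50000, draw_index(p))
print(round(table(idx) / length(idx), 3))   # expected near 0.5, 0.3, 0.2

print(weighted_sample_no_replace(c("a", "b", "c", "d"), 3, c(4, 3, 2, 1)))
```

Passing equal weights to draw_index reduces it to the equal-probability rule, since the cumulative sums are then 1/n, 2/n, ..., 1.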
Tom Soyer wrote:
>
> When sample() is used to resample from a distribution, e.g.,
> bootstrap, how does it do it? Does it use an uniform distribution,
> e.g., runif(), or something else?
>
I think it's clear that sample (without replacement) simulates what you would get if you wrote every element on a card, shuffled the cards, and drew a sample.

In other words, take some number n and another m <= n, let x <- 1:n, and then simulate y <- sample(x, m).

If you do it many times, y[1] (or y[2], or y[m]) will have the discrete distribution given by Prob(y[1] = 1) = 1/n, Prob(y[1] = 2) = 1/n, ..., Prob(y[1] = n) = 1/n. The same, of course, is valid for y[2], etc.

Ok, too much talking, let's run an example:

    x <- 1:10
    y3.hist <- numeric(10000)      # preallocate rather than grow from NULL
    for (i in 1:10000) {
      y <- sample(x, 5)            # 5 draws without replacement
      y3.hist[i] <- y[3]           # keep the third element
    }
    # breaks = 0:10 gives each integer its own bin; the default breaks
    # would put the values 1 and 2 into the same first bin
    hist(y3.hist, breaks = 0:10)

Alberto Monteiro
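For a numeric check to go with the histogram, the same experiment can be tabulated. This is only a sketch. The seed is arbitrary, and replicate() is used here in place of the explicit loop.

```r
set.seed(2006)  # arbitrary seed

x <- 1:10
# Each column of `draws` is one call to sample(x, 5); rows are positions 1..5.
draws <- replicate(10000, sample(x, 5))

# Relative frequency of each value in the third position.
pos3_freq <- table(factor(draws[3, ], levels = x)) / ncol(draws)
print(round(pos3_freq, 3))   # each value expected near 1/10
```

Replacing draws[3, ] with any other row should give a similar table, since every position has the same distribution.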