thr3ads.net - R help - [R] Question about random sampling in R [Oct 2006]

tom soyer

2006-Oct-19 17:07 UTC

[R] Question about random sampling in R

Hi,

I looked up the help file on sample(), but didn't find the info I was
looking for.

When sample() is used to resample from a distribution, e.g., bootstrap, how
does it do it? Does it use a uniform distribution, e.g., runif(), or
something else? And, when the help file says: "sample(x) generates a random
permutation of the elements of x (or 1:x)", would I be correct if I
translate the statement as follows: it means that the order of the
sequence, which was generated from a uniform distribution, would look like a
random normal distribution?

Thanks,

Tom

Liaw, Andy

2006-Oct-19 18:05 UTC

When sampling with replacement (like ordinary bootstrap), each draw is
done independently, and in each draw every point has equal probability
of being drawn.  When sampling without replacement (random permutation),
all possible sequences (permutations) have equal probability of
occurring.  E.g., if the data is 1:2, then (1, 2) has the same
probability of occurring as (2, 1).

Andy

Marc Schwartz

2006-Oct-19 18:10 UTC

On Thu, 2006-10-19 at 12:07 -0500, tom soyer wrote:
> Hi,
> 
> I looked up the help file on sample(), but didn't find the info I was
> looking for.
> 
> When sample() is used to resample from a distribution, e.g., bootstrap, how
> does it do it? Does it use an uniform distribution, e.g., runif(), or
> something else? And, when the help file says:"sample(x) generates a random
> permutation of the elements of x (or 1:x)", would I be correct if I
> translate the statement as follows: it means that the order of
> sequence, which was generated from a uniform distribution, would look like a
> random normal distribution.
> 
> Thanks,
> 
> Tom

In the simplest case, where you have not specified a set of probability
weights, sample() uses a uniform distribution, such that each element
has an equal probability of being selected.

In the case of sampling WITHOUT replacement (the default), each element
in the vector has an equal probability of being selected. Once selected,
that element is removed from the sampling space and the process is
repeated with the remaining elements until all elements have been
selected.

So:
> sample(10)
 [1]  3  8  5  9  7  1  4  2 10  6

yields a random permutation of 1:10.

In the case of 'replace = TRUE', which is sampling WITH replacement,
after an element is selected it is retained in the sampling space, thus
can be selected multiple times.

So:
> sample(10, replace = TRUE)
 [1] 1 4 1 8 7 8 6 7 5 9

If you specify a set of probability weights for the sampling vector,
then the probability of each element being selected is affected
accordingly.
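
As a small illustration of probability weights, the sketch below draws
with replacement using the 'prob' argument of sample(). The labels, the
weights and the seed are made up for the example.

```r
# Weighted sampling with replacement: weights need not sum to 1,
# sample() rescales them internally.
set.seed(1)                                   # arbitrary seed
items <- c("a", "b", "c")                     # made-up labels
w     <- c(1, 2, 7)                           # made-up weights
draws <- sample(items, 10000, replace = TRUE, prob = w)

# Observed proportions next to the normalised weights.
prop <- table(draws) / length(draws)
print(rbind(observed = as.vector(prop), expected = w / sum(w)))
```

The observed proportions should be close to 0.1, 0.2 and 0.7.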

In the case of bootstrapping, sampling WITH replacement is used. You
might find the following post helpful in this scenario:

  http://finzi.psych.upenn.edu/R/Rhelp02a/archive/67421.html
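
For readers who want to see the idea in code, here is a minimal
bootstrap sketch using sample() with replace = TRUE. The data are
simulated, and the sample size, number of resamples and seed are
arbitrary assumptions rather than anything from the linked post.

```r
# Bootstrap the standard error of the mean.
set.seed(123)                          # arbitrary seed
x <- rnorm(50, mean = 10, sd = 2)      # made-up data for illustration
B <- 2000                              # arbitrary number of resamples

# Each resample has the same length as x and is drawn WITH replacement.
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))

se_boot    <- sd(boot_means)           # bootstrap estimate of the SE
se_formula <- sd(x) / sqrt(length(x))  # textbook SE of the mean
print(c(bootstrap = se_boot, formula = se_formula))
```

The two estimates should be of similar size, which is a useful sanity
check that the resampling is doing what is intended.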

If you want to investigate further, you can review the C source code for
the relevant R functions in random.c in the R source tarball. The file
will be in ../src/main.

HTH,

Marc Schwartz

(Ted Harding)

2006-Oct-19 18:34 UTC

On 19-Oct-06 tom soyer wrote:
> Hi,
> 
> I looked up the help file on sample(), but didn't find the
> info I was looking for.
> 
> When sample() is used to resample from a distribution, e.g.,
> bootstrap, how does it do it? Does it use an uniform distribution,
> e.g., runif(), or something else?

I don't know the details of the algorithm, but since sample()
has flexible options it may be helpful to describe the effect
of sample() in different cases.

1. sample(x,r) where x is a vector of length n
In effect, the index values (1:n) of x are sampled from
without replacement (default) with a uniform probability
distribution over the available elements at all stages.
Hence, i1 is sampled from (1:n) with probability 1/n for
each possibility. Then i2 is sampled from the remainder
with probability 1/(n-1) for each, and so on until r items
(all distinct) have been sampled. If the resulting indices
are {i1,i2,...,ir} then the result is x[i1],x[i2],...,x[ir].
Thus, if some of the values in x[1],...,x[n] are equal,
you can get 2 or more items in the sample which are equal
even though the sampling is done without replacement (since
it is the indices which are sampled).
[NB I'm describing the *effect* here, not saying that this
is how the algorithm operates]

2. sample(x, replace=TRUE)
Similar to [1], except that the sampled index is returned
to the pool and is available to be sampled again, so at each
stage the probability of any value being chosen is 1/n.

3. sample(x, replace=TRUE, prob=p) where p is a vector of
   probability weights (which must not all be 0, and none
   negative).
First, p is converted into a probability distribution
(summing to 1) (in effect by dividing by the sum).
Then an index i1 is sampled from (1:n) with probability
p[i] that i is chosen. This is repeated (with previously
sampled i's still available) until r index values have been
sampled -- i1,...,ir. The result is x[i1],...,x[ir].

4. sample(x, prob=p) [without replacement]
First p is scaled to sum to 1, then i1 is sampled as in [3].
The remaining p-values are rescaled so as to sum to 1,
and i2 is sampled from the remaining i's; and so on.

These are the essential variants of the use of sample().
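
The effect described in case [4] can be sketched in a few lines. The
function name weighted_sample_no_replace() is hypothetical, and this
only mimics the described effect (pick one index, drop it, rescale the
remaining weights); it is not claimed to be how sample() is implemented.

```r
# Hypothetical helper mimicking case [4]: weighted sampling without
# replacement, one index at a time.
weighted_sample_no_replace <- function(x, p, r = length(x)) {
  stopifnot(length(x) == length(p), all(p >= 0), r <= sum(p > 0))
  remaining <- seq_along(x)            # indices still in the pool
  chosen <- integer(0)
  for (k in seq_len(r)) {
    # sample.int() rescales the remaining weights to sum to 1
    pos <- sample.int(length(remaining), 1, prob = p[remaining])
    chosen <- c(chosen, remaining[pos])
    remaining <- remaining[-pos]       # remove the selected index
  }
  x[chosen]
}

set.seed(4)                            # arbitrary seed
out <- weighted_sample_no_replace(letters[1:4], c(4, 3, 2, 1))
print(out)
```

Every element appears exactly once in the result, but elements with
larger weights tend to appear earlier.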

runif() can be used to sample i1 from (1:n) with equal
probabilities by selecting i if n*runif(1) is <= i and > (i-1)
for i = 1:n.

Similarly runif() can be used to sample i1 from (1:n)
with probabilities p1,...,pn by selecting i if

  p[1] + ... + p[i-1] < runif() <= p[1] + ... + p[i]

[LHS=0 if i=1], since the probability of this happening is p[i].
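
Both constructions can be written out directly. The sketch below is
only an illustration of the two rules above; the values of n and p and
the seed are arbitrary, and nothing here is claimed about what sample()
does internally.

```r
set.seed(19)                         # arbitrary seed
u <- runif(10000)                    # uniform draws on (0, 1)

# Equal probabilities: choose i such that (i-1) < n*u <= i.
n <- 5
i_unif <- ceiling(n * u)

# Unequal probabilities: choose i such that
#   p[1] + ... + p[i-1] < u <= p[1] + ... + p[i]
p  <- c(0.2, 0.3, 0.5)               # made-up probabilities summing to 1
cp <- cumsum(p)
# Count how many cumulative sums lie strictly below u, then add 1.
# pmin() guards against rounding error in the last cumulative sum.
i_wtd <- pmin(rowSums(outer(u, cp, ">")) + 1, length(p))

print(table(i_unif) / length(u))     # each close to 1/5
print(table(i_wtd) / length(u))      # close to 0.2, 0.3, 0.5
```
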

> And, when the help file
> says:"sample(x) generates a random permutation of the elements
> of x (or 1:x)",

Since the default value of r (size of sample) is the length
of x, say n, sample(x) (see [1] above) will sample n elements
without replacement from the n elements of x with uniform
probabilities at each stage. In effect, n elements i1,i2,...,in
will be sampled without replacement from (1:n), giving a
random permutation of (1:n), so the result x[i1],...,x[in]
will be a random permutation of x[1],...,x[n] (though
different random permutations may look identical if there
are equal values in x[1],...,x[n]).
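
The remark about equal values can be checked by hand for a tiny vector.
In this sketch the vector x is made up, and the six index permutations
are simply listed explicitly.

```r
# A vector with a repeated value.
x <- c(5, 5, 7)

# All 3! = 6 permutations of the indices 1, 2, 3, written out by hand.
perms <- rbind(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3),
               c(2, 3, 1), c(3, 1, 2), c(3, 2, 1))

# Apply each index permutation to x and label the resulting ordering.
vals <- apply(perms, 1, function(i) paste(x[i], collapse = " "))
print(table(vals))
```

Six distinct index permutations collapse to three distinct orderings of
the values, each arising from two different permutations.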

> would I be correct if I translate the statement
> as follows: it means that the order of sequence, which was
> generated from a uniform distribution, would look like a
> random normal distribution.

No. A normal distribution has nothing to do with it!

*Unless* the values x[1],...,x[n] already looked like values
which had already been sampled from a normal distribution (but
were, say, in increasing order of size). Then sample(x) would
shuffle them into random order so the result could then look
like a real sample according to the order in which the data
came in.
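
A short sketch of this exception, with simulated values (the sample
size and seed are arbitrary): sample() changes only the order of the
values, never the values themselves.

```r
set.seed(7)                    # arbitrary seed
x <- sort(rnorm(200))          # normal values, arranged in increasing order
y <- sample(x)                 # shuffle into random order

# The shuffled vector contains exactly the same values as x.
same_values <- identical(sort(y), x)
print(same_values)
print(head(y))                 # no longer in increasing order
```
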

Hoping this helps!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 19-Oct-06                                       Time: 19:34:13
------------------------------ XFMail ------------------------------

Alberto Monteiro

2006-Oct-19 19:53 UTC

Tom Soyer wrote:
> 
> I looked up the help file on sample(), but didn't find the info I was
> looking for.
> 
> When sample() is used to resample from a distribution, e.g., 
> bootstrap, how does it do it? Does it use an uniform distribution, 
> e.g., runif(), or something else? And, when the help file 
> says:"sample(x) generates a random permutation of the elements of x 
> (or 1:x)", would I be correct if I translate the statement as 
> follows: it means that the order of sequence, which was generated 
> from a uniform distribution, would look like a random normal distribution.
>

I think it's clear that sample (without repetition) simulates
what you would get if you wrote every element on a card, shuffled
the cards, and drew a sample.

In other words, take some number n, another m <= n, 
let x <- 1:n and then simulate y <- sample(x, m). If you
do it many times, y[1] (or y[2], or y[m]) will have the
discrete distribution given by Probability(y[1] = 1) = 1/n,
Prob(y[1] = 2) = 1/n, ..., Prob(y[1] = n) = 1/n. The same,
of course, is valid for y[2], etc.

Ok, too much talking, let's run an example:

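  x <- 1:10
  y3.hist <- numeric(10000)        # preallocate rather than grow from NULL
  for (i in seq_along(y3.hist)) {
    y <- sample(x, 5)              # 5 draws without replacement
    y3.hist[i] <- y[3]             # keep the third draw
  }
  # one bin per integer; the default breaks would merge 1 and 2
  hist(y3.hist, breaks = seq(0.5, 10.5, by = 1))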

Alberto Monteiro
