Essentially, what the sample function does (though I expect it does so
much more efficiently) is equivalent to this code:
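i <- 1:10
myProbs <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9)
myProbs <- myProbs/sum(myProbs)   # rescale so the weights sum to 1
cp <- c(0, cumsum(myProbs))       # interval boundaries on [0, 1]
# map each uniform draw to the interval it falls in
# (note: as written, this draws with replacement)
i[findInterval(runif(5), cp)]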
Internally the prob vector is scaled to sum to 1 (so there is no
difference between your last 2 examples), then a cumulative sum is
computed, then random uniforms are generated and compared against the
cumulative sum of the probabilities. This gives the desired probability
for each value.
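Since your examples use replace=FALSE, here is a rough sketch of how the same lookup could be repeated without replacement. The help page for sample says that when replace is false the probabilities are applied sequentially, i.e. the chance of picking the next item is proportional to the weights among the remaining items (your "new biased die with 9 sides"). The function sample_seq below is a hypothetical helper of mine, not part of R, and I would not expect it to reproduce sample()'s output for a given seed, since the internal C code is organised differently.

```r
# Hypothetical helper (not part of R): weighted sampling without replacement,
# done by repeating the cumulative-sum lookup on the items still available.
sample_seq <- function(x, size, prob) {
  stopifnot(size <= length(x), length(prob) == length(x))
  out <- x[integer(0)]
  for (k in seq_len(size)) {
    cp <- c(0, cumsum(prob / sum(prob)))   # rescale the remaining weights
    # interval the uniform falls in; min() guards against rounding in cumsum
    j <- min(findInterval(runif(1), cp), length(x))
    out <- c(out, x[j])
    x <- x[-j]                             # drop the chosen item ...
    prob <- prob[-j]                       # ... and its weight
  }
  out
}

i <- 1:10
w <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9)

set.seed(1)
a <- sample_seq(i, 5, w)
set.seed(1)
b <- sample_seq(i, 5, w / 5)   # same weights, pre-scaled to sum to 1
```

Because the weights are rescaled at every step, a and b should come out the same, which is the sense in which your last 2 examples do not differ.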
On Fri, Mar 7, 2014 at 3:24 AM, Thomas <thomas.chesney at nottingham.ac.uk> wrote:
> I'm trying to figure out exactly what the prob parameter in the sample
> function does.
>
> With the following code, does sample look randomly for the first possible
> sample--let's say it chooses the second element--and then assess whether it
> can be chosen according to its probability, which is 0.8? It seems unlikely
> it would work like this.
>
> Or does it create a `biased die' which in this case would have ten sides
> that each come up according to the probabilities in myProbs, and roll it to
> see which is the first element chosen, then remove that element, create a
> new biased die with 9 sides and roll it again?
>
> i <- c(1:10)
> myProbs <- c(0.2, 0.8, 0.3, 0.2, 0.1, 0.1, 0.1, 0.2, 0.3, 0.4)
> f <- sample(i,5, replace=FALSE, prob=myProbs)
>
> Then what's the difference in terms of sampling between the following two
> examples, the second of which has been created so that the probabilities add
> to 1?
>
> i <- c(1:10)
> myProbs <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9)
> f <- sample(i,5, replace=FALSE, prob=myProbs)
>
> i <- c(1:10)
> myProbs <- c(0.1/5, 0.1/5, 0.1/5, 0.1/5, 0.1/5, 0.9/5, 0.9/5, 0.9/5, 0.9/5,
>              0.9/5)
> f <- sample(i,5, replace=FALSE, prob=myProbs)
>
> Thank you,
>
> Thomas Chesney
--
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com