Hi all,

sample() has some well-documented undesirable behaviour.

sample(1:6, 1)
sample(2:6, 1)
...
sample(5:6, 1)

do what you expect, but

sample(6:6, 1)
sample(1:6, 1)

do the same thing.

This behaviour is documented:

     If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
     'x >= 1', sampling _via_ 'sample' takes place from '1:x'.  _Note_
     that this convenience feature may lead to undesired behaviour when
     'x' is of varying length 'sample(x)'.  See the 'resample()'
     example below.

My proposal is to add an extra parameter is.set to sample() to control
this behaviour.  If the parameter is unspecified, then we keep the old
behaviour for compatibility.  If it is TRUE, then we treat the first
parameter x as a set.  If it is FALSE, then we treat it as a set size.
This means that

sample(6:6, 1, is.set=TRUE)

would return 6 with probability 1.

I have attached a patch to implement this new option.

Cheers,
Andrew

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sample.diff
Type: text/x-patch
Size: 3636 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20100322/cc18f9ee/attachment.bin>
Hi all,

I forgot to test my patch!  I fixed a few bugs.

Cheers,
Andrew

On 22 March 2010 22:53, Andrew Clausen <clausen at econ.upenn.edu> wrote:
> Hi all,
>
> sample() has some well-documented undesirable behaviour.
>
> sample(1:6, 1)
> sample(2:6, 1)
> ...
> sample(5:6, 1)
>
> do what you expect, but
>
> sample(6:6, 1)
> sample(1:6, 1)
>
> do the same thing.
>
> This behaviour is documented:
>
>      If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
>      'x >= 1', sampling _via_ 'sample' takes place from '1:x'.  _Note_
>      that this convenience feature may lead to undesired behaviour when
>      'x' is of varying length 'sample(x)'.  See the 'resample()'
>      example below.
>
> My proposal is to add an extra parameter is.set to sample() to control
> this behaviour.  If the parameter is unspecified, then we keep the old
> behaviour for compatibility.  If it is TRUE, then we treat the first
> parameter x as a set.  If it is FALSE, then we treat it as a set size.
> This means that
>
> sample(6:6, 1, is.set=TRUE)
>
> would return 6 with probability 1.
>
> I have attached a patch to implement this new option.
>
> Cheers,
> Andrew
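For anyone who wants to try the proposed semantics without applying a patch, here is a minimal sketch in plain R. It is not the contents of sample.diff: the wrapper name sample_set is hypothetical, size is made a required argument to keep the sketch short, and only the is.set argument and its three cases (unspecified, TRUE, FALSE) are taken from the proposal above. The TRUE branch samples positions with sample.int() and then indexes into x, which is the same idea as the resample() example mentioned in the quoted documentation.

```r
# Hypothetical wrapper illustrating the proposed is.set semantics.
# This is a sketch, NOT the attached patch; 'sample_set' is an invented name.
sample_set <- function(x, size, replace = FALSE, prob = NULL, is.set = NULL) {
  if (is.null(is.set)) {
    # Unspecified: keep the existing behaviour of sample() for compatibility.
    return(sample(x, size, replace = replace, prob = prob))
  }
  if (is.set) {
    # Treat x as the set itself: draw positions, then index into x.
    idx <- sample.int(length(x), size, replace = replace, prob = prob)
    return(x[idx])
  }
  # is.set = FALSE: treat x as a set size and draw from 1:x.
  stopifnot(is.numeric(x), length(x) == 1, x >= 1)
  sample.int(x, size, replace = replace, prob = prob)
}

# The surprising case: a length-1 numeric x is read as a set size,
# so these draws come from 1:6 rather than from the set {6}.
old_draws <- replicate(200, sample(6:6, 1))

# With is.set = TRUE the single element is the only possible result.
set_draws <- replicate(200, sample_set(6:6, 1, is.set = TRUE))
stopifnot(all(set_draws == 6))

# With is.set = FALSE the argument is always a set size.
size_draws <- replicate(200, sample_set(6, 1, is.set = FALSE))
stopifnot(all(size_draws %in% 1:6))
```

Leaving is.set unspecified falls through to sample() unchanged, so existing callers would see no difference under this reading of the proposal.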