chajewski at fordham.edu
2009-May-28 07:30 UTC
[Rd] Bug in base function sample ( ) (PR#13727)
Full_Name: Michael Chajewski
Version: 2.9.0
OS: Windows XP
Submission from: (NULL) (150.108.71.185)

I was programming a routine which kept reducing the array from which a random sample was taken, until only a single number was left. I discovered that when R attempts to sample from an object with only one number, it does not reproduce/report the number but instead chooses a random number between 1 and that number.

Example 1:

# I am assigning a single number
gg <- 7
# Creating an array to store sampled values
ggtrack <- 0

# I am sampling 10,000 observations from my single-value
# object and storing them
for (i in 1:10000) {
  g0 <- sample(gg, (i/i))
  ggtrack <- c(ggtrack, g0)
}

# Deleting the initial value in the array
ggtrack <- ggtrack[-1]

# The array ought to be 10,000 samples long (and it is)
length(ggtrack)

# The array should contain 10,000 "7"s, but it does not
# See the histogram of sampled values
hist(ggtrack)

Example 2:

# Here is the same example, but now with
# two numbers. Note that now the function performs
# as expected and only samples between the two.
gg <- c(7, 2)
ggtrack <- 0
for (i in 1:10000) {
  g0 <- sample(gg, (i/i))
  ggtrack <- c(ggtrack, g0)
}

ggtrack <- ggtrack[-1]
length(ggtrack)
hist(ggtrack)

Highest Regards,
Michael Chajewski
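As an illustrative sketch (not part of the original report), the same behaviour shows up without the growing loop. Drawing all 10,000 values in one call with replace = TRUE is equivalent to 10,000 independent single draws. The names `one` and `two` are hypothetical.

```r
# Illustrative, vectorised version of the report (not the reporter's code):
# one call with replace = TRUE stands in for 10,000 single draws.
set.seed(1)  # any seed works; only the set of distinct values matters

one <- sample(7, 10000, replace = TRUE)        # numeric x of length 1, x >= 1: drawn from 1:7
two <- sample(c(7, 2), 10000, replace = TRUE)  # length-2 x: drawn from the values 7 and 2

table(one)  # counts spread over 1 to 7, not 10,000 sevens
table(two)  # counts for 2 and 7 only
```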
On Thu, 2009-05-28 at 09:30 +0200, chajewski at fordham.edu wrote:
> I was programming a routine which kept reducing the array from which a random
> sample was taken, resulting in a single number. I discovered that when R
> attempts to sample from an object with only one number it does not
> reproduce/report the number but instead chooses a random number between 1 and
> that number.

This is working as documented/intended in ?sample. Your 'x' is numeric and of length 1, so it is interpreted as 1:x (when x >= 1), which produces the behaviour you have encountered. That help page even goes so far as to warn you that this "convenience feature may lead to undesired behaviour..." and gives an example function (in Examples) that handles the sort of use case you have. See the Examples section and the resample() function defined there.

HTH

G

> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Dr. Gavin Simpson              [t] +44 (0)20 7679 0522
ECRC, UCL Geography,           [f] +44 (0)20 7679 0565
Pearson Building,              [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London           [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                  [w] http://www.freshwaters.org.uk
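A sketch of the index-based approach the reply points to: sample positions with sample.int() and then subset x, so the candidates are determined by the length of x rather than its value. This one-line definition follows the resample() shown in the Examples of ?sample in recent R versions, as far as I know; the version shipped with R 2.9.0 may have been written differently.

```r
# Index-based sampling: draw positions with sample.int() and subset x,
# so a length-1 x is never reinterpreted as 1:x.
resample <- function(x, ...) x[sample.int(length(x), ...)]

resample(7, 5, replace = TRUE)        # 7 7 7 7 7
resample(c(7, 2), 5, replace = TRUE)  # only 7s and 2s
```

Substituting resample(gg, 1) for sample(gg, (i/i)) in Example 1 should then give 10,000 sevens, while Example 2 behaves as before.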
> ...I discovered that when R attempts to sample from an object with only one
> number it does not reproduce/report the number but instead chooses a random
> number between 1 and that number.

This is the documented behavior. In my opinion, it is a design error, but changing it would no doubt break lots of code. As a general rule, the designers of R seem to have preferred convenience to consistency, which often makes things easier or more concise, but sometimes causes unfortunate surprises like this.

-s