Hi.

I have a problem with the default behavior of sample(), which performs sample(1:x) when x is a single value. This behavior is well explained in ?sample. However, it is annoying when the number of values is not predictable. Would it be possible to add an argument that deactivates this and performs the sampling on a single value? Examples:

> sample(10, size = 1, replace = FALSE)
10

> sample(10, size = 3, replace = TRUE)
10 10 10

> sample(10, size = 3, replace = FALSE)
Error

Many thanks for your help.

Best wishes,

Gael Millot.

Gael Millot
UMR 3244 (IC-CNRS-UPMC) et Universite Pierre et Marie Curie
Equipe Recombinaison et instabilite genetique
Pav Trouillet Rossignol 5eme etage
Institut Curie
26 rue d'Ulm
75248 Paris Cedex 05
FRANCE
tel : 33 1 56 24 66 34
fax : 33 1 56 24 66 44
Email : gael.millot at curie.fr
http://perso.curie.fr/Gael.Millot/index.html
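For reference, a minimal sketch of what sample() currently does with a length-one numeric x, as opposed to the outputs requested above. The variable names and the seed are only illustrative; the behavior itself is the one described in the Details section of ?sample.

```r
set.seed(1)  # arbitrary seed, only so that a rerun gives the same draws

x <- 10      # a "population" that happens to have length one

# sample() treats a single number >= 1 as the upper bound of 1:x,
# so the draws come from 1..10 rather than from the set {10}.
draws <- sample(x, size = 3, replace = FALSE)
length(draws)           # 3: no error, although the population has one element
all(draws %in% 1:10)    # TRUE

# With two or more elements, the vector itself is the population.
two <- sample(c(10, 11), size = 1)
two %in% c(10, 11)      # TRUE
```

The call that was meant to fail returns three distinct values from 1:10 without any warning, which is why the problem tends to go unnoticed.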
You're not the first one, e.g.

https://stat.ethz.ch/pipermail/r-devel/2010-March/057029.html
https://stat.ethz.ch/pipermail/r-devel/2010-November/058981.html

(I was bitten by this in a resampling scheme where the set sampled from was data driven.)

Here's a simple solution, taken from R.utils::resample():

> resample <- function (x, ...) x[sample.int(length(x), ...)]

> resample(10, size = 1, replace = FALSE)
[1] 10

> resample(10, size = 3, replace = TRUE)
[1] 10 10 10

> resample(10, size = 3, replace = FALSE)
Error in sample.int(length(x), ...) :
  cannot take a sample larger than the population when 'replace = FALSE'

/Henrik

On Mon, Jun 15, 2015 at 5:55 AM, Millot Gael <Gael.Millot at curie.fr> wrote:
> Hi.
>
> I have a problem with the default behavior of sample(), which performs sample(1:x) when x is a single value.
> This behavior is well explained in ?sample.
> However, this behavior is annoying when the number of values is not predictable. Would it be possible to add an argument
> that deactivates this and performs the sampling on a single value? Examples:
>> sample(10, size = 1, replace = FALSE)
> 10
>
>> sample(10, size = 3, replace = TRUE)
> 10 10 10
>
>> sample(10, size = 3, replace = FALSE)
> Error
>
> Many thanks for your help.
>
> Best wishes,
>
> Gael Millot.
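As a rough illustration of the data-driven situation mentioned above, the sketch below filters a hypothetical vector until only one candidate is left and draws from it with the resample() one-liner (the same definition also appears in the examples of ?sample). The names scores, cutoff, and candidates and their values are made up for this example.

```r
# Index-based sampling: x is always the population, whatever its length.
resample <- function(x, ...) x[sample.int(length(x), ...)]

scores <- c(3, 7, 12, 15)    # hypothetical data

for (cutoff in c(5, 10, 14)) {
  candidates <- scores[scores > cutoff]    # 3, then 2, then 1 element(s) left
  pick <- resample(candidates, size = 1)
  # Holds on every iteration; sample(candidates, size = 1) could instead
  # return any value in 1:15 once candidates has shrunk to the single value 15.
  stopifnot(pick %in% candidates)
}
```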
On .06.2015, at 14:55, Millot Gael <Gael.Millot at curie.fr> wrote:
> Hi.
>
> I have a problem with the default behavior of sample(), which performs
> sample(1:x) when x is a single value.
> This behavior is well explained in ?sample.
> However, this behavior is annoying when the number of values is not
> predictable. Would it be possible to add an argument
> that deactivates this and performs the sampling on a single value?
> Examples:
>> sample(10, size = 1, replace = FALSE)
> 10
>
>> sample(10, size = 3, replace = TRUE)
> 10 10 10
>
>> sample(10, size = 3, replace = FALSE)
> Error

I think the problem here is that the function actually does what you would expect it to do from a statistical perspective. A sample of size three from a population of one, drawn without replacement, is simply not defined. What should the function give back?

... You can always wrap your code in try() like this to prevent errors from breaking loops or functions:

  try(sample(...))

... or you might check your arguments before execution:

  if ( !replace & length(population) >= size ){
    sample(population, size = size , replace = replace)
  }else{
    ...
  }

Best, Peter
On 6/16/2015 1:32 PM, Peter Meissner wrote:
> On .06.2015, at 14:55, Millot Gael <Gael.Millot at curie.fr> wrote:
>
>> Hi.
>>
>> I have a problem with the default behavior of sample(), which performs
>> sample(1:x) when x is a single value.
>> This behavior is well explained in ?sample.
>> However, this behavior is annoying when the number of values is not
>> predictable. Would it be possible to add an argument
>> that deactivates this and performs the sampling on a single value?
>> Examples:
>>> sample(10, size = 1, replace = FALSE)
>> 10
>>
>>> sample(10, size = 3, replace = TRUE)
>> 10 10 10
>>
>>> sample(10, size = 3, replace = FALSE)
>> Error
>
> I think the problem here is that the function actually does what you
> would expect it to do from a statistical perspective. A sample of size
> three from a population of one, drawn without replacement, is simply
> not defined. What should the function give back?

If I understand correctly, this error is exactly what the poster would like to see, but it is not what you currently get. If length(population) == 1, sample() draws from 1:population, not from the population itself. So:

  > sample(8:10, 3, replace = FALSE)
  [1] 10 8 9
  > sample(9:10, 3, replace = FALSE)
  Error in sample.int(length(x), size, replace, prob) :
    cannot take a sample larger than the population when 'replace = FALSE'
  > sample(10:10, 3, replace = FALSE)
  [1] 8 10 2

I have to admit that I also find this behaviour inconsistent, even though it is described on the very first line of the Details section of the documentation. It is definitely a feature that can cause trouble, and the tests needed to guard against it can end up more complicated than you would first think.

> ... You can always wrap your code in try() like this to prevent errors
> from breaking loops or functions:
>
>   try(sample(...))

No error is given when length(population) == 1, and the result may look perfectly valid if the population varies from run to run.
So this can easily stay in the script as an undetected bug.

> ... or you might check your arguments before execution:
>
>   if ( !replace & length(population) >= size ){
>     sample(population, size = size , replace = replace)
>   }else{
>     ...
>   }

This test is not sufficient if length(population) == size == 1, so you will also need to check for this special case:

  if (length(population) == 1 & size == 1) {
    population
  } else if (!replace & length(population) >= size) {
    sample(population, size = size, replace = replace)
  } else {
    ...
  }

The question is then whether this test could be replaced with a new argument to sample(), e.g. expandSingle, which would default to TRUE for backward compatibility and could be set to FALSE if you don't want population to be expanded to 1:population. It could certainly be useful in some cases, but you still need to know about the expansion to use it. I think most of these bugs occur because users did not think about the expansion in the first place, or did not realize that their population could be of length 1 in some situations. These users would therefore not think about changing the argument either.

Cheers,
Jon
--
Jon Olav Skøien
Joint Research Centre - European Commission
Institute for Environment and Sustainability (IES)
Climate Risk Management Unit
Via Fermi 2749, TP 100-01, I-21027 Ispra (VA), ITALY
jon.skoien at jrc.ec.europa.eu
Tel: +39 0332 789205

Disclaimer: Views expressed in this email are those of the individual and do not necessarily represent official views of the European Commission.
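A rough sketch of how the suggested switch could look as a user-level wrapper, assuming it simply chooses between sample() and index-based sampling. The function name sample_set is hypothetical, and expandSingle is only the argument name floated in this thread; it is not an existing argument of sample().

```r
# Hypothetical wrapper, not part of base R. 'size' is required here to keep
# the sketch short; sample() itself allows it to be missing.
sample_set <- function(x, size, replace = FALSE, prob = NULL,
                       expandSingle = TRUE) {
  if (expandSingle) {
    # Current behaviour, including the expansion of a single number to 1:x.
    return(sample(x, size = size, replace = replace, prob = prob))
  }
  # Treat x as the population itself, whatever its length.
  x[sample.int(length(x), size = size, replace = replace, prob = prob)]
}

sample_set(10, size = 1, expandSingle = FALSE)                   # 10
sample_set(10, size = 3, replace = TRUE, expandSingle = FALSE)   # 10 10 10

# Without replacement, a sample of three from one element is an error,
# which is the outcome requested in the original post.
res <- try(sample_set(10, size = 3, expandSingle = FALSE), silent = TRUE)
inherits(res, "try-error")                                       # TRUE
```

Keeping TRUE as the default leaves existing scripts unchanged, but, as noted above, it only helps users who already know about the expansion.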