thr3ads.net - R devel - [Rd] Using sample() to sample one value from a single value? [Nov 2010]

Henrik Bengtsson

2010-Nov-03 17:54 UTC

[Rd] Using sample() to sample one value from a single value?

Hi, consider this one as an FYI, or a seed for further discussion.

I am aware that many traps with sample() have been reported over the
years, and I know that they are documented in help("sample").  Still,
I got bitten by this while writing

sample(units, size=length(units));

where 'units' is an index (positive integer) vector.  It works as I
expect in all cases except for length(units) == 1.  I know, it is well
known.  However, it made me wonder whether it is possible to use
sample() to draw a single value from a set containing only one value.
I don't think so, unless the value you draw from is <= 1.

For instance, you can sample from c(10,10) by doing:

> sample(rep(10, times=2), size=2);
[1] 10 10

but you cannot sample from c(10) by doing:

> sample(rep(10, times=1), size=1);
[1] 9

unless you sample from a value <= 1, e.g.

> sample(rep(0.31, times=1), size=1);
[1] 0.31

> sample(rep(-10, times=1), size=1);
[1] -10

Note also the related issue of sampling from a double vector of length 1, e.g.

> sample(rep(1.2, times=2), size=2);
[1] 1.2 1.2

> sample(rep(1.2, times=1), size=1);
[1] 1

In the latter case, 1.2 is truncated to the integer 1 and the draw is
made from 1:1.

All of the above makes sense when one studies the code of sample(), but
sample() is indeed dangerous; e.g., imagine how many bootstrap
estimates out there are quietly incorrect.
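
The behaviour above follows from the rule documented in help("sample"): a length-one numeric 'x' with x >= 1 is read as "sample from 1:x". A minimal sketch of that rule, assuming nothing beyond the help page; the name sample_sketch is hypothetical and this is a paraphrase, not the actual source of sample():

```r
# Hypothetical paraphrase of the documented dispatch rule; not R's own source.
sample_sketch <- function(x, size, replace = FALSE) {
  if (length(x) == 1L && is.numeric(x) && x >= 1) {
    # A single number >= 1 is interpreted as "sample from 1:x".
    if (missing(size)) size <- x
    sample.int(x, size, replace)
  } else {
    # Otherwise 'x' is the set of values, and positions are sampled.
    if (missing(size)) size <- length(x)
    x[sample.int(length(x), size, replace)]
  }
}

sample_sketch(10, size = 1)    # some value in 1:10, not necessarily 10
sample_sketch(1.2, size = 1)   # 1.2 is truncated, so the draw is from 1:1
sample_sketch(0.31, size = 1)  # below 1, so 0.31 itself is returned
sample_sketch(-10, size = 1)   # below 1, so -10 itself is returned
```

Only the first branch is surprising: it is taken whenever the vector happens to have length one and its value is at least 1.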


In order to cover all cases of length(units), including one, a solution is:

sampleFrom <- function(x, size=length(x), ...) {
  n <- length(x);
  if (n == 1L) {
    # A single value is returned as is, so that it is never
    # interpreted as 1:x.  Note that 'size' and '...' are ignored here.
    res <- x;
  } else {
    # Two or more values (or none): sample() treats 'x' as a set.
    res <- sample(x, size=size, ...);
  }
  res;
} # sampleFrom()

> sampleFrom(rep(10, times=2), size=2);
[1] 10 10

> sampleFrom(rep(10, times=1), size=1);
[1] 10

> sampleFrom(rep(0.31, times=1), size=1);
[1] 0.31

> sampleFrom(rep(-10, times=1), size=1);
[1] -10

> sampleFrom(rep(1.2, times=2), size=2);
[1] 1.2 1.2

> sampleFrom(rep(1.2, times=1), size=1);
[1] 1.2


I want to add sampleFrom() to the wishlist of functions to be
available in default R.  Alternatively, one can add an argument
'sampleFrom=FALSE' to the existing sample() function.  Eventually such
an argument can be made TRUE by default.

/Henrik

Henrique Dallazuanna

2010-Nov-03 18:02 UTC


Doesn't the resample() function in the Examples section of the sample()
help page already do this?
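
For reference, the helper in the Examples section of help("sample") is, as far as I recall, essentially the one-liner below. It samples positions rather than values, so a length-one 'x' is never read as 1:x. The calls after the definition are illustrative additions, not taken from the help page:

```r
# Sample positions 1..length(x) and index into x, instead of passing
# x itself to sample().
resample <- function(x, ...) x[sample.int(length(x), ...)]

resample(10)                # the only element, 10
resample(1.2, size = 1)     # 1.2, with no truncation
resample(c(3, 5, 7))        # a permutation of 3, 5, 7
resample(integer(0))        # length zero in, length zero out
```

Unlike sampleFrom() above, this passes 'size' and 'replace' through for a length-one 'x' as well, so asking for more values than exist without replacement is an error rather than being silently ignored.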

On Wed, Nov 3, 2010 at 3:54 PM, Henrik Bengtsson <hb@biostat.ucsf.edu> wrote:
> Hi, consider this one as an FYI, or a seed for further discussion.
>
> I am aware that many traps on sample() have been reported over the
> years.  I know that these are also documented in help("sample").  Still
> I got bitten by this while writing
>
> sample(units, size=length(units));
>
> where 'units' is an index (positive integer) vector.  It works in all
> cases as expected (=I expect) except for length(units) == 1.  I know,
> it is well known.  However, it got to make me wonder if it is possible
> to use sample() to draw a single value from a set containing only one
> value.  I don't think so, unless you draw from a value that is <= 1.
>
> For instance, you can sample from c(10,10) by doing:
>
> > sample(rep(10, times=2), size=2);
> [1] 10 10
>
> but you cannot sample from c(10) by doing:
>
> > sample(rep(10, times=1), size=1);
> [1] 9
>
> unless you sample from a value <= 1, e.g.
>
> sample(rep(0.31, times=1), size=1);
> [1] 0.31
>
> sample(rep(-10, times=1), size=1);
> [1] -10
>
> Note also the related issue of sampling from a double vector of length 1,
> e.g.
>
> > sample(rep(1.2, times=2), size=2);
> [1] 1.2 1.2
> > sample(rep(1.2, times=1), size=1);
> [1] 1
>
> In the latter case 1.2 is coerced to an integer.
>
> All of the above makes sense when one studies the code of sample(), but
> sample() is indeed dangerous, e.g. imagine how many bootstrap
> estimates out there are quietly incorrect.
>
>
> In order to cover all cases of length(units), including one, a solution is:
>
> sampleFrom <- function(x, size=length(x), ...) {
>  n <- length(x);
>  if (n == 1L) {
>    res <- x;
>  } else {
>    res <- sample(x, size=size, ...);
>  }
>  res;
> } # sampleFrom()
>
> > sampleFrom(rep(10, times=2), size=2);
> [1] 10 10
>
> > sampleFrom(rep(10, times=1), size=1);
> [1] 10
>
> > sampleFrom(rep(0.31, times=1), size=1);
> [1] 0.31
>
> > sampleFrom(rep(-10, times=1), size=1);
> [1] -10
>
> > sampleFrom(rep(1.2, times=2), size=2);
> [1] 1.2 1.2
>
> > sampleFrom(rep(1.2, times=1), size=1);
> [1] 1.2
>
>
> I want to add sampleFrom() to the wishlist of functions to be
> available in default R.  Alternatively, one can add an argument
> 'sampleFrom=FALSE' to the existing sample() function.  Eventually such
> an argument can be made TRUE by default.
>
> /Henrik
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


Tim Hesterberg

2010-Nov-04 14:42 UTC


On Wed, Nov 3, 2010 at 3:54 PM, Henrik Bengtsson <hb at biostat.ucsf.edu> wrote:
> Hi, consider this one as an FYI, or a seed for further discussion.
>
> I am aware that many traps on sample() have been reported over the
> years.  I know that these are also documented in help("sample").  Still
> I got bitten by this while writing
>...
> All of the above makes sense when one studies the code of sample(), but
> sample() is indeed dangerous, e.g. imagine how many bootstrap
> estimates out there are quietly incorrect.

Nonparametric bootstrapping from a sample of size 1 is <always> incorrect.
If you draw a single observation from a sample of size 1, you get that
same observation back.  This implies zero sampling variability, which
is wrong.  If this single sample represents one stratum or sample in
a larger problem, this would contribute zero variability to the overall
result, again wrong.

In general, the ordinary bootstrap underestimates variability in
small samples.  For a sample mean, the ordinary bootstrap corresponds
to using an estimate of variance equal to (1/n) sum((x - mean(x))^2),
i.e. a divisor of n instead of n-1.  In stratified and multi-sample
applications the downward bias is similarly a factor of (n-1)/n.
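
A small numerical check of the (n-1)/n factor, as a sketch only. The data vector 'x' is made up for illustration, and the simulation at the end is an optional sanity check subject to Monte Carlo error:

```r
x <- c(2.1, 3.4, 1.9, 5.6, 4.2)   # made-up data, for illustration only
n <- length(x)

var_n   <- sum((x - mean(x))^2) / n   # divisor n: what the bootstrap uses
var_nm1 <- var(x)                     # divisor n-1: the usual estimate

ratio <- var_n / var_nm1              # equals (n-1)/n, here 4/5

# Optional check by simulation: the variance of the bootstrap means
# should be close to var_n / n rather than var_nm1 / n.
set.seed(1)
boot_means <- replicate(10000, mean(x[sample.int(n, n, replace = TRUE)]))
c(simulated = var(boot_means), divisor_n = var_n / n, divisor_nm1 = var_nm1 / n)
```

For n = 1 the ratio is 0, which is the zero-variability case described above.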

Three remedies are:
* draw bootstrap samples of size n-1
* "bootknife" sampling - omit one observation (a jackknife sample), then
  draw a bootstrap sample of size n from that
* bootstrap from a kernel density estimate, with kernel covariance equal
  to empirical covariance (with divisor n-1) / n.
The latter two are described in 
Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling vs.
Smoothing, Proceedings of the Section on Statistics and the Environment,
American Statistical Association, 2924-2930.
http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf
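
A minimal sketch of one bootknife draw as described in the second bullet above: omit one observation at random, then draw n values with replacement from the remaining n-1. The function name bootknife_sample is hypothetical, and this is an illustration of the description in this message, not code from the cited paper:

```r
# Hypothetical helper: one bootknife resample of a vector x.
bootknife_sample <- function(x) {
  n <- length(x)
  stopifnot(n >= 2L)                    # not defined for a single observation
  omit <- sample.int(n, size = 1L)      # jackknife step: drop one position
  keep <- seq_len(n)[-omit]             # the remaining n-1 positions
  # Bootstrap step: draw n positions with replacement from the n-1 kept.
  idx <- keep[sample.int(n - 1L, size = n, replace = TRUE)]
  x[idx]
}

set.seed(42)
x <- c(2.1, 3.4, 1.9, 5.6, 4.2)         # made-up data, for illustration only
bs <- bootknife_sample(x)
```

Positions are sampled with sample.int() throughout, so the sketch does not run into the length-one trap of sample() even when only one position is kept (n = 2).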

All three are undefined for samples of size 1.  You need to go to some
other bootstrap, e.g. a parametric bootstrap with variability estimated
from other data.

Tim Hesterberg

Henrik Bengtsson

2010-Nov-04 17:59 UTC


Hi.

On Thu, Nov 4, 2010 at 7:42 AM, Tim Hesterberg <timhesterberg at gmail.com> wrote:
> On Wed, Nov 3, 2010 at 3:54 PM, Henrik Bengtsson <hb at biostat.ucsf.edu> wrote:
>
>> Hi, consider this one as an FYI, or a seed for further discussion.
>>
>> I am aware that many traps on sample() have been reported over the
>> years.  I know that these are also documented in help("sample").  Still
>> I got bitten by this while writing
>>...
>> All of the above makes sense when one studies the code of sample(), but
>> sample() is indeed dangerous, e.g. imagine how many bootstrap
>> estimates out there are quietly incorrect.
>
> Nonparametric bootstrapping from a sample of size 1 is <always> incorrect.
> If you draw a single observation from a sample of size 1, you get that
> same observation back.  This implies zero sampling variability, which
> is wrong.  If this single sample represents one stratum or sample in
> a larger problem, this would contribute zero variability to the overall
> result, again wrong.
>
> In general, the ordinary bootstrap underestimates variability in
> small samples.  For a sample mean, the ordinary bootstrap corresponds
> to using an estimate of variance equal to (1/n) sum((x - mean(x))^2),
> instead of a divisor of n-1.  In stratified and multi-sample applications
> the downward bias is similarly (n-1)/n.
>
> Three remedies are:
> * draw bootstrap samples of size n-1
> * "bootknife" sampling - omit one observation (a jackknife sample), then
>   draw a bootstrap sample of size n from that
> * bootstrap from a kernel density estimate, with kernel covariance equal
>   to empirical covariance (with divisor n-1) / n.
> The latter two are described in
> Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling vs.
> Smoothing, Proceedings of the Section on Statistics and the Environment,
> American Statistical Association, 2924-2930.
> http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf
>
> All three are undefined for samples of size 1.  You need to go to some
> other bootstrap, e.g. a parametric bootstrap with variability estimated
> from other data.

I had a feeling that I was going to be bitten by that attention
grabber on bootstrapping.  Worse, it may be misleading to some.  But
honestly, thank you, Tim, for pointing this out and explaining it all
so clearly.

/Henrik
>
> Tim Hesterberg
>
>
