Small data sets (6-12 values, or a similarly small number of groups) that don't look nice and symmetric are quite common in my field (analytical chemistry and biological variants thereof), and they often contain outliers, or at least stragglers, that I cannot simply discard. When I want to see what different assumptions do to my confidence intervals, I occasionally run a quick nonparametric bootstrap, just to get a feel for how asymmetric the distribution of an estimate might be. At the moment I'm also interested in doing this on some historical data to evaluate some proposed estimators for interlaboratory studies.

boot() is pretty good, but with such small sets there clearly aren't many distinct resampled combinations (e.g. 92,378 for 10 data points). So I'm really resampling from quite a small population of possible bootstrap samples. It's surely more efficient to generate all the distinct resampled combinations of the data set and use those, with their frequencies, to get quantities such as the bootstrap variance exactly. At worst, that will stop us fooling ourselves into thinking that more replicates will give better information.

A lengthy dig around R-help and CRAN turned up nothing on generating distinct combinations with resampling, so I've written a couple of routines to generate the distinct combinations and their frequencies. (They work, though I wouldn't guarantee great efficiency.) But if a chemist (me) can think of it, it's pretty certain that a statistician already has. Before I spend hours polishing code, is there already something out there I've missed?

Steve Ellison
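The count quoted in the message follows from the fact that a bootstrap resample of n values is fully determined by how many times each original value is drawn, so the number of distinct resamples is choose(2n - 1, n); for n = 10 that is choose(19, 10) = 92378. The following is a minimal base-R sketch of the exhaustive ("exact") bootstrap described above, assuming each distinct resample is weighted by its multinomial probability n! / prod(k_i!) / n^n. The function names distinct_resamples() and exact_boot() are hypothetical: they are not part of the boot package and are not Steve's routines, and the data are made up.

```r
# Hypothetical helpers (not from 'boot', not the original poster's code).

# Each distinct resample of n values is a vector of counts (k_1, ..., k_n)
# with k_i >= 0 and sum(k) == n. "Stars and bars": choosing n - 1 bar
# positions among 2n - 1 slots gives exactly one such count vector.
distinct_resamples <- function(n) {
  stopifnot(n >= 2)
  bars <- combn(2 * n - 1, n - 1)   # (n - 1) x choose(2n - 1, n - 1) matrix
  # Add virtual bars at positions 0 and 2n; the number of stars between
  # consecutive bars (gap minus 1) is the count for each original value.
  diff(rbind(0, bars, 2 * n)) - 1   # n x M matrix, one resample per column
}

# Exact bootstrap distribution of statistic(x): every distinct resample is
# evaluated once and weighted by its multinomial probability.
exact_boot <- function(x, statistic) {
  n <- length(x)
  K <- distinct_resamples(n)
  # Log scale avoids overflow in n! and n^n.
  logw <- lgamma(n + 1) - colSums(lgamma(K + 1)) - n * log(n)
  w <- exp(logw)
  # rep(x, times = k) rebuilds the resample from its count vector.
  tstar <- apply(K, 2, function(k) statistic(rep(x, times = k)))
  m <- sum(w * tstar)
  list(tstar = tstar, w = w, mean = m, var = sum(w * (tstar - m)^2))
}

# Usage with made-up data (six values, one straggler):
x <- c(2.1, 2.4, 2.2, 3.9, 2.3, 2.0)
ncol(distinct_resamples(length(x)))   # choose(11, 6) = 462 distinct resamples
res <- exact_boot(x, mean)
res$var                               # exact bootstrap variance of the mean
```

For the mean, the exact result should match the closed form mean((x - mean(x))^2) / n, which makes a handy check. The pairs (tstar, w) are the complete bootstrap distribution, so percentile limits can be read from the weighted cumulative distribution rather than from B random replicates. The number of columns grows quickly: at n = 12 it is choose(23, 12) = 1,352,078, at which point the apply() loop is likely to be the slow step.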
Marc Schwartz
2007-Mar-06 16:59 UTC
[R] Distinct combinations for bootstrapping small sets
On Tue, 2007-03-06 at 15:54 +0000, S Ellison wrote:
> Before I spend hours polishing code, is there already
> something out there I've missed?

Steve,

The phrase that you seem to be looking for is "permutation test". If you use the following in R:

  RSiteSearch("{permutation test}", restrict = "functions")

that will lead you to some of the functions available.
One CRAN package specifically, 'coin', has a permutation framework for a variety of such tests.

HTH,

Marc Schwartz
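A permutation test enumerates something different from the bootstrap above: relabelings of the pooled data between groups (drawing without replacement), rather than resamples with replacement from a single sample. For comparison, here is a small base-R sketch of an exact two-sample permutation test on a difference in means, visiting all choose(n1 + n2, n1) group assignments with combn(). The function name exact_perm_test() is hypothetical, and the sketch does not use 'coin'.

```r
# Hypothetical helper: exact two-sample permutation test for a difference
# in means, enumerating every assignment of length(x) pooled values to group 1.
exact_perm_test <- function(x, y) {
  pooled <- c(x, y)
  n1 <- length(x)
  observed <- mean(x) - mean(y)
  idx <- combn(length(pooled), n1)   # each column: indices assigned to group 1
  diffs <- apply(idx, 2, function(i) mean(pooled[i]) - mean(pooled[-i]))
  # Two-sided p-value: share of relabelings at least as extreme as observed,
  # with a small tolerance so floating-point ties are counted.
  p <- mean(abs(diffs) >= abs(observed) - 1e-12)
  list(observed = observed, p.value = p, n.perm = length(diffs))
}

# Toy example: the observed split is one of the two most extreme of 20.
res <- exact_perm_test(c(1, 2, 3), c(4, 5, 6))
res$p.value   # 0.1
```

Because every relabeling is visited, the p-value is exact rather than a Monte Carlo estimate, but like the exhaustive bootstrap it only stays cheap for small groups.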