thr3ads.net - R help - [R] boot package question: sampling on factor, not row [Nov 2003]


Scott Norton

2003-Nov-10 23:08 UTC

[R] boot package question: sampling on factor, not row

Hi all:

I've been looking at the boot package to "bootstrap" sample
my data in a particular way.  I haven't figured out how to set this up using
the boot() command and thus have resorted to trying to write my own script
(although I'd prefer if I could get boot() to work for this problem!)

The dataset is set up in the following way:

ix(factor)  value
1		5.73
1		6.99
1		0.32
1		4.64
1		8.39
2		8.47
2		1.04
2		0.73
2		0.29
3		6.82
3		8.81
3		1.33
3		9.17
3		9.84
4		8.57
4		5.04
4		7.18
4		4.54
4		4.37
5		7.36
5		4.97
5		2.66

What I would like to do is repeatedly sample the ix (a factor), not the
individual rows.  For example, say I wanted to repeatedly sample (at a sample
size of 3) the ix value, e.g. 1,3,5, then average the "value"s within each of
those factor levels and then, let's say, take the median across those means.
So for a random sample of (1,3,5) that would be:
median(c(mean(c(5.73,6.99,0.32,4.64,8.39)), mean(c(6.82,8.81,1.33,9.17,9.84)),
mean(c(7.36,4.97,2.66))))
Then repeat this over combinations of 3 ix factors e.g. (1,2,3), (1,1,4), etc...
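
As a check on the arithmetic, the (1,3,5) example can be reproduced in a few lines of base R. This is only a sketch: the data frame name `dat` and the helper `stat.for.levels` are made up for illustration and are not part of the original post.

```r
# The dataset from the post: 5 levels of ix with 5, 4, 5, 5 and 3 rows.
dat <- data.frame(
  ix = factor(rep(1:5, times = c(5, 4, 5, 5, 3))),
  value = c(5.73, 6.99, 0.32, 4.64, 8.39,
            8.47, 1.04, 0.73, 0.29,
            6.82, 8.81, 1.33, 9.17, 9.84,
            8.57, 5.04, 7.18, 4.54, 4.37,
            7.36, 4.97, 2.66)
)

# Hypothetical helper: mean of 'value' within each chosen level,
# then the median of those means. Repeated levels are allowed.
stat.for.levels <- function(dat, chosen) {
  means <- sapply(chosen, function(lv) mean(dat$value[dat$ix == lv]))
  median(means)
}

example.stat <- stat.for.levels(dat, c(1, 3, 5))
print(example.stat)  # 5.214 (the mean of level 1 is the middle of the three)
```

The three level means here are 5.214, 7.194 and about 4.997, so the median is the mean of level 1.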

Is it possible to subsample a factor using boot() and then use that sample of
factors to access rows, rather than directly sample rows?

Thanks!!!
-Scott

Thomas W Blackwell

2003-Nov-11 03:43 UTC


[R] boot package question: sampling on factor, not row

Scott  -

The second argument to  boot(),  called 'statistic', can be
any user-written function you want to cook up, with additional
arguments being passed to it through the '...' mechanism after
all of the named arguments.  (See: "R-intro", "Writing your own
functions", "The ellipsis argument"  for details.)

To carry out your example, I would do something like the following:
(not tested ! use at your own risk.)

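library("boot")
# 'groups' receives the resampled indices from boot(); 'data' is ignored.
my.summary <- function(data, groups, ix, value) {
    group.means <- aggregate(value, list(ix), mean)$x   # one mean per level of ix
    median(group.means[groups[seq_len(3)]])             # first 3 resampled levels
}
result <- boot(seq_along(levels(ix)), my.summary, R = 10000, ix = ix, value = value)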

You will note that what  boot()  thinks is the "data" in the
example here is only a vector of sequential integers the same
length as  levels(ix).  This data is ignored in  my.summary()
and the two columns which you show as "ix" and "value" are
used instead.  Furthermore, unless I misunderstand your example,
the mean within each level of "ix" is invariant to which three
levels have been chosen for this particular bootstrap replicate.
Therefore, you could call  aggregate()  only once rather than
10000 times, if you rewrite the function  my.summary()  to use
the result of  aggregate()  rather than call it afresh on every
iteration.
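
A minimal sketch of that rewrite, assuming the data from the original post: the level means are computed once with aggregate() and then handed to boot() as the "data", so the statistic only has to index into them. The names `group.means` and `median.of.three` are illustrative, and R = 2000 and the seed are arbitrary choices.

```r
library(boot)

ix <- factor(rep(1:5, times = c(5, 4, 5, 5, 3)))
value <- c(5.73, 6.99, 0.32, 4.64, 8.39,
           8.47, 1.04, 0.73, 0.29,
           6.82, 8.81, 1.33, 9.17, 9.84,
           8.57, 5.04, 7.18, 4.54, 4.37,
           7.36, 4.97, 2.66)

# Computed once: one mean per level of ix, in level order.
group.means <- aggregate(value, list(ix), mean)$x

# boot() calls this with the data and a vector of resampled indices
# (drawn with replacement from 1..5); only the first three are used.
median.of.three <- function(means, idx) median(means[idx[1:3]])

set.seed(1)  # arbitrary seed, only for reproducibility
result <- boot(group.means, median.of.three, R = 2000)
```

Because boot() resamples indices with replacement, repeated levels such as (1,1,4) occur naturally. The component result$t holds the 2000 replicate values, and result$t0 is the statistic for the unresampled order (levels 1, 2, 3).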

I've given you the reference for the '...' mechanism, because
that reference is almost impossible to find using  help.search().
For the rest of the functions I've used, you're on your own to
look up their help pages.
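
For what it is worth, here is a small illustration of the '...' mechanism in plain R, independent of boot(). The function name `summarise.with` is invented for this sketch; the point is only that any arguments the wrapper does not name are passed along unchanged to the inner function.

```r
# Hypothetical wrapper: 'stat' is any function, and whatever else the
# caller supplies is forwarded to it through '...'.
summarise.with <- function(x, stat, ...) {
  stat(x, ...)
}

with.na <- c(1, NA, 3)
summarise.with(with.na, mean)                # NA: nothing extra is forwarded
summarise.with(with.na, mean, na.rm = TRUE)  # 2: na.rm reaches mean() via '...'
```

boot() uses the same idea: arguments it does not recognise, such as ix= and value= above, are handed on to the statistic function.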

I *will* comment that I can't see why this particular statistic
is of interest . . . but, I assume you have your own reasons.

HTH  -  tom blackwell  -  u michigan medical school  -  ann arbor  -

Thomas W Blackwell

2003-Nov-11 13:46 UTC


[R] boot package question: sampling on factor, not row

> On Mon, 10 Nov 2003, Thomas W Blackwell wrote:
>
> > The second argument to  boot(),  called 'statistic', can be
> > any user-written function you want to cook up, with additional
> > arguments being passed to it through the '...' mechanism after
> > all of the named arguments.  (See: "R-intro", "Writing your own
> > functions", "The ellipsis argument"  for details.)
>
> > I've given you the reference for the '...' mechanism, because
> > that reference is almost impossible to find using  help.search().
>
> On Tue, 11 Nov 2003, Prof Brian Ripley wrote:
> Right, as help.search "allows for searching the help system". It does not
> search the manuals, nor the FAQs, so it would be impossible to find things
> not in the help system.
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
Precisely my point, Brian.  The usage and meaning of '...' are
almost impossible to find in the help system.  Could there be a
help page for it ?  Questions about '...' are reasonably frequently
asked on this list.

While we're at it, what could be done so that  help.search("logistic")
returns a reference to  glm()  and  help.search("regression") returns
references to both  lm()  and  glm() ?

-  tom blackwell  -  u michigan medical school  -  ann arbor  -
