thr3ads.net - R help - [R] group definition for a bootstrap [Jul 2004]

If this information is useful, please help other people find it:
Share via:

Louize Hill

2004-Jul-26 14:06 UTC

[R] group definition for a bootstrap

Hi,
This is probably really simple, but I am clearly not R-minded, I have read
the help files, and reread them, and I still can't work out what to do...
I have a data frame (d) with 3 columns (age (0-5), quarter (1-4) and x).
I want to estimate the precision of my mean x by age and quarter, so I want
to carry out a bootstrap for each group.
I am trying to do this within a loop, so I don't have to retype the whole
thing out 20 times ...
This is what I have done:

n = length (d$x)
N = 1000
stat = numeric(N)
for (i in 1:N) {
d$x2 = sample (d$x, n, replace=T)
stat[i] = mean(d$x2)
}

I believe I should define the age and quarter groups in the line straight
after "for" - using the split function, or should I use some variant
of
[d$age == "1" & d$quarter =="1"] in the sample
definition?
Please don't think I am looking for an easy answer - I have been puzzling
for this for over a week already :(
(I am using R1.9 under MS W2000)
Thank you in advance
Louize

Tore Wentzel-Larsen

2004-Jul-28 12:25 UTC

head link

[R] Re: group definition for a bootstrap

Hi,
There dose not seem to be any other replies so far, may be because the question
is not quite clear. Do you want a separate estimation of the 'precision of
the mean'
for each age by quarter subsample? This reply is based on that interpretation.
If not, you may need some sort of stratified bootstrap (see the article by Canty
mentioned in the code below).
Also, since you want to use the bootstrap for the mean, are the (age by quarter)
subsample sizes small, or the subsamples heavily skewed?
In any case it is better to use the package boot by Canty and Ripley than
programming
from scratch. A good introduction to this package is written by Angelo J. Canty
in R
Newsletter 2/3, December 2002. 
The following code first generates an aretificial (large) data frame dd with the
same structure
as your fram d. Then it performs bootstrap calculations in each age by quarter
subsample.
The output includes for each subsample bootstrap based bias and standard error
for the mean,
plots for checking discreteness problems of the bootstrap and normal and BCa
confidence intervals.

To use this code for your sample it should suffice to set 

dd <- d

and start at the line
# prepares bootstrap procedure for the mean

in the code. The code has been checked for the large artificial sample
generated.
It will probably break down if some subsamples are empty or nearly so.
And there may be discreteness problems if some subsamples are small, although
the
bootstrap is fairly stable for means.

Best,
Tore Wentzel-Larsen


# 'Group definition for a bootstrap'

# generates artificial data frame dd to use (age and quarter as factors in case
they are in your data):
nn <- floor(runif(1,1000,2000)) # random sample size
dd <- data.frame(factor(floor(runif(nn,0,6))), factor(floor(runif(nn,1,5))),
rnorm(nn,15,3))
names(dd) <- c('age','quarter','x')

# prepares bootstrap procedure for the mean
#	(cf the article by Angelo J. Canty in R Newsletter 2/3, December 2002):

library(boot)
mean.w <- function(xx,ww) sum(xx*ww)

# means, with bootstrap estimates of bias and standard error,
#	for each age by quarter subsample:

nboot <- 1000 # number of bootstrap replications

par(ask = TRUE) # prompts before next plot
for (age in 0:5)
{
for (quarter in 1:4)
{
dd.sub <- dd[dd$age==age&dd$quarter==quarter,] # the subsample
boot.subsample <- boot(data=dd.sub$x, statistic=mean.w, R=nboot,
stype='w')
print(boot.subsample)
print(c('age:', age, 'quarter:', quarter)) # for keeping track
of the subsamples
plot(boot.subsample,) # checks for discreteness problems
# bootstrap confidence intervals (normal based and BCa):
ci.subsample <- boot.ci(boot.subsample, type=c('norm','bca'),
conf=c(.95))
print(ci.subsample)
} 
}

par(ask = FALSE) # back to default setting for ask




> Hi,
> This is probably really simple, but I am clearly not R-minded, I have read
> the help files, and reread them, and I still can't work out what to
do...
> I have a data frame (d) with 3 columns (age (0-5), quarter (1-4) and x).
> I want to estimate the precision of my mean x by age and quarter, so I want
> to carry out a bootstrap for each group.
> I am trying to do this within a loop, so I don't have to retype the
whole
> thing out 20 times ...
> This is what I have done:
> 
> n = length (d$x)
> N = 1000
> stat = numeric(N)
> for (i in 1:N) {
> d$x2 = sample (d$x, n, replace=T)
> stat[i] = mean(d$x2)
> }
> 
> I believe I should define the age and quarter groups in the line straight
> after "for" - using the split function, or should I use some
variant of
> [d$age == "1" & d$quarter =="1"] in the sample
definition?
> Please don't think I am looking for an easy answer - I have been
puzzling
> for this for over a week already :(
> (I am using R1.9 under MS W2000)
> Thank you in advance
> Louize

Maybe Matching Threads

Search for more apparently analagous threads

R help - Jul 2004 - group definition for a bootstrap

[R] group definition for a bootstrap

[R] Re: group definition for a bootstrap

Maybe Matching Threads