thr3ads.net - R help - [R] memory and bootstrapping [May 2011]


E Hofstadler

2011-May-05 07:08 UTC

[R] memory and bootstrapping

Hello,

the following questions will without doubt reveal some fundamental
ignorance, but hopefully you can still help me out.

I'd like to bootstrap a coefficient gained on the basis of the
coefficients in a logistic regression model (the mean differences in
the predicted probabilities between two groups, where each predict()
operation uses as the newdata-argument a dataframe of equal size as
the original dataframe). I've got 130,000 rows and 7 columns in my
dataframe. The glm-model uses all variables (as well as two 2-way
interactions).

System:
- R-version: 2.12.2
- OS: Windows XP Pro, 32-bit
- 3.16 GHz Intel dual-core processor, 2.9 GB RAM

I'm using the boot package to arrive at the standard errors for this
difference, but even with only 10 replications, this takes quite a
long time: 216 seconds (perhaps this is partly also due to my
inefficiently programmed function underlying the boot-call, I'm also
looking into that).
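
For concreteness, the setup described above might look roughly like the sketch below. It runs on simulated data in place of the real data frame. The variable names (`y`, `group`, `x`), the model formula, and the helper `mean_diff` are assumptions for illustration only, not the code actually used, and predicting on the resampled rows rather than on the original rows is likewise an assumption.

```r
library(boot)

set.seed(1)
n <- 2000  # small stand-in for the 130,000-row data frame
dat <- data.frame(
  group = factor(sample(c("a", "b"), n, replace = TRUE)),
  x     = rnorm(n)
)
dat$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * (dat$group == "b") + 0.5 * dat$x))

# Statistic for boot(): refit the model on the resampled rows, predict for
# every row twice (once with everyone assigned to each group), and return
# the mean difference in predicted probabilities.
mean_diff <- function(data, idx) {
  d   <- data[idx, ]
  fit <- glm(y ~ group * x, family = binomial, data = d)
  d_a <- d
  d_a$group <- factor(rep("a", nrow(d)), levels = levels(d$group))
  d_b <- d
  d_b$group <- factor(rep("b", nrow(d)), levels = levels(d$group))
  mean(predict(fit, newdata = d_b, type = "response") -
       predict(fit, newdata = d_a, type = "response"))
}

# Time a small run first; total time should scale roughly linearly in R.
timing <- system.time(b <- boot(dat, mean_diff, R = 10))
timing["elapsed"]
b$t0  # estimate on the original (here: simulated) data
```

Each replicate costs one glm() fit plus two predict() calls, so the time per replicate is dominated by the model fit on the full number of rows.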

I wanted to try out calculating a bca-bootstrapped confidence
interval, which as I understand requires a lot more replications than
normal-theory intervals. Drawing on John Fox's appendix to his "An R
Companion to Applied Regression", I was thinking of trying out 2000
replications -- but this will take several hours to compute on my
system (which isn't in itself a major issue though).

My Questions:
- let's say I try bootstrapping with 2000 replications. Can I be
certain that the memory available to R will be sufficient for this
operation?
- (this relates to statistics more generally): is it a good idea in
your opinion to try bca-bootstrapping, or can it be assumed that a
normal theory confidence interval will be a sufficiently good
approximation (letting me get away with, say, 500 replications)?


Best,
Esther

Prof Brian Ripley

2011-May-05 08:01 UTC

The only reason the boot package will take more memory for 2000 
replications than 10 is that it needs to store the results.  That is 
not to say that on a 32-bit OS the fragmentation will not get worse, 
but that is unlikely to be a significant factor.
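
A rough sketch of the arithmetic behind the memory question, assuming a single scalar statistic per replicate. One caveat, going by the documentation of later versions of the boot package (which may differ from the version in use at the time of this thread): the help page for the `simple` argument states that by default an n by R index array is created, and that `simple = TRUE` avoids this by sampling separately for each replicate, at some cost in speed. The small example at the end uses made-up data.

```r
library(boot)

n <- 130000  # rows, as in the question
R <- 2000    # replicates

# Storage for the results: one double (8 bytes) per replicate per statistic.
result_bytes <- R * 1 * 8

# Storage for a full R-by-n integer index array (4 bytes per integer).
index_bytes <- as.numeric(R) * n * 4

c(results_MB = result_bytes / 2^20, indices_MB = index_bytes / 2^20)

# Assumption: a version of boot that has the 'simple' argument.
# simple = TRUE draws the indices one replicate at a time instead.
set.seed(1)
x <- rnorm(500)
b <- boot(x, function(d, i) mean(d[i]), R = 200, simple = TRUE)
```

On these numbers the stored results are negligible, while a precomputed index array would be of the order of 1 GB, which matters on a 32-bit system.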

As for the methodology: 'boot' is support software for a book, so 
please consult it (and not secondary sources).  From your brief 
description it looks to me as if you should be using studentized CIs.
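
A minimal sketch of a studentized interval with boot.ci(), which needs a variance estimate for every replicate. By default it takes the second element returned by the statistic as that variance. The example uses a deliberately simple statistic (a mean, with var/n as its variance) on made-up data; for the glm-based difference in predicted probabilities a per-replicate variance would have to come from elsewhere (for example a delta-method approximation or a nested bootstrap), which is not shown here.

```r
library(boot)

set.seed(1)
x <- rexp(200)  # made-up, skewed data

# Return the estimate and an estimate of its variance for each resample.
mean_and_var <- function(d, i) {
  xi <- d[i]
  c(mean(xi), var(xi) / length(xi))
}

b  <- boot(x, mean_and_var, R = 999)
ci <- boot.ci(b, type = "stud")
ci$student  # columns: level, two order-statistic positions, lower, upper
```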

130,000 cases is a lot, and running the experiment on a 1% sample 
may well show that asymptotic CIs are good enough.
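
One way such a 1% experiment could be sketched, on simulated data. The variable names and the model are assumptions, and for simplicity the quantity compared is a single logistic regression coefficient rather than the difference in predicted probabilities from the question.

```r
library(boot)

set.seed(1)
N <- 130000
full <- data.frame(g = rbinom(N, 1, 0.5), x = rnorm(N))
full$y <- rbinom(N, 1, plogis(-0.5 + 0.8 * full$g + 0.5 * full$x))

sub <- full[sample(N, N %/% 100), ]  # 1% subsample, 1300 rows

# Asymptotic (Wald) interval from the fitted model.
fit  <- glm(y ~ g + x, family = binomial, data = sub)
wald <- confint.default(fit, "g")

# Bootstrap percentile interval for the same coefficient.
coef_g <- function(d, i) {
  coef(glm(y ~ g + x, family = binomial, data = d[i, ]))["g"]
}
b    <- boot(sub, coef_g, R = 500)
perc <- boot.ci(b, type = "perc")$percent[4:5]

rbind(wald = as.numeric(wald), bootstrap_percentile = perc)
```

If the two intervals agree closely on a sample 100 times smaller than the full data, that suggests the asymptotic interval is adequate for the full data as well.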

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
