thr3ads.net - R help - [R] Bootstrapping in R [Oct 2016]


peter dalgaard

2016-Oct-02 10:11 UTC

[R] Bootstrapping in R

> On 01 Oct 2016, at 16:11 , Daniel Nordlund <djnordlund at gmail.com> wrote:
> 
> You haven't told us anything about the structure of your data, or the
> definition of the DataSummary function.

Yes. Just let me add that a common error with boot() is not paying attention to
the required form of the statistic= function argument. It should depend on the
data and a set of indices, and (for the nonparametric bootstrap) it is the
indices that are random.

Typical mistakes are to completely ignore the index argument, or to write clumsy
code that ignores the data specification, as in
coef(lm(df$y~df$x, data=d[f])).
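
A minimal sketch of the first mistake (ignoring the index argument), assuming the boot package is installed. The data and the function names mean_ignoring_index and mean_using_index are made up for illustration and are not from the original post. Counting the distinct replicate values shows whether any resampling actually reached the statistic.

```r
library(boot)

set.seed(1)                       # arbitrary seed, only for reproducibility
d <- data.frame(x = rnorm(50))    # made-up data, not the poster's

# Mistake: the index argument i is accepted but never used,
# so every replicate recomputes the statistic on the full data.
mean_ignoring_index <- function(data, i) mean(data$x)

# Intended form: subset the data with the indices that boot() supplies.
mean_using_index <- function(data, i) mean(data$x[i])

b_bad  <- boot(d, mean_ignoring_index, R = 200)
b_good <- boot(d, mean_using_index, R = 200)

# Number of distinct values among the 200 replicates
n_bad  <- length(unique(b_bad$t[, 1]))   # 1: every replicate equals t0
n_good <- length(unique(b_good$t[, 1]))  # more than 1: replicates vary
```

When every replicate is identical, boot() reports a bias and standard error of zero, which is a quick sign that the indices were never applied.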


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

ruipbarradas at sapo.pt

2016-Oct-02 12:37 UTC


[R] Bootstrapping in R

Right.
To see it in action just compare the results of the two calls to boot.

library(boot)

set.seed(1007)

x <- rnorm(100)
y <- x + rnorm(100)
dat <- data.frame(x, y)

#Wrong
stat1 <- function(DF, f){
	model <- lm(DF$y ~ DF$x, data = DF[f,])  #Doesn't bootstrap DF
	coef(model)
}

#Correct
stat2 <- function(DF, f){
	model <- lm(y ~ x, data = DF[f,])
	coef(model)
}

boot(dat, stat1, R = 100)
boot(dat, stat2, R = 100)
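
One way to check the difference without reading the printed output is sketched below. It repeats the definitions above so that it runs on its own. The names b1, b2, n_distinct_1 and n_distinct_2 are introduced here for illustration. In stat1 the formula terms DF$y and DF$x are looked up in the full data frame DF, so the data = DF[f,] argument has no effect on the fit.

```r
library(boot)

set.seed(1007)
x <- rnorm(100)
y <- x + rnorm(100)
dat <- data.frame(x, y)

# Same two statistics as above, written compactly
stat1 <- function(DF, f) coef(lm(DF$y ~ DF$x, data = DF[f, ]))  # wrong
stat2 <- function(DF, f) coef(lm(y ~ x, data = DF[f, ]))        # correct

b1 <- boot(dat, stat1, R = 100)
b2 <- boot(dat, stat2, R = 100)

# Count the distinct slope estimates (second coefficient) per run
n_distinct_1 <- length(unique(b1$t[, 2]))  # 1: the fit never changes
n_distinct_2 <- length(unique(b2$t[, 2]))  # more than 1: the fit varies
```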


Rui Barradas



Bryan Mac

2016-Oct-03 07:24 UTC


[R] Bootstrapping in R

Hi all,

Here are the first six rows of my data. In total I have 1269 rows.
My goal is to conduct a nonparametric bootstrap with case resampling.
I would like to randomly select 100 of the 1269 rows and then bootstrap
that random sample of 100.

I assume I need to set the seed for this randomization, since bootstrapping
would otherwise give different results each time the code is run.

##   NAR  SQRTNAR NIC  SQRTNIC
## 1 2.6 1.612452 5.6 2.366432
## 2 8.1 2.846050 9.9 3.146427
## 3 5.7 2.387467 7.1 2.664583
## 4 8.3 2.880972 8.1 2.846050
## 5 7.3 2.701851 9.9 3.146427
## 6 4.9 2.213594 8.6 2.932576

Here is my definition of the DataSummary function.

DataSummary <- function(df, indices){
  sample <- df[indices, ]
  
  sumry_for_NAR <- summary(sample$NAR)
  nms <- names(sumry_for_NAR)
  nms <- c(nms, 'std')
  out_for_NAR <- c(sumry_for_NAR, sd(sample$NAR))
  names(out_for_NAR) <- nms
  
  sumry_for_SQRTNAR <- summary(sample$SQRTNAR)
  nms <- names(sumry_for_SQRTNAR)
  nms <- c(nms, 'std')
  out_for_SQRTNAR <- c(sumry_for_SQRTNAR, sd(sample$SQRTNAR))
  names(out_for_SQRTNAR) <- nms
  
  sumry_for_NIC <- summary(sample$NIC)
  nms <- names(sumry_for_NIC)
  nms <- c(nms, 'std')
  out_for_NIC <- c(sumry_for_NIC, sd(sample$NIC))
  names(out_for_NIC) <- nms
  
  sumry_for_SQRTNIC <- summary(sample$SQRTNIC)
  nms <- names(sumry_for_SQRTNIC)
  nms <- c(nms, 'std')
  out_for_SQRTNIC <- c(sumry_for_SQRTNIC, sd(sample$SQRTNIC))
  names(out_for_SQRTNIC) <- nms
  
  OUT <- c(out_for_NAR, out_for_SQRTNAR, out_for_NIC, out_for_SQRTNIC)
  
  return(OUT)
}
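
The function above already has the form boot() expects: it takes the data and the indices and subsets the rows first. As a possible shortening, here is a sketch that applies the same summary-plus-sd step to each column in a loop. The name DataSummaryCompact, the cols argument, and the toy data frame are assumptions made for illustration. The element names also differ from the original, because unlist() prefixes each one with its column name.

```r
# Compact variant: same 7 numbers per column (summary() plus sd),
# computed on the rows selected by the bootstrap indices.
DataSummaryCompact <- function(df, indices,
                               cols = c("NAR", "SQRTNAR", "NIC", "SQRTNIC")) {
  resampled <- df[indices, , drop = FALSE]
  one_col <- function(v) c(summary(v), std = sd(v))
  unlist(lapply(resampled[cols], one_col))
}

# Toy stand-in data with the same column names (values are simulated)
set.seed(42)
toy <- data.frame(NAR = runif(20, 1, 10), NIC = runif(20, 1, 10))
toy$SQRTNAR <- sqrt(toy$NAR)
toy$SQRTNIC <- sqrt(toy$NIC)

# Calling it with all rows in order reproduces the plain summaries
out <- DataSummaryCompact(toy, seq_len(nrow(toy)))
length(out)   # 28: 7 statistics for each of the 4 columns
```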

Again, here is my attempt at bootstrapping.

result <- boot(n_data, statistic = DataSummary, R = 100)
result

Per the suggestions, would I go with the code below to achieve my goal? I take
it the best reference/resource is the boot help page. I found code on various
sites and got really confused because the examples were very different from
each other.
> set.seed(1007)
> 
> x <- rnorm(100)
> y <- x + rnorm(100)
> dat <- data.frame(x, y)
> stat2 <- function(DF, f){
> 	model <- lm(y ~ x, data = DF[f,])
> 	coef(model)
> }
> 
> boot(dat, stat1, R = 100)
> boot(dat, stat2, R = 100)
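
For the two-step goal described above (pick 100 of the 1269 rows, then bootstrap those 100), one possible sketch follows. It assumes the 100 rows should be drawn without replacement. The n_data here is a simulated stand-in for the real data, and col_means is a deliberately simple statistic chosen for illustration. A function of the same (data, indices) form, such as DataSummary, could be passed instead.

```r
library(boot)

set.seed(2016)   # arbitrary; makes both the subsample and the resamples repeatable

# Simulated stand-in for the poster's n_data (1269 rows)
n_data <- data.frame(NAR = runif(1269, 0, 10), NIC = runif(1269, 0, 10))
n_data$SQRTNAR <- sqrt(n_data$NAR)
n_data$SQRTNIC <- sqrt(n_data$NIC)

# Step 1: draw 100 distinct row numbers (sample() defaults to no replacement)
keep <- sample(nrow(n_data), 100)
sub_data <- n_data[keep, ]

# Step 2: case-resampling bootstrap on those 100 rows.
# The statistic subsets rows by the indices that boot() supplies.
col_means <- function(df, indices) colMeans(df[indices, ])

result <- boot(sub_data, statistic = col_means, R = 100)
dim(result$t)   # 100 replicates by 4 statistics
```

Calling set.seed() once before both steps fixes the subsample as well as the bootstrap resamples, so rerunning the whole script gives the same result.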



Bryan Mac
bryanmac.24 at gmail.com

