Viechtbauer Wolfgang (STAT)
2007-May-21 14:41 UTC
[R] Bootstrap p-value in regression [indirectly related to R]
Hello All,
Despite my preference for reporting confidence intervals, I need to
obtain a p-value for a hypothesis test in the context of regression
using bootstrapping. I have read John Fox's chapter on bootstrapping
regression models and consulted Efron & Tibshirani's "An Introduction
to the Bootstrap", but I wanted to ask the experts here for some
feedback to make sure that I am not doing something wrong.
Let's take a simplified example where the model includes one independent
variable and the idea is to test H0: beta1 = 0 versus Ha: beta1 != 0.
########################################################
### generate some sample data
set.seed(1234)   # fix the seed so the example is reproducible
n <- 50
xi <- runif(n, min=1, max=5)
yi <- 0 + 0.2 * xi + rnorm(n, mean=0, sd=1)
### fit simple regression model
mod <- lm(yi ~ xi)
summary(mod)
b1 <- coef(mod)[2]                             # observed slope estimate
t1 <- coef(mod)[2] / coef(summary(mod))[2,2]   # observed t-statistic: slope / SE(slope)
### 1000 bootstrap replications using (X,Y)-pair resampling
t1.star <- rep(NA, 1000)

for (i in 1:1000) {
   ids <- sample(1:n, replace=TRUE)   # resample (x,y) pairs with replacement
   newyi <- yi[ids]
   newxi <- xi[ids]
   mod.star <- lm(newyi ~ newxi)      # refit on the bootstrap sample
   ### subtract b1 so t1.star reflects the distribution under H0
   t1.star[i] <- (coef(mod.star)[2] - b1) / coef(summary(mod.star))[2,2]
}
### get bootstrap p-value
hist(t1.star, nclass=40)
abline(v=t1, lwd=3)      # observed t-value
abline(v=-1*t1, lwd=3)   # and its mirror image
2 * mean( t1.star > abs(t1) )   # two-sided p-value: double the upper-tail proportion beyond |t1|
########################################################
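(For comparison, the same pair resampling could also be written with the
boot package; this is just a sketch to show the idea, not code from the
original post, and the names tstat, dat, and bres are only illustrative:)

library(boot)

dat <- data.frame(xi, yi)

### statistic: studentized slope, centered at the observed b1
tstat <- function(d, ids) {
   fit <- lm(yi ~ xi, data=d[ids,])
   (coef(fit)[2] - b1) / coef(summary(fit))[2,2]
}

bres <- boot(dat, tstat, R=1000)
t1.star <- as.vector(bres$t)   # plays the same role as t1.star in the loop above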
As suggested in John Fox's chapter on bootstrapping regression models,
the bootstrap p-value is 2 times the proportion of bootstrap t-values
(computed with b1 subtracted, so that they reflect the distribution
under H0) that are larger than the absolute value of the t-value
observed in the actual data.
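Spelled out, that is (a restatement of the last code line above, shown
only for clarity; p.upper is just an illustrative name):

p.upper <- mean( t1.star > abs(t1) )   # one-sided: upper-tail proportion beyond |t1|
2 * p.upper                            # doubling the tail -- the step that assumes symmetry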
Doesn't this assume that the bootstrap sampling distribution is
symmetric? And if so, would it not be more reasonable to calculate:

mean( abs(t1.star) > abs(t1) )

or, in words: the proportion of bootstrap t-values that are more
extreme, on either side of the bootstrap distribution, than the t-value
actually observed?
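For concreteness, the two versions can be computed side by side on the
same replicates (a small sketch; p.sym and p.abs are illustrative names):

p.sym <- 2 * mean( t1.star > abs(t1) )    # double the upper tail (assumes symmetry)
p.abs <- mean( abs(t1.star) > abs(t1) )   # count both tails directly
c(p.sym, p.abs)   # these agree when t1.star is roughly symmetric about 0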
Any suggestions or comments would be appreciated!
--
Wolfgang Viechtbauer
Department of Methodology and Statistics
University of Maastricht, The Netherlands
http://www.wvbauer.com
John Fox
2007-May-22 13:11 UTC
[R] Bootstrap p-value in regression [indirectly related to R]
Dear Wolfgang,

I agree that it's preferable to compute the two-sided p-value without
assuming symmetry. Another, equivalent, way of thinking about this is
to use t^2 for the two-sided test in place of t.

BTW, the formula used in my appendix (for the one-sided p-value) is
from Davison and Hinkley, I believe, and differs trivially from the one
in Efron and Tibshirani.

Regards,
John

--------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
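A minimal sketch of the t^2 formulation mentioned above (an
illustration, not code from the thread):

### two-sided p-value via squared statistics
mean( t1.star^2 > t1^2 )
### equivalent to mean( abs(t1.star) > abs(t1) ), since x^2 > y^2 exactly when |x| > |y|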