Viechtbauer Wolfgang (STAT)
2007-May-21 14:41 UTC
[R] Bootstrap p-value in regression [indirectly related to R]
Hello All,

Despite my preference for reporting confidence intervals, I need to obtain a p-value for a hypothesis test in the context of regression using bootstrapping. I have read John Fox's chapter on bootstrapping regression models and have consulted Efron & Tibshirani's "An Introduction to the Bootstrap", but I just wanted to ask the experts here for some feedback to make sure that I am not doing something wrong.

Let's take a simplified example where the model includes one independent variable and the idea is to test H0: beta1 = 0 versus Ha: beta1 != 0.

########################################################

### generate some sample data

n  <- 50
xi <- runif(n, min=1, max=5)
yi <- 0 + 0.2 * xi + rnorm(n, mean=0, sd=1)

### fit simple regression model

mod <- lm(yi ~ xi)
summary(mod)
b1 <- coef(mod)[2]
t1 <- coef(mod)[2] / coef(summary(mod))[2,2]

### 1000 bootstrap replications using (X,Y)-pair resampling

t1.star <- rep(NA, 1000)

for (i in 1:1000) {

   ids   <- sample(1:n, replace=TRUE)
   newyi <- yi[ids]
   newxi <- xi[ids]
   mod   <- lm(newyi ~ newxi)
   t1.star[i] <- (coef(mod)[2] - b1) / coef(summary(mod))[2,2]

}

### get bootstrap p-value

hist(t1.star, nclass=40)
abline(v=t1, lwd=3)
abline(v=-1*t1, lwd=3)
2 * mean( t1.star > abs(t1) )

########################################################

As suggested in the chapter on bootstrapping regression models by John Fox, the bootstrap p-value is 2 times the proportion of bootstrap t-values (with b1 subtracted so that we get the distribution under H0) that are larger than the absolute value of the actual t-value observed in the data.

Doesn't this assume that the bootstrap sampling distribution is symmetric? And if so, would it not be more reasonable to calculate:

mean( abs(t1.star) > abs(t1) )

or, in words: the proportion of bootstrap t-values that are more extreme, on either side of the bootstrap distribution, than the actual t-value observed?

Any suggestions or comments would be appreciated!

--
Wolfgang Viechtbauer
Department of Methodology and Statistics
University of Maastricht, The Netherlands
http://www.wvbauer.com
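For reference, the same (X,Y)-pair resampling test can be written with the boot package (shipped with R). This is only a minimal sketch, not code from the original post: the seed value is arbitrary, and the data-generation and fitting steps simply reproduce the setup above so the sketch is self-contained.

########################################################

library(boot)

set.seed(1234)                     # arbitrary seed, for reproducibility only
n   <- 50
xi  <- runif(n, min=1, max=5)
yi  <- 0 + 0.2 * xi + rnorm(n, mean=0, sd=1)
dat <- data.frame(xi, yi)

fit <- lm(yi ~ xi, data=dat)
b1  <- coef(fit)[2]
t1  <- b1 / coef(summary(fit))[2,2]

### studentized slope, centered at b1 so that the bootstrap
### distribution approximates the distribution of t under H0
tstat <- function(d, idx) {
   m <- lm(yi ~ xi, data=d[idx,])
   s <- coef(summary(m))
   (s[2,1] - b1) / s[2,2]
}

bres    <- boot(dat, tstat, R=1000)
t1.star <- as.vector(bres$t)

2 * mean(t1.star > abs(t1))        # assumes a symmetric null distribution
mean(abs(t1.star) > abs(t1))       # drops the symmetry assumption

########################################################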
John Fox
2007-May-22 13:11 UTC
[R] Bootstrap p-value in regression [indirectly related to R]
Dear Wolfgang,

I agree that it's preferable to compute the two-sided p-value without assuming symmetry. Another, equivalent, way of thinking about this is to use t^2 for the two-sided test in place of t.

BTW, the formula used in my appendix (for the one-sided p-value) is from Davison and Hinkley, I believe, and differs trivially from the one in Efron and Tibshirani.

Regards,
John

--------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140 x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
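To make the two remarks above concrete, here is a minimal sketch, reusing t1 and t1.star from the code in the first message. The (1 + ...)/(B + 1) form in the second part is the estimate usually attributed to Davison & Hinkley; the exact wording of their formula is not quoted in this thread, so treat the attribution as approximate.

########################################################

### (1) squaring t is equivalent to comparing absolute values,
###     since x -> x^2 is monotone for x >= 0
mean(t1.star^2 > t1^2)             # same as mean(abs(t1.star) > abs(t1))

### (2) Davison & Hinkley-style estimate: add 1 to the numerator and
###     denominator, so the estimated p-value is never exactly zero
B <- length(t1.star)
(1 + sum(abs(t1.star) >= abs(t1))) / (B + 1)

########################################################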