Dear All:

I am trying to compute the p-value of the bootstrap test; please see below.

In example 1 the p-value agrees with the confidence interval.
BUT, in example 2 the p-value DOES NOT agree with the confidence interval. In Example 2, the p-value should be zero or close to zero.

I am not sure what went wrong, or not sure if I missed something.

any help would be appreciated.

with many thanks
abou


##### Two - Sample Bootstrap

##### Source: http://www.ievbras.ru/ecostat/Kiril/R/Biblio_N/R_Eng/Chernick2011.pdf

##### Example 1:
##### ----------

set.seed(1)

n1 <- 29
n1
x1 <- rnorm(n1, 1.143, 0.164) #some random normal variates: mean1 = 1.143
x1

n2 <- 33
n2
x2 <- rnorm(n2, 1.175, 0.169) #2nd random sample: mean2 = 1.175
x2

obs.diff.theta <- mean(x1) - mean(x2)
obs.diff.theta

theta <- as.vector(NULL) #### vector to hold difference estimates

iterations <- 1000

for (i in 1:1000) { #bootstrap resamples
  xx1 <- sample(x1, n1, replace = TRUE)
  xx2 <- sample(x2, n2, replace = TRUE)
  theta[i] <- mean(xx1) - mean(xx2)
}

##### Confidence Interval:
##### --------------------

quantile(theta, probs = c(.025,0.975)) #Efron percentile CI on difference in means

#####       2.5%      97.5%
##### - 0.1248539 0.0137601

##### P-Value
##### -------

p.value <- (sum (abs(theta) >= obs.diff.theta) + 1)/ (iterations+1)

##### p.value <- (sum (theta >= obs.diff.theta) + 1)/ (iterations+1)

p.value

#### R OUTPUT

#### > quantile(theta, probs = c(.025,0.975))
####        2.5%       97.5%
#### -0.12647744  0.02099391

#### > p.value <- (sum (abs(theta) >= obs.diff.theta) + 1)/ (iterations+1)
#### > p.value
#### [1] 1

##### Example 2:
##### ----------

set.seed(5)

n1 <- 29
### n1
x1 <- rnorm(n1, 10.5, 0.15) ###### sample 1 with mean1 = 10.5
### x1

n2 <- 33
### n2
x2 <- rnorm(n2, 1.5, 0.155) ##### Sample 2 with mean2 = 1.5
### x2

obs.diff.theta <- mean(x1) - mean(x2)
obs.diff.theta

theta <- as.vector(NULL) #### vector to hold difference estimates

iterations <- 1000

##### bootstrap resamples

for (i in 1:1000) {
  xx1 <- sample(x1, n1, replace = TRUE)
  xx2 <- sample(x2, n2, replace = TRUE)
  theta[i] <- mean(xx1) - mean(xx2)
}

##### Confidence Interval:
##### --------------------

###### CI on difference in means

quantile(theta, probs = c(.025,0.975))

##### P-Value
##### -------

p.value <- (sum (abs(theta) >= obs.diff.theta) + 1)/ (iterations+1)

##### p.value <- (sum (theta >= obs.diff.theta) + 1)/ (iterations+1)

p.value

##### R OUTPUT

#### > ###### CI on difference in means
#### >
#### > quantile(theta, probs = c(.025,0.975))
####     2.5%    97.5%
#### 8.908398 9.060601

#### > ##### P-Value
#### > p.value <- (sum (abs(theta) >= obs.diff.theta) + 1)/ (iterations+1)
#### > p.value
#### [1] 0.4835165

______________________

AbouEl-Makarim Aboueissa, PhD
Professor, Statistics and Data Science
Graduate Coordinator
Department of Mathematics and Statistics
University of Southern Maine
A p-value is for testing a specific null hypothesis, but you do not
state your null hypothesis anywhere.

It is the null value that needs to be subtracted from the bootstrap
differences, not the observed difference. By subtracting the observed
difference you are setting up a situation where the p-value will always
be about 0.5 or about 1 (depending on whether the test is 1-tailed or
2-tailed). If instead you subtract a null value (such as 0), then the
p-values will be closer to what you are expecting.

On Fri, Nov 6, 2020 at 9:44 AM AbouEl-Makarim Aboueissa
<abouelmakarim1962 at gmail.com> wrote:
>
> Dear All:
>
> I am trying to compute the p-value of the bootstrap test; please see
> below.
>
> In example 1 the p-value agrees with the confidence interval.
> BUT, in example 2 the p-value DOES NOT agree with the confidence
> interval. In Example 2, the p-value should be zero or close to zero.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com
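One possible reading of the advice above, as a minimal sketch rather than a definitive method: count how many bootstrap differences fall on each side of the null value instead of comparing them with the observed difference. The function name boot_pvalue_vs_null() and the argument null.value are hypothetical, introduced only for illustration, and doubling the smaller tail is an assumed way of making the test two-sided.

```r
# Sketch only: percentile-style bootstrap p-value measured against a null value.
# boot_pvalue_vs_null() and null.value are hypothetical names, not from the thread.
boot_pvalue_vs_null <- function(x1, x2, null.value = 0, iterations = 1000) {
  theta <- numeric(iterations)  # bootstrap differences in means
  for (i in seq_len(iterations)) {
    xx1 <- sample(x1, length(x1), replace = TRUE)
    xx2 <- sample(x2, length(x2), replace = TRUE)
    theta[i] <- mean(xx1) - mean(xx2)
  }
  # Share of bootstrap differences on each side of the null value
  lower <- (sum(theta <= null.value) + 1) / (iterations + 1)
  upper <- (sum(theta >= null.value) + 1) / (iterations + 1)
  list(ci = quantile(theta, probs = c(0.025, 0.975)),
       # Two-sided p-value: double the smaller tail, capped at 1
       p.value = min(1, 2 * min(lower, upper)))
}

# Example 2 from the original post
set.seed(5)
x1 <- rnorm(29, 10.5, 0.15)
x2 <- rnorm(33, 1.5, 0.155)
res <- boot_pvalue_vs_null(x1, x2)
res$ci
res$p.value
```

In Example 2 every bootstrap difference lies far above 0, so this returns 2/1001, the smallest value it can produce with 1000 resamples. Because the tails are counted from the same bootstrap distribution that gives the percentile interval, the p-value and the interval should broadly agree about whether 0 is a plausible difference.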
Dear Greg:

H0: Mean 1 - Mean 2 = 0
Ha: Mean 1 - Mean 2 != 0

with many thanks
abou

On Fri, Nov 6, 2020 at 12:35 PM Greg Snow <538280 at gmail.com> wrote:

> A p-value is for testing a specific null hypothesis, but you do not
> state your null hypothesis anywhere.
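With H0: Mean 1 - Mean 2 = 0 stated, another common option is to resample from data that have been shifted so that H0 holds, then ask how often a difference at least as extreme as the observed one turns up. The sketch below assumes that approach; it is not taken from the thread, and boot_test_null_shift() is a hypothetical name.

```r
# Sketch only: bootstrap test of H0: mean1 - mean2 = 0 by resampling under the null.
# boot_test_null_shift() is a hypothetical helper, not from the thread.
boot_test_null_shift <- function(x1, x2, iterations = 1000) {
  obs.diff <- mean(x1) - mean(x2)
  # Shift both samples to the pooled mean so the null holds for the resamples
  pooled.mean <- mean(c(x1, x2))
  z1 <- x1 - mean(x1) + pooled.mean
  z2 <- x2 - mean(x2) + pooled.mean
  theta0 <- numeric(iterations)  # differences generated under H0
  for (i in seq_len(iterations)) {
    theta0[i] <- mean(sample(z1, length(z1), replace = TRUE)) -
      mean(sample(z2, length(z2), replace = TRUE))
  }
  # Two-sided: compare absolute values on both sides of the inequality
  (sum(abs(theta0) >= abs(obs.diff)) + 1) / (iterations + 1)
}

# Example 1 from the original post
set.seed(1)
x1 <- rnorm(29, 1.143, 0.164)
x2 <- rnorm(33, 1.175, 0.169)
p1 <- boot_test_null_shift(x1, x2)

# Example 2 from the original post
set.seed(5)
x1 <- rnorm(29, 10.5, 0.15)
x2 <- rnorm(33, 1.5, 0.155)
p2 <- boot_test_null_shift(x1, x2)

p1
p2
```

Note that abs() is applied to both the resampled and the observed differences. In the original code only theta was wrapped in abs(), so a negative observed difference (Example 1) made every comparison TRUE and gave a p-value of 1. In Example 2 no null resample comes near the observed difference of about 9, so p2 is 1/1001.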