*Dear All:*
*I am trying to compute the p-value of the bootstrap test; please see
below.*
*In example 1 the p-value agrees with the confidence interval.*
*BUT, in example 2 the p-value DOES NOT agree with the confidence
interval. In Example 2, the p-value should be zero or close to zero.*
*I am not sure what went wrong, or whether I missed something.*
*Any help would be appreciated.*
*With many thanks*
*abou*
##### Two-Sample Bootstrap
##### Source:
http://www.ievbras.ru/ecostat/Kiril/R/Biblio_N/R_Eng/Chernick2011.pdf
##### Example 1:
##### ----------
set.seed(1)
n1 <- 29
n1
x1 <- rnorm(n1, 1.143, 0.164) #some random normal variates: mean1 = 1.143
x1
n2 <- 33
n2
x2 <- rnorm(n2, 1.175, 0.169) #2nd random sample: mean2 = 1.175
x2
obs.diff.theta <- mean(x1) - mean(x2)
obs.diff.theta
theta <- as.vector(NULL) #### vector to hold difference estimates
iterations <- 1000
for (i in 1:1000) { #bootstrap resamples
xx1 <- sample(x1, n1, replace = TRUE)
xx2 <- sample(x2, n2, replace = TRUE)
theta[i] <- mean(xx1) - mean(xx2)
}
##### Confidence Interval:
##### --------------------
quantile(theta, probs = c(.025,0.975)) #Efron percentile CI on difference in means
##### 2.5% 97.5%
##### -0.1248539 0.0137601
##### P-Value
##### -------
p.value <- (sum (abs(theta) >= obs.diff.theta) + 1)/ (iterations+1)
##### p.value <- (sum (theta >= obs.diff.theta) + 1)/ (iterations+1)
p.value
#### R OUTPUT
#### > quantile(theta, probs = c(.025,0.975))
#### 2.5% 97.5%
#### -0.12647744 0.02099391
#### > p.value <- (sum (abs(theta) >= obs.diff.theta) + 1)/ (iterations+1)
#### > p.value
#### [1] 1
##### Example 2:
##### ----------
set.seed(5)
n1 <- 29
### n1
x1 <- rnorm(n1, 10.5, 0.15) ###### sample 1 with mean1 = 10.5
### x1
n2 <- 33
### n2
x2 <- rnorm(n2, 1.5, 0.155) ##### Sample 2 with mean2 = 1.5
### x2
obs.diff.theta <- mean(x1) - mean(x2)
obs.diff.theta
theta <- as.vector(NULL) #### vector to hold difference estimates
iterations <- 1000
##### bootstrap resamples
for (i in 1:1000) {
xx1 <- sample(x1, n1, replace = TRUE)
xx2 <- sample(x2, n2, replace = TRUE)
theta[i] <- mean(xx1) - mean(xx2)
}
##### Confidence Interval:
##### --------------------
###### CI on difference in means
quantile(theta, probs = c(.025,0.975))
##### P-Value
##### -------
p.value <- (sum (abs(theta) >= obs.diff.theta) + 1)/ (iterations+1)
##### p.value <- (sum (theta >= obs.diff.theta) + 1)/ (iterations+1)
p.value
##### R OUTPUT
#### > ###### CI on difference in means
#### >
#### > quantile(theta, probs = c(.025,0.975))
#### 2.5% 97.5%
#### 8.908398 9.060601
#### > ##### P-Value
#### > p.value <- (sum (abs(theta) >= obs.diff.theta) + 1)/ (iterations+1)
#### > p.value
#### [1] 0.4835165
______________________
*AbouEl-Makarim Aboueissa, PhD*
*Professor, Statistics and Data Science*
*Graduate Coordinator*
*Department of Mathematics and Statistics*
*University of Southern Maine*
A p-value is for testing a specific null hypothesis, but you do not state your null hypothesis anywhere.

It is the null value that needs to be subtracted from the bootstrap differences, not the observed difference. By subtracting the observed difference you are setting up a situation where the p-value will always be about 0.5 or about 1 (depending on whether the test is one-tailed or two-tailed). If instead you subtract a null value (such as 0), then the p-values will be closer to what you are expecting.

On Fri, Nov 6, 2020 at 9:44 AM AbouEl-Makarim Aboueissa <abouelmakarim1962 at gmail.com> wrote:
> *I am trying to compute the p-value of the bootstrap test; please see
> below.*
>
> *In example 1 the p-value agrees with the confidence interval.*
> *BUT, in example 2 the p-value DOES NOT agree with the confidence
> interval. In Example 2, the p-value should be zero or close to zero.*
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com
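
To make the point above concrete, here is a minimal sketch using the Example 2 data. The bootstrap differences are centered near the observed difference, so about half of them exceed it, which is where a value near 0.5 comes from. In Example 1 the observed difference appears to be negative (the interval is centered below zero), so `abs(theta) >= obs.diff.theta` holds for every resample and the formula returns exactly 1. The names `null.value`, `frac.above`, `p.lower`, `p.upper` and `p.percentile` are illustrative, and the two-sided percentile rule is one assumed reading of "compare against the null value", not necessarily the exact rule the reply has in mind.

```r
# Sketch only: data regenerated as in Example 2. All names introduced here
# (null.value, frac.above, p.lower, p.upper, p.percentile) are illustrative.
set.seed(5)
x1 <- rnorm(29, 10.5, 0.15)
x2 <- rnorm(33, 1.5, 0.155)
obs.diff <- mean(x1) - mean(x2)

iterations <- 1000
theta <- numeric(iterations)
for (i in seq_len(iterations)) {
  theta[i] <- mean(sample(x1, replace = TRUE)) - mean(sample(x2, replace = TRUE))
}

# The bootstrap differences are centered near obs.diff, so roughly half of
# them exceed it no matter how far obs.diff is from zero.
frac.above <- mean(theta >= obs.diff)

# Compare against the null value instead: how often do the bootstrap
# differences fall on either side of it? (Assumed two-sided percentile rule.)
null.value <- 0
p.lower <- (sum(theta <= null.value) + 1) / (iterations + 1)
p.upper <- (sum(theta >= null.value) + 1) / (iterations + 1)
p.percentile <- min(1, 2 * min(p.lower, p.upper))

frac.above
p.percentile
```

With this rule the p-value is small exactly when the null value lies far in one tail of the bootstrap distribution, which matches what the percentile interval shows.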
Dear Greg:

H0: Mean 1 - Mean 2 = 0
Ha: Mean 1 - Mean 2 != 0

with many thanks
abou
______________________
*AbouEl-Makarim Aboueissa, PhD*
*Professor, Statistics and Data Science*
*Graduate Coordinator*
*Department of Mathematics and Statistics*
*University of Southern Maine*

On Fri, Nov 6, 2020 at 12:35 PM Greg Snow <538280 at gmail.com> wrote:
> A p-value is for testing a specific null hypothesis, but you do not
> state your null hypothesis anywhere.
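
For the hypotheses stated above (H0: Mean 1 - Mean 2 = 0 against a two-sided alternative), one common way to get a bootstrap p-value is to shift the bootstrap differences so they are centered at the null, then count how often they are at least as extreme as the observed difference. The sketch below does that for both examples. The helper name `boot.diff.p` and its arguments are hypothetical, and the shift-to-the-null approach is an assumption about what is wanted, not something prescribed in the thread.

```r
# Hypothetical helper: two-sided bootstrap test of
# H0: mean(x1) - mean(x2) = null.value
boot.diff.p <- function(x1, x2, null.value = 0, iterations = 1000) {
  obs.diff <- mean(x1) - mean(x2)
  theta <- numeric(iterations)
  for (i in seq_len(iterations)) {
    theta[i] <- mean(sample(x1, replace = TRUE)) - mean(sample(x2, replace = TRUE))
  }
  # Subtracting obs.diff centers the bootstrap differences at zero, which
  # approximates the sampling variation of the estimate under H0.
  null.dev <- theta - obs.diff
  # Count resamples at least as far from the center as the observed
  # difference is from the null value.
  (sum(abs(null.dev) >= abs(obs.diff - null.value)) + 1) / (iterations + 1)
}

set.seed(1)  # Example 1 data
p1 <- boot.diff.p(rnorm(29, 1.143, 0.164), rnorm(33, 1.175, 0.169))

set.seed(5)  # Example 2 data
p2 <- boot.diff.p(rnorm(29, 10.5, 0.15), rnorm(33, 1.5, 0.155))

p1
p2
```

Under this approach a percentile interval that excludes 0 by a wide margin, as in Example 2, should go with a p-value near the minimum possible value of 1/(iterations + 1), while an interval that covers 0, as in Example 1, should go with a larger one.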