Andrew Redd
2011-Nov-01 16:33 UTC
[R] Sample size calculations for one sided binomial exact test
I'm trying to compute sample size requirements for a binomial exact test. We want to show that the proportion is at least 90%, assuming that it is 95%, with 80% power, so any asymptotic approximations are out of the question. I was planning on using binom.test to perform the simple test against a prespecified value, but cannot find any functions for computing sample size. Do any exist?

Thanks,
Andrew
Marc Schwartz
2011-Nov-03 16:12 UTC
[R] Sample size calculations for one sided binomial exact test
From: https://stat.ethz.ch/pipermail/r-help/2011-November/294329.html

> I'm trying to compute sample size requirements for a binomial exact test.
> we want to show that the proportion is at least 90% assuming that it is
> 95%, with 80% power so any asymptotic approximations are out of the
> questions. I was planning on using binom.test to perform the simple test
> against a prespecified value, but cannot find any functions for computing
> sample size. do any exist?
>
> Thanks,
> Andrew

Hi,

I don't have the original e-mail, so this reply will be out of the thread in the archive.

I am not aware of anything pre-existing in R for this application, but stand to be corrected on that point.

There are at least two "non-R" related options:

1. The G*Power program, which is available from:

   http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/

2. There is a paper by A'Hern which contains sample size tables here:

   Sample size tables for exact single-stage phase II designs
   R.P. A'Hern
   STATISTICS IN MEDICINE
   Statist. Med. 2001; 20:859-866
   http://stat.ethz.ch/education/semesters/as2011/bio/ahernSampleSize.pdf

The author used the BINOMDIST function in Excel to derive the tables. Notwithstanding criticisms of Excel, these tables, based upon a small test sample, agree with the G*Power program, as well as my own computations using the R code below. He also uses normal approximations for sample sizes >300, given the limitations found in the BINOMDIST function.

Here is my R code for deriving the critical value and sample size for a one sided exact binomial test, given an alpha, a null proportion, an alternate proportion and the desired power:

# The possible sample size vector N needs to be selected in such a fashion
# that it covers the possible range of values that include the true
# minima. My example here does so with a finite range, which also makes
# the plot easier to visualize.
N <- 100:200

Alpha <- 0.05
Pow <- 0.8
p0 <- 0.90
p1 <- 0.95

# Critical value for each candidate sample size in N at the null
# proportion, for the given Alpha: the smallest count x with
# P(X <= x) >= 1 - Alpha. The null is rejected when more than
# CritVal events are observed.
CritVal <- qbinom(p = 1 - Alpha, size = N, prob = p0)

# Get Beta (Type II error) for each N at the alternate hypothesis
# proportion
Beta <- pbinom(CritVal, N, p1)

# Get the Power
Power <- 1 - Beta

# Find the smallest sample size whose power exceeds the required power.
# Note that SampSize is an index into N, not the sample size itself.
SampSize <- min(which(Power > Pow))

# Get and print the required number of events to reject the null
# given the sample size required
(Res <- paste(CritVal[SampSize] + 1, "out of", N[SampSize]))

# Plot it all
plot(N, Power, type = "b", las = 1)
title(paste("One Sided Sample Size and Critical Value for H0 =", p0,
            "versus HA = ", p1, "\n",
            "For Power = ", Pow),
      cex.main = 0.95)
points(N[SampSize], Power[SampSize], col = "red", pch = 19)
text(N[SampSize], Power[SampSize], col = "red",
     label = Res, pos = 3)
abline(h = Pow, lty = "dashed")

One thing to note here (see the plot) is that power is a non-monotonic function of the sample size. This is due to the discrete nature of the binomial distribution. It also generally means that the actual alpha of the test is somewhat lower than the nominal value specified. The G*Power program provides both the actual alpha and power, given the input values.

So there is a need to search the vector of sample sizes for those where the power is greater than that desired, and to take the smallest of them as the required sample size.

The above could of course be encapsulated in a function to make use easier, but the code yields values that agree with both the G*Power application and A'Hern's tables.

Hope that this is helpful.

Regards,

Marc Schwartz
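The encapsulation mentioned above might look roughly like the following. This is only a sketch: the function name exact_binom_n, its arguments and the default search range are invented for illustration and do not come from any package. It also returns the attained alpha and power alongside the sample size, in the spirit of what G*Power reports.

```r
# Hypothetical helper; the name, arguments and defaults are illustrative only.
# One-sided exact binomial test of H0: p = p0 versus HA: p > p0.
exact_binom_n <- function(p0, p1, alpha = 0.05, power = 0.80, N = 2:1000) {
  stopifnot(p1 > p0, all(N >= 1))

  # For each candidate N, the smallest count x with P(X <= x | p0) >= 1 - alpha.
  # The null is rejected when more than 'crit' events are observed.
  crit <- qbinom(1 - alpha, size = N, prob = p0)

  # Attained type I error and power of the rule "reject when X > crit"
  actual_alpha <- pbinom(crit, N, p0, lower.tail = FALSE)
  actual_power <- pbinom(crit, N, p1, lower.tail = FALSE)

  # Power is not monotone in N, so search for the first N that qualifies
  ok <- which(actual_power >= power)
  if (length(ok) == 0) {
    stop("no sample size in the supplied range reaches the requested power")
  }
  i <- ok[1]

  list(n = N[i],
       reject_at = crit[i] + 1,   # minimum number of events needed to reject H0
       alpha = actual_alpha[i],
       power = actual_power[i])
}

# Same inputs as in the code above
res <- exact_binom_n(p0 = 0.90, p1 = 0.95)
str(res)
```

Because power is not monotone in N, the returned n is the smallest qualifying sample size, and a slightly larger n can still fall below the requested power. If nothing in the supplied range qualifies, the function stops with an error, so the range may need widening when p0 and p1 are close together.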