AbouEl-Makarim Aboueissa
2021-Aug-10 10:34 UTC
[R] Sample size Determination to Compare Three Independent Proportions
Hi Marc:

First, thank you very much for your help in this matter.

We will perform an initial omnibus test of all three groups (e.g. 3 x 2 chi-square), possibly followed by all possible 2 x 2 pairwise comparisons (e.g. 1 versus 2, 1 versus 3, 2 versus 3).

We can assume *either* that the desired sample size in each group is the same *or* that it is proportional to the population size.

Under H0, we can set p1 = p2 = p3 = p with p = 0.25; that is, the expected proportion of "Yes" values in each group is 0.25.

For the alternative hypothesis, for example, we can set p1 = 0.25, p2 = 0.25, p3 = 0.35.

Again, thank you very much in advance.

abou
______________________

*AbouEl-Makarim Aboueissa, PhD*
*Professor, Statistics and Data Science*
*Graduate Coordinator*
*Department of Mathematics and Statistics*
*University of Southern Maine*

On Mon, Aug 9, 2021 at 10:53 AM Marc Schwartz <marc_schwartz at me.com> wrote:

> Hi,
>
> You are going to need to provide more information than what you have
> below and I may be mis-interpreting what you have provided.
>
> Presuming you are designing a prospective, three-group, randomized
> allocation study, there is typically an a priori specification of the
> ratios of the sample sizes for each group such as 1:1:1, indicating that
> the desired sample size in each group is the same.
>
> You would also need to specify the expected proportions of "Yes" values
> in each group.
>
> Further, you need to specify how you are going to compare the
> proportions in each group. Are you going to perform an initial omnibus
> test of all three groups (e.g. 3 x 2 chi-square), possibly followed by
> all possible 2 x 2 pairwise comparisons (e.g. 1 versus 2, 1 versus 3, 2
> versus 3), or are you just going to compare 2 versus 1, and 3 versus 1,
> where 1 is a control group?
>
> Depending upon your testing plan, you may also need to account for p
> value adjustments for multiple comparisons, in which case, you also need
> to specify what adjustment method you plan to use, to know what the
> target alpha level will be.
>
> On the other hand, if you already have the data collected, thus have
> fixed sample sizes available per your wording below, simply go ahead and
> perform your planned analyses, as the notion of "power" is largely an a
> priori consideration, which reflects the probability of finding a
> "statistically significant" result at a given alpha level, given that
> your a priori assumptions are valid.
>
> Regards,
>
> Marc Schwartz
>
>
> AbouEl-Makarim Aboueissa wrote on 8/9/21 9:41 AM:
> > Dear All: good morning
> >
> > *Re:* Sample Size Determination to Compare Three Independent Proportions
> >
> > *Situation:*
> >
> > Three Binary variables (Yes, No)
> >
> > Three independent populations with fixed sizes (*say:* N1 = 1500,
> > N2 = 900, N3 = 1350).
> >
> > Power = 0.80
> >
> > How to choose the sample sizes to compare the three proportions of "Yes"
> > among the three variables.
> >
> > If you know a reference to this topic, it will be very helpful too.
> >
> > with many thanks in advance
> >
> > abou
> > ______________________
> >
> > *AbouEl-Makarim Aboueissa, PhD*
> > *Professor, Statistics and Data Science*
> > *Graduate Coordinator*
> > *Department of Mathematics and Statistics*
> > *University of Southern Maine*
Marc Schwartz
2021-Aug-10 13:28 UTC
[R] Sample size Determination to Compare Three Independent Proportions
Hi,

A search suggests that there may not be an R function/package that provides power/sample size calculations for the specific scenarios that you are describing. There may be something that I am missing, and there is also other dedicated software such as PASS (https://www.ncss.com/software/pass/), which is not free, but provides a large library of possibly relevant functions and support.

That being said, you can run Monte Carlo simulations in R to achieve the results you want, while providing yourself with options relative to study design, intended tests, and adjustments for multiple comparisons as apropos. Many prefer this approach, since it gives you specific control over this process.

Taking the simple case, where you are going to run a 3 x 2 chi-square as your primary endpoint, and want to power for that, here is a possible function, with the same sample size in each group:

ThreeGroups <- function(n, p1, p2, p3, R = 10000, power = 0.8) {

  MCSim <- function(n, p1, p2, p3) {
    ## Create a binary distribution for each group
    G1 <- rbinom(n, 1, p1)
    G2 <- rbinom(n, 1, p2)
    G3 <- rbinom(n, 1, p3)

    ## Create the table of counts for the 3 groups. The levels are
    ## fixed at 0 and 1 so that each group always contributes both
    ## counts, even if one outcome never occurs in a sample.
    MAT <- cbind(table(factor(G1, levels = 0:1)),
                 table(factor(G2, levels = 0:1)),
                 table(factor(G3, levels = 0:1)))

    ## Perform a chi-square and just return the p value
    chisq.test(MAT)$p.value
  }

  ## Replicate the above R times, and get
  ## a distribution of p values
  MC <- replicate(R, MCSim(n, p1, p2, p3))

  ## Get the p value at the desired "power" quantile
  quantile(MC, power)
}

Essentially, the above internal MCSim() function generates 3 random samples of size 'n' from the binomial distribution, at the 3 proportions desired. For each run, it will perform a chi-square test of the 3 x 2 matrix of counts, returning the p value for each run. The main function will then return the p value at the quantile (power) within the generated distribution of p values.

You can look at the help pages for the various functions that I use above, to get a sense for how they work.
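An equivalent way to read the same simulation, offered here as a minimal sketch and not as part of the original function: estimate power directly as the proportion of simulated p values at or below alpha. The names PowerThreeGroups and alpha, and the vector argument 'n' (one sample size per group, which also permits unequal allocation), are assumptions introduced for this illustration.

```r
## Hypothetical helper (not from the original post): estimate power
## directly instead of returning a quantile of the p values.
PowerThreeGroups <- function(n, p, R = 2000, alpha = 0.05) {
  ## n: per-group sample sizes (recycled if a single value is given)
  ## p: per-group proportions of "Yes"
  n <- rep_len(n, length(p))

  pvals <- replicate(R, {
    ## Simulated number of "Yes" responses in each group
    yes <- rbinom(length(p), size = n, prob = p)

    ## Rows are groups, columns are the Yes / No counts
    MAT <- cbind(Yes = yes, No = n - yes)

    ## Warnings (e.g. small expected counts) are silenced here
    suppressWarnings(chisq.test(MAT)$p.value)
  })

  ## Power = proportion of simulated tests that reject at alpha;
  ## an undefined p value (NaN) is counted as a non-rejection
  mean(!is.na(pvals) & pvals <= alpha)
}

set.seed(1)
PowerThreeGroups(300, c(0.25, 0.25, 0.35))
```

For allocation proportional to the population sizes mentioned earlier in the thread (N1 = 1500, N2 = 900, N3 = 1350), one could pass something like n = round(900 * c(1500, 900, 1350) / 3750), that is 360, 216 and 324, and compare the estimated power with the equal-allocation result.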
You increase the sample size ('n') until you get a p value returned <= 0.05, if that is your desired alpha level.

You also want 'R', the number of replications within each run, to be large enough so that the returned p value quantile is relatively stable. Values for 'R', once you get "close to" the desired p value, should be on the order of 1,000,000 or higher. Stay with lower values for 'R' until you get in the ballpark of your target, since larger values take much longer to run.

Thus, using your example proportions of 0.25, 0.25, and 0.35:

## 250 per group, 750 total - Not enough
> ThreeGroups(250, 0.25, 0.25, 0.35, R = 10000)
       80%
0.08884723

## 350 per group, 1050 total - Too high
> ThreeGroups(350, 0.25, 0.25, 0.35, R = 10000)
      80%
0.0270829

## 300 per group, 900 total - Close!
> ThreeGroups(300, 0.25, 0.25, 0.35, R = 10000)
       80%
0.04818842

So, keep tweaking the sample size until you get a returned p value at your target alpha level, with a large enough 'R', so that you get consistent sample sizes for multiple runs.

If I run 300 per group again, with 10,000 replicates:

> ThreeGroups(300, 0.25, 0.25, 0.35, R = 10000)
       80%
0.05033933

the returned p value is slightly higher. So, again, increase R to improve the stability of the returned p value and run it multiple times to be comfortable that the p value change is less than an acceptable threshold.

Now, the tricky part is to decide if the 3 x 2 is your primary endpoint, and you want to power only for that, or if you also want to power for the other two-group comparisons, possibly having to account for p value adjustments for the multiple comparisons, resulting in the need to power for a lower alpha level for those tests.

In that scenario, you would end up taking the largest sample size that you identify across the various hypotheses, recognizing that while you are powering for one hypothesis, you may be overpowering for others.
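For the pairwise follow-up tests described above, one possible sketch is a two-group analogue of ThreeGroups() whose returned p value is compared against an adjusted alpha. This is not from the original post: the function name TwoGroups, the variable alpha.adj, and the choice of a Bonferroni adjustment over three comparisons are all assumptions made for illustration, and the thread does not say which adjustment method would be used. Only pairs with a true difference under the alternative (here 1 versus 3 and 2 versus 3) can be powered this way.

```r
## Hypothetical two-group analogue of ThreeGroups() (not from the
## original post), with the same sample size 'n' in each group.
TwoGroups <- function(n, p1, p2, R = 10000, power = 0.8) {

  MCSim <- function(n, p1, p2) {
    ## Simulated number of "Yes" responses in each of the 2 groups
    yes <- rbinom(2, size = n, prob = c(p1, p2))

    ## 2 x 2 table of Yes / No counts. Note that chisq.test() applies
    ## the Yates continuity correction by default for 2 x 2 tables.
    suppressWarnings(chisq.test(cbind(yes, n - yes))$p.value)
  }

  ## Distribution of p values over R simulated studies
  MC <- replicate(R, MCSim(n, p1, p2))

  ## p value at the desired "power" quantile
  quantile(MC, power, na.rm = TRUE)
}

## Assumed adjustment: Bonferroni over three pairwise tests,
## for an overall alpha of 0.05
alpha.adj <- 0.05 / 3

## Increase n until the returned quantile is <= alpha.adj
set.seed(1)
TwoGroups(300, 0.25, 0.35, R = 2000) <= alpha.adj
```

The sample size found this way for the adjusted alpha would then be compared with the one found for the omnibus test, taking the larger of the two.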
That is something that you need to decide, and perhaps consider consulting with other local statistical expertise, as may be apropos, in the prospective study design, possibly influenced by other relevant/similar research in your domain.

You can easily modify the above function for the two-group scenario as well, and I will leave that to you.

Regards,

Marc