Arne.Muller@aventis.com
2003-Nov-27 16:04 UTC
[R] significance in difference of proportions
Hello,

I'm looking for some guidance with the following problem:

I have 2 samples, A (111 items) and B (10 items), drawn from the same unknown population. Within A I find 9 "positives" and in B 0 positives. I'd like to know if the 2 samples A and B are different, i.e. is there a way to find out whether the number of "positives" is significantly different in A and B?

I'm currently using prop.test, but unfortunately some of my data contain fewer than 5 items in a group (as in the example above), and the test statistics may not hold:

> prop.test(c(9,0), c(111,10))

        2-sample test for equality of proportions with continuity correction

data:  c(9, 0) out of c(111, 10)
X-squared = 0.0941, df = 1, p-value = 0.759
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.02420252  0.18636468
sample estimates:
    prop 1     prop 2
0.08108108 0.00000000

Warning message:
Chi-squared approximation may be incorrect in: prop.test(c(9, 0), c(111, 10))

Do you have suggestions for an alternative test?

Many thanks for your help,
kind regards,

Arne
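One way to see why prop.test warns here is to look at the expected cell counts under the null hypothesis of equal proportions. The following is a minimal base-R sketch, not part of the original post; the object names `tab` and `expected` are illustrative.

```r
# Observed 2x2 table: rows are the samples, columns are the outcomes.
tab <- matrix(c(9, 102,
                0,  10),
              nrow = 2, byrow = TRUE,
              dimnames = list(sample  = c("A", "B"),
                              outcome = c("positive", "negative")))

# Expected counts under equal proportions:
# (row total * column total) / grand total.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
print(round(expected, 3))

# The common rule of thumb flags any expected count below 5.
any(expected < 5)
```

The expected number of positives in B is 10 * 9 / 121, roughly 0.74, which is far below 5 and is the kind of cell that makes the chi-squared approximation doubtful.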
On 11/27/03 17:04, Arne.Muller at aventis.com wrote:
>Hello,
>
>I'm looking for some guidance with the following problem:
>
>I've 2 samples A (111 items) and B (10 items) drawn from the same unknown
>population. Within A I find 9 "positives" and in B 0 positives. I'd like to
>know if the 2 samples A and B are different, i.e. is there a way to find out
>whether the number of "positives" is significantly different in A and B?
>
>I'm currently using prop.test, but unfortunately some of my data contains
>less than 5 items in a group (like in the example above), and the test
>statistics may not hold:

Try fisher.test in the ctest package, which loads automatically.

--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
> Hello,
>
> I'm looking for some guidance with the following problem:
>
> I've 2 samples A (111 items) and B (10 items) drawn from the same unknown
> population. Within A I find 9 "positives" and in B 0 positives. I'd like to
> know if the 2 samples A and B are different, i.e. is there a way to find out
> whether the number of "positives" is significantly different in A and B?
>
> I'm currently using prop.test, but unfortunately some of my data contains
> less than 5 items in a group (like in the example above), and the test
> statistics may not hold:

The statistic is fine; the approximation to its null distribution may be questionable :-)

> > prop.test(c(9,0), c(111,10))
>
>         2-sample test for equality of proportions with continuity correction
>
> data:  c(9, 0) out of c(111, 10)
> X-squared = 0.0941, df = 1, p-value = 0.759
> alternative hypothesis: two.sided
> 95 percent confidence interval:
>  -0.02420252  0.18636468
> sample estimates:
>     prop 1     prop 2
> 0.08108108 0.00000000
>
> Warning message:
> Chi-squared approximation may be incorrect in: prop.test(c(9, 0), c(111, 10))
>
> Do you have suggestions for an alternative test?

You may consider a permutation test for two independent samples:

R> library(exactRankTests)
R> x = c(rep(1, 9), rep(0, 102))
R> y = rep(0, 10)
R> mean(x)
[1] 0.08108108
R> mean(y)
[1] 0
R> perm.test(y, x, exact = TRUE)

        2-sample Permutation Test

data:  y and x
T = 0, p-value = 0.6092
alternative hypothesis: true mu is not equal to 0

Best,

Torsten
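For readers without exactRankTests installed, the same kind of exact permutation p-value can be sketched in base R. This is an illustration under stated assumptions rather than the method perm.test uses internally: it takes the absolute difference in sample proportions as the test statistic and uses the fact that, under random relabelling, the number of positives falling in B is hypergeometric. All variable names are made up for the example.

```r
# Sample sizes and observed positives, as given in the original post.
n_a <- 111; n_b <- 10
pos_a <- 9;  pos_b <- 0
total_pos <- pos_a + pos_b

# Under the permutation null, the number of positives that land in B is
# hypergeometric: n_b items are drawn from total_pos positives plus the negatives.
pos_in_b <- 0:min(total_pos, n_b)
prob <- dhyper(pos_in_b, m = total_pos, n = n_a + n_b - total_pos, k = n_b)

# Test statistic for every possible split: |proportion in A - proportion in B|.
stat <- abs((total_pos - pos_in_b) / n_a - pos_in_b / n_b)
obs  <- abs(pos_a / n_a - pos_b / n_b)

# Two-sided p-value: probability of a split at least as extreme as the observed one
# (small tolerance guards against floating-point ties).
p_perm <- sum(prob[stat >= obs - 1e-12])
p_perm
```

With these data the result is about 0.609, in line with the perm.test output quoted above. The only split that counts as less extreme than the observed one is exactly one positive in B.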
On 27-Nov-03 Arne.Muller at aventis.com wrote:
> I've 2 samples A (111 items) and B (10 items) drawn from the same
> unknown population. Within A I find 9 "positives" and in B 0
> positives. I'd like to know if the 2 samples A and B are different,
> i.e. is there a way to find out whether the number of "positives" is
> significantly different in A and B?

Pretty obviously not, just from looking at the numbers:

  9 out of 111 -> p = P(positive) approx = 1/10

  P(0 out of 10 when p = 1/10) is not unlikely (in fact = 0.35).

However, a Fisher exact test will give you a respectable P-value:

> library(ctest)
> ?fisher.test
> fisher.test(matrix(c(102,9,10,0),nrow=2))
[...]
  p-value = 1
  alternative hypothesis: true odds ratio is not equal to 1
  95 percent confidence interval:
  0.000000 6.088391

> fisher.test(matrix(c(102,9,9,1),nrow=2))
  p-value = 0.5926

> fisher.test(matrix(c(102,9,8,2),nrow=2))
  p-value = 0.2257

> fisher.test(matrix(c(102,9,7,3),nrow=2))
  p-value = 0.0605

> fisher.test(matrix(c(102,9,6,4),nrow=2))
  p-value = 0.01202

So there's a 95% CI (0,6.1) for the odds ratio which, for identical probabilities of "+", is 1.0, hence well within the CI.

And, keeping the numbers for the larger sample fixed for simplicity, you have to go quite a way with the smaller one to get a result significant at 5%:

  (102,9):(7,3) -> P = 0.06
  (102,9):(6,4) -> P = 0.01

and, to have 80% power (0.8 probability of this event), the probability of "+" in the second sample would have to be as high as 0.41.

Conclusion: your second sample size is quite inadequate except to detect rather large differences between the true proportions in the two cases!

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 27-Nov-03 Time: 17:43:00
------------------------------ XFMail ------------------------------
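The back-of-envelope probability and the sequence of Fisher tests above can be reproduced in one go. This is a small sketch, assuming a current R installation where fisher.test is available without loading ctest; the names `pos_b` and `p_values` are illustrative.

```r
# Chance of seeing 0 positives in 10 draws if the true rate were 1/10.
dbinom(0, size = 10, prob = 0.1)

# Fisher p-values as the number of positives in B goes from 0 to 4,
# holding sample A fixed at 102 negatives and 9 positives.
pos_b <- 0:4
p_values <- sapply(pos_b, function(k) {
  tab <- matrix(c(102, 9, 10 - k, k), nrow = 2)  # columns: sample A, sample B
  fisher.test(tab)$p.value
})

data.frame(pos_b = pos_b, p_value = signif(p_values, 4))
```

Laying the p-values out in a single table makes it easy to see where the 5% threshold is crossed, namely between 3 and 4 positives in the sample of 10.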
Arne.Muller@aventis.com
2003-Dec-01 17:41 UTC
[R] significance in difference of proportions
Hello,

thanks for the replies to this subject. I'm using fisher.test to test whether the proportions of my 2 samples are different (see Ted's example below). The assumption was that the two samples are from the same population and that they may contain a different number of "positives" (due to different treatment).

I may be able to figure out the true probability of getting a "positive", since for some of my experiments I know the entire population. E.g. the samples (111 items and 10 items) come from a population of 10,000 items, and I know that there are 200 positives in the population.

Is it possible to use the Fisher test for testing equality of proportions and to include the known probability of finding a positive? Would that make sense at all? If the two samples come from the same population, the probability of finding a positive shouldn't influence the test for difference of proportions, should it?

At some point I'd like to extend the statistics so that the two samples can come from 2 different populations (with known probability for the positives).

I'm happy to receive suggestions and comments on this.

Thanks a lot again for your help,

Arne

> On 27-Nov-03 Arne.Muller at aventis.com wrote:
> > I've 2 samples A (111 items) and B (10 items) drawn from the same
> > unknown population. Within A I find 9 "positives" and in B 0
> > positives. I'd like to know if the 2 samples A and B are different,
> > i.e. is there a way to find out whether the number of "positives" is
> > significantly different in A and B?
>
> Pretty obviously not, just from looking at the numbers:
>
>   9 out of 111 -> p = P(positive) approx = 1/10
>
>   P(0 out of 10 when p = 1/10) is not unlikely (in fact = 0.35).
>
> However, a Fisher exact test will give you a respectable P-value:
>
> > library(ctest)
> > ?fisher.test
> > fisher.test(matrix(c(102,9,10,0),nrow=2))
> [...]
>   p-value = 1
>   alternative hypothesis: true odds ratio is not equal to 1
>   95 percent confidence interval:
>   0.000000 6.088391
>
> > fisher.test(matrix(c(102,9,9,1),nrow=2))
>   p-value = 0.5926
>
> > fisher.test(matrix(c(102,9,8,2),nrow=2))
>   p-value = 0.2257
>
> > fisher.test(matrix(c(102,9,7,3),nrow=2))
>   p-value = 0.0605
>
> > fisher.test(matrix(c(102,9,6,4),nrow=2))
>   p-value = 0.01202
>
> So there's a 95% CI (0,6.1) for the odds ratio which, for
> identical probabilities of "+", is 1.0 hence well within the CI.
> And, keeping the numbers for the larger sample fixed for
> simplicity, you have to go quite a way with the smaller one to get
> a result significant at 5%:
>
>   (102,9):(7,3) -> P = 0.06
>   (102,9):(6,4) -> P = 0.01
>
> and, to have 80% power (0.8 probability of this event), the
> probability of "+" in the second sample would have to be as
> high as 0.41.
>
> Conclusion: your second sample size is quite inadequate except
> to detect rather large differences between the true proportions
> in the two cases!
>
> Best wishes,
> Ted.
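On the follow-up question about a known population rate: one possible approach, not proposed anywhere in this thread, is to test each sample separately against the known proportion with an exact binomial test. The sketch below uses the figures from the message (200 positives in 10,000 items) and assumes the population is large enough relative to the samples that drawing without replacement is well approximated by a binomial model; `p_known`, `test_a` and `test_b` are illustrative names.

```r
# Known population rate of positives: 200 out of 10,000 (from the message above).
p_known <- 200 / 10000

# Exact binomial test of each sample against the known rate.
# Assumption: the samples are small relative to the population, so the
# binomial model is a reasonable stand-in for sampling without replacement.
test_a <- binom.test(x = 9, n = 111, p = p_known)
test_b <- binom.test(x = 0, n = 10,  p = p_known)

c(A = test_a$p.value, B = test_b$p.value)
```

This answers a different question from fisher.test: it asks whether each sample is compatible with the known rate, whereas the Fisher test compares the two samples with each other and conditions on the observed margins without using the population rate at all.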