JiangMei
2012-Dec-03 21:22 UTC
[R] discrepancy in fisher exact test between R and wiki formula
Hi All. Sorry to bother you. I have a question about Fisher's exact test.

I counted the presence of a gene mutation in two groups of samples. My data are as follows:

          Presence  Absence
GroupA       4         6
GroupB       5        11

When using the formula for Fisher's exact test provided by Wikipedia (http://en.wikipedia.org/wiki/Fisher%27s_exact_test), the p-value is 0.29.

But when calculated by R, the p-value is 0.69. My code is shown below:

    counts<-c(4,5,6,11)
    data<-matrix(counts,nrow=2)
    fisher.test(data)

Why did I get two different numbers? Is there anything wrong with my R code?

I would appreciate your help. Thanks very much!
(Ted Harding)
2012-Dec-03 22:24 UTC
[R] discrepancy in fisher exact test between R and wiki formula
On 03-Dec-2012 21:22:28 JiangMei wrote:
> Hi All. Sorry to bother you. I have a question about fisher exact test.
>
> I counted the presence of gene mutation in two groups of samples.
> My data is as follows
>           Presence  Absence
> GroupA       4         6
> GroupB       5        11
>
> When using the formula of fisher exact test provided by wiki
> (http://en.wikipedia.org/wiki/Fisher%27s_exact_test), the p-value is 0.29.
>
> But when calculated by R, the p-value is 0.69. My code is shown below
> counts<-c(4,5,6,11)
> data<-matrix(counts,nrow=2)
> fisher.test(data)
>
> Why did I get two different numbers? Is there anything wrong with my R codes?
>
> Wish your help! Thanks very much! I really appreciate it.

The reason is that the formula given in Wikipedia is for one particular set of values (a,b,c,d). In your case, a=4, b=6, c=5, d=11, and the Wikipedia formula for p gives the probability of exactly (a,b,c,d) = (4,6,5,11).

However, this is not the P-value for the test. For a two-sided alternative (see ?fisher.test) the P-value is the sum of all such probabilities over the tables (a,b,c,d) that have the same margins as the observed table,

    a+b = 10, c+d = 16, a+c = 9, b+d = 17

AND whose probability p is less than or equal to the probability of (4,6,5,11). So it includes the case that has been observed and (in general) others, and will therefore be greater (0.69) than the value (0.29) given by the formula.

The default alternative for R's fisher.test() is "two.sided". If you look at ?fisher.test you will see:

    Two-sided tests are based on the probabilities of the tables,
    and take as 'more extreme' all tables with probabilities less
    than or equal to that of the observed table, the p-value being
    the sum of such probabilities.

I hope this helps.
Ted.
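
The two numbers in this thread can be reproduced by hand in R. The following is only a sketch of the reasoning above, not the internals of fisher.test(): it assumes the single-table probability can be obtained from the hypergeometric density dhyper(), and the variable names (tab, p_obs, p_all, tol) as well as the small tolerance used for the "less than or equal" comparison are choices made for this illustration.

```r
# Observed 2x2 table from the question
# (rows: GroupA, GroupB; columns: Presence, Absence)
tab <- matrix(c(4, 5, 6, 11), nrow = 2)

a <- tab[1, 1]       # observed count in cell a (GroupA, Presence)
m <- sum(tab[1, ])   # row 1 total:    a + b = 10
n <- sum(tab[2, ])   # row 2 total:    c + d = 16
k <- sum(tab[, 1])   # column 1 total: a + c = 9

# Probability of the observed table alone, i.e. what the
# single-table (Wikipedia) formula gives
p_obs <- dhyper(a, m, n, k)

# Probabilities of every table sharing the same margins;
# with the margins fixed, a alone determines the table
a_all <- max(0, k - n):min(k, m)
p_all <- dhyper(a_all, m, n, k)

# Two-sided P-value: sum over tables no more probable than the
# observed one. tol is an assumed guard against floating-point ties.
tol <- 1 + 1e-7
p_two_sided <- sum(p_all[p_all <= p_obs * tol])

round(p_obs, 2)           # 0.29
round(p_two_sided, 2)     # 0.69
fisher.test(tab)$p.value  # for comparison with p_two_sided
```

Here only the table with a = 3 is more probable than the observed one, so the two-sided P-value is 1 minus that table's probability.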