Hi,

I need to run a Fisher's exact test on thousands of 2x2 contingency tables, and repeat this process several thousand times (this is part of the permutation test for a genome-wide association study).

How can I run this process most efficiently? Is there any way to optimize R code?

I have my data in a 2x2xN array (N ~ 5 K; eventually N will be ~ 500 K), and use apply inside the loop:

> for (iter in 1:1000) {
    apply(data,3,fisherPval)
  }
  fisherPval <- function(x) {
    fisher.test(x)$p.value
  }

Right now, it takes about 30 sec per iteration on an Intel Xeon 3.06GHz processor.

Thanks in advance.

-- 
Anna Pluzhnikov, PhD
Section of Genetic Medicine
Department of Medicine
The University of Chicago
Prof Brian Ripley
2005-Nov-18 17:20 UTC
[R] Millions of calls to fisher.test (was (no subject))
Setting conf.int=FALSE will help. Looking at the code of fisher.test and extracting just the bit you need will help more.

Do you actually need a two-sided test? Fisher did not, and if not, the computations can be reduced to a call to phyper, which is vectorized.

On Fri, 18 Nov 2005, Anna Pluzhnikov wrote:

> Hi,
> I need to run a Fisher's exact test on thousands of 2x2 contingency tables, and
> repeat this process several thousand times (this is a part of the permutation
> test for a genome-wide association study).
>
> How can I run this process most efficiently? Is there any way to optimize R code?
>
> I have my data in a 2x2xN array (N ~ 5 K; eventually N will be ~ 500 K), and use
> apply inside the loop:
>
>> for (iter in 1:1000) {
> apply(data,3,fisherPval)
> }

Why are you calling the same thing 1000 times?

> fisherPval <- function(x) {
> fisher.test(x)$p.value
> }
> Right now, it takes about 30 sec per iteration on an Intel Xeon 3.06GHz processor.

[Disclaimer etc removed]

> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

PLEASE do, and use a meaningful subject line.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
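One way the "extract just the bit you need" suggestion could look is sketched here. This is only a sketch under stated assumptions: the helper name fisher_two_sided is hypothetical, the null odds ratio is taken to be 1 (the fisher.test default), and the summation rule (add up the probabilities of every table no more likely than the observed one, with a small relative tolerance) is intended to mirror what fisher.test does for a 2x2 table rather than to replace it. The last lines compare it with fisher.test(conf.int = FALSE) on simulated tables.

```r
# Hypothetical helper (the name is illustrative, not part of any package):
# two-sided p-value for a single 2x2 table under a null odds ratio of 1.
fisher_two_sided <- function(tab) {
  x <- tab[1, 1]
  m <- sum(tab[, 1])   # column 1 total
  n <- sum(tab[, 2])   # column 2 total
  k <- sum(tab[1, ])   # row 1 total
  lo <- max(0, k - n)  # smallest possible value of cell [1,1]
  hi <- min(k, m)      # largest possible value of cell [1,1]
  # Hypergeometric probabilities over the whole support, normalised on the
  # log scale for numerical stability.
  logd <- dhyper(lo:hi, m, n, k, log = TRUE)
  d <- exp(logd - max(logd))
  d <- d / sum(d)
  # Sum the probabilities of all tables no more likely than the observed one;
  # the relative tolerance guards against ties broken by rounding.
  sum(d[d <= d[x - lo + 1] * (1 + 1e-7)])
}

# Simulated tables (an assumption for illustration, not the poster's data)
set.seed(42)
tabs <- array(rpois(2 * 2 * 100, lambda = 20), c(2, 2, 100))
p_fast <- apply(tabs, 3, fisher_two_sided)
p_ref  <- apply(tabs, 3, function(tab) fisher.test(tab, conf.int = FALSE)$p.value)
print(all.equal(p_fast, p_ref))
```

This still loops over tables with apply, so it only removes per-call overhead (argument checking, odds-ratio estimation, confidence interval); it does not vectorize across tables the way a one-sided phyper call can.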
Anna Pluzhnikov <apluzhni at bsd.uchicago.edu> writes:

> Hi,
> I need to run a Fisher's exact test on thousands of 2x2 contingency tables, and
> repeat this process several thousand times (this is a part of the permutation
> test for a genome-wide association study).
>
> How can I run this process most efficiently? Is there any way to optimize R code?
>
> I have my data in a 2x2xN array (N ~ 5 K; eventually N will be ~ 500 K), and use
> apply inside the loop:
>
> for (iter in 1:1000) {
> apply(data,3,fisherPval)
> }
> fisherPval <- function(x) {
> fisher.test(x)$p.value
> }
> Right now, it takes about 30 sec per iteration on an Intel Xeon 3.06GHz processor.
>
> Thanks in advance.

The appropriate application of phyper() should save you quite a bit, especially if you're pragmatic and just use the two one-sided tests rather than the two-sided one, which is a bit harder to compute. (Notice that phyper() is vectorized over all its arguments.) As in:

> M <- array(rpois(2*2*5000,lambda=20),c(2,2,500000))
> x <- M[1,1,]
> m <- M[1,1,]+M[2,1,]
> n <- M[1,2,]+M[2,2,]
> k <- M[1,1,]+M[1,2,]
> system.time(pleft<-phyper(x,m,n,k))
[1] 2.16 0.01 2.16 0.00 0.00
> sum(pleft < 0.05)
[1] 16400
> sum(pleft < 0.05)/500000
[1] 0.0328

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
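The phyper() approach above can be wrapped up to give both one-sided p-values at once. What follows is a minimal sketch, assuming the tables sit in a 2 x 2 x N array as in the original question; the helper name fisher_one_sided and the simulated data are illustrative only. The upper tail uses x - 1 because phyper(..., lower.tail = FALSE) returns P(X > q), whereas the test needs P(X >= x).

```r
# Hypothetical helper (illustrative name): one-sided Fisher p-values for
# every table in a 2 x 2 x N array, using vectorized phyper().
fisher_one_sided <- function(tab) {
  x <- tab[1, 1, ]                 # count in cell [1,1]
  m <- tab[1, 1, ] + tab[2, 1, ]   # column 1 total
  n <- tab[1, 2, ] + tab[2, 2, ]   # column 2 total
  k <- tab[1, 1, ] + tab[1, 2, ]   # row 1 total
  list(
    less    = phyper(x, m, n, k),                         # P(X <= x)
    greater = phyper(x - 1, m, n, k, lower.tail = FALSE)  # P(X >= x)
  )
}

# Simulated tables (an assumption for illustration, not the poster's data)
set.seed(1)
tab <- array(rpois(2 * 2 * 200, lambda = 20), c(2, 2, 200))
p <- fisher_one_sided(tab)

# Spot-check against fisher.test() on the same tables
ref_less <- apply(tab, 3, function(t)
  fisher.test(t, alternative = "less", conf.int = FALSE)$p.value)
ref_greater <- apply(tab, 3, function(t)
  fisher.test(t, alternative = "greater", conf.int = FALSE)$p.value)
print(all.equal(p$less, ref_less))
print(all.equal(p$greater, ref_greater))
```

Note that in the transcript above, rpois() generates only 2*2*5000 values, which array() recycles to fill the 500000 tables; the timing still reflects 500000 phyper evaluations, but the tables are not all distinct.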