In an effort to select the most appropriate number of clusters in a mixture analysis, I am comparing the expected and actual membership of individuals in various clusters using Fisher's exact test. I aim for the model with the lowest possible p-value, but I frequently get p-values below 2.2e-16 and therefore do not get exact p-values from the standard Fisher's exact test in R.

Does anybody know if there is a version of Fisher's exact test in any package which can handle lower probabilities, or have other suggestions as to how I can compare the probabilities?

I am for instance comparing the following two:

  dat2 <- matrix(c(29,0,29,0,12,0,18,0,0,29,0,16,0,19), nrow=2)
  fisher.test(dat2, workspace=30000000)

  dat3 <- matrix(c(29,0,0,29,0,0,12,0,0,17,0,1,0,29,0,0,15,1,0,0,19), nrow=3)
  fisher.test(dat3, workspace=30000000)

Both result in p-value < 2.2e-16.

Kind regards,
Søren

Søren Faurby <soren.faurby <at> biology.au.dk> writes:

> In an effort to select the most appropriate number of clusters in a
> mixture analysis I am comparing the expected and actual membership of
> individuals in various clusters using Fisher's exact test. I aim
> for the model with the lowest possible p-value, but I frequently get
> p-values below 2.2e-16 and therefore do not get exact p-values with
> the standard Fisher's exact test in R.

The p < 2.2e-16 is a printing issue, not a precision issue.

  > ff = fisher.test(dat3, workspace=30000000)
  > ff

          Fisher's Exact Test for Count Data

  data:  dat3
  p-value < 2.2e-16
  alternative hypothesis: two.sided

  > str(ff)
  List of 4
   $ p.value    : num 5.88e-58
   $ alternative: chr "two.sided"
   $ method     : chr "Fisher's Exact Test for Count Data"
   $ data.name  : chr "dat3"
   - attr(*, "class")= chr "htest"

So just use ff$p.value
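To make the "printing issue" concrete, here is a minimal sketch of where the 2.2e-16 comes from. It assumes the printed threshold is the default eps argument of format.pval(), which is .Machine$double.eps; the object name ff follows the session above, and passing eps = 0 is only one possible way to see the untruncated formatting.

```r
# Data from the original post
dat3 <- matrix(c(29,0,0,29,0,0,12,0,0,17,0,1,0,29,0,0,15,1,0,0,19), nrow = 3)
ff <- fisher.test(dat3, workspace = 30000000)

# Machine epsilon: the cutoff used when formatting p-values for display,
# not a limit on the values that can be stored
.Machine$double.eps

# Default formatting reports anything below eps as "< eps"
format.pval(ff$p.value)

# With eps = 0 the truncation is switched off
format.pval(ff$p.value, eps = 0)

# The stored value is available directly
ff$p.value
```

The stored p.value is an ordinary double, so it can be compared, logged, or collected across models without going through the print method.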

The aylmer package has some functionality in this regard which you may find useful. In particular, you can use good() to get a feel for the number of tableaux that are consistent with the specified marginal totals:

  > good(dat2)
  [1] 42285210
  > good(dat3)
  [1] 2.756286e+12

HTH

rksh

Søren Faurby wrote:
> In an effort to select the most appropriate number of clusters in a
> mixture analysis I am comparing the expected and actual membership of
> individuals in various clusters using Fisher's exact test. I aim
> for the model with the lowest possible p-value, but I frequently get
> p-values below 2.2e-16 and therefore do not get exact p-values with
> the standard Fisher's exact test in R.
>
> Does anybody know if there is a version of Fisher's exact test in
> any package which can handle lower probabilities, or have other
> suggestions as to how I can compare the probabilities?
>
> I am for instance comparing the following two:
>
> dat2<-matrix(c(29,0,29,0,12,0,18,0,0,29,0,16,0,19), nrow=2)
> fisher.test(dat2, workspace=30000000)
>
> dat3<-matrix(c(29,0,0,29,0,0,12,0,0,17,0,1,0,29,0,0,15,1,0,0,19),
> nrow=3)
> fisher.test(dat3, workspace=30000000)
>
> Which both result in p-value < 2.2e-16
>
> Kind regards, Søren
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Robin K. S. Hankin
Uncertainty Analyst
University of Cambridge
19 Silver Street
Cambridge CB3 9EP
01223-764877

Søren Faurby wrote:
> In an effort to select the most appropriate number of clusters in a
> mixture analysis I am comparing the expected and actual membership of
> individuals in various clusters using Fisher's exact test. I aim
> for the model with the lowest possible p-value, but I frequently get
> p-values below 2.2e-16 and therefore do not get exact p-values with
> the standard Fisher's exact test in R.
>
> Does anybody know if there is a version of Fisher's exact test in
> any package which can handle lower probabilities, or have other
> suggestions as to how I can compare the probabilities?
>
> I am for instance comparing the following two:
>
> dat2<-matrix(c(29,0,29,0,12,0,18,0,0,29,0,16,0,19), nrow=2)
> fisher.test(dat2, workspace=30000000)
>
> dat3<-matrix(c(29,0,0,29,0,0,12,0,0,17,0,1,0,29,0,0,15,1,0,0,19),
> nrow=3)
> fisher.test(dat3, workspace=30000000)
>
> Which both result in p-value < 2.2e-16
>
> Kind regards, Søren

The direct answer is that it is primarily an issue of printing conventions:

  > fisher.test(dat2, workspace=30000000)$p.value
  [1] 5.384278e-44
  > fisher.test(dat3, workspace=30000000)$p.value
  [1] 5.883133e-58

However, I'm not sure (a) how much underflow influences the calculation of such tiny p-values, or (b) whether the p-value is a sensible metric for comparing clustering models at all.

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
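On point (a), one rough sanity check is to compute the probability of the observed table on the log scale, where underflow cannot occur. The sketch below uses the standard hypergeometric formula for an r x c table conditional on both margins; the helper name log_table_prob is made up for this illustration and is not part of any package. Because the two-sided p-value sums the probabilities of all tables no more likely than the observed one, the observed-table probability is a lower bound on the p-value. This checks the order of magnitude only; it says nothing about rounding inside fisher.test() itself.

```r
# Hypothetical helper: log-probability of the observed table given its margins
#   P(table) = prod(row sums!) * prod(col sums!) / (n! * prod(cells!))
log_table_prob <- function(tab) {
  sum(lfactorial(rowSums(tab))) + sum(lfactorial(colSums(tab))) -
    lfactorial(sum(tab)) - sum(lfactorial(tab))
}

# Data from the original post
dat2 <- matrix(c(29,0,29,0,12,0,18,0,0,29,0,16,0,19), nrow = 2)
dat3 <- matrix(c(29,0,0,29,0,0,12,0,0,17,0,1,0,29,0,0,15,1,0,0,19), nrow = 3)

lp2 <- log_table_prob(dat2)
lp3 <- log_table_prob(dat3)

# Smallest positive normalized double, for comparison with the p-values above
.Machine$double.xmin

# Observed-table probabilities expressed as powers of ten
lp2 / log(10)
lp3 / log(10)
```

If the log10 values land slightly below the exponents reported by fisher.test(), that is consistent with the reported p-values not having been distorted by underflow at this magnitude.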

I know that you didn't ask for this, but to me this seems to be a very dodgy method of selecting a "best number of clusters", with no proper basis at all. All of these tests are data dependent, so the p-values cannot be interpreted in the usual way. It is actually not clear how they can be interpreted, and the freedom in the data to find a clustering depends on the number of clusters, so there is no reason to expect that comparing p-values for different numbers of clusters tells you anything meaningful.

Do you really think that it is an informative difference if one clustering gives you p=10^{-58} and another one 10^{-30}?

Christian

On Thu, 17 Dec 2009, Søren Faurby wrote:

> In an effort to select the most appropriate number of clusters in a
> mixture analysis I am comparing the expected and actual membership of
> individuals in various clusters using Fisher's exact test. I aim
> for the model with the lowest possible p-value, but I frequently get
> p-values below 2.2e-16 and therefore do not get exact p-values with
> the standard Fisher's exact test in R.
>
> Does anybody know if there is a version of Fisher's exact test in
> any package which can handle lower probabilities, or have other suggestions
> as to how I can compare the probabilities?
>
> I am for instance comparing the following two:
>
> dat2<-matrix(c(29,0,29,0,12,0,18,0,0,29,0,16,0,19), nrow=2)
> fisher.test(dat2, workspace=30000000)
>
> dat3<-matrix(c(29,0,0,29,0,0,12,0,0,17,0,1,0,29,0,0,15,1,0,0,19),
> nrow=3)
> fisher.test(dat3, workspace=30000000)
>
> Which both result in p-value < 2.2e-16
>
> Kind regards, Søren

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
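The data-dependence point can be illustrated with a small simulation. This is only a sketch under stated assumptions: it uses kmeans() on one-dimensional noise rather than the mixture analysis from the original post, and the seed, sample size, and variable names are arbitrary choices made for the illustration. The clusters are derived from the same values that are then tested, so the test rejects overwhelmingly even though no groups exist.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rnorm(200)                  # pure noise: there are no true groups

# Force a two-cluster solution on the noise
cl <- kmeans(x, centers = 2)

# Cross-tabulate cluster membership against a split of the same data
tab <- table(cluster = cl$cluster, above_median = x > median(x))
tab

# Fisher's exact test on a table built from data that defined the clusters
p_noise <- fisher.test(tab)$p.value
p_noise
```

The tiny p-value here reflects only that the clustering and the test used the same data, which is why its size says little about whether the number of clusters is right.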