Gonçalo Ferraz
2016-Jun-25 14:13 UTC
[R] strange behavior of lchoose in combinatorics problem
Hi, I am working on interactions between animals, studying whether animal 1 is attracted to animal 2 (or vice-versa). I looked for the two animals in N=2178 sampling occasions, finding animal 1 a total of N1=165 times, and animal 2 a total of N2=331 times. In J=97 occasions, I saw both animals at the same time. The more frequently I see the two animals in the same sampling occasion, the more I will believe that they are attracted to each other. Therefore, I want to calculate the probability of finding J<=97 when the two animals are moving around independently of each other. The higher this probability, the stronger the attraction. Following Veech (Journal of Biogeography 2014, 41: 1029-1035) I compute the log probability of obtaining a number n of encounters between animals as ?lpn? in the function lveech: lveech <- function(N,N1,N2,n) { a <- lchoose(N,n) b <- lchoose(N-n,N2-n) c <- lchoose(N-N2,N1-n) d <- lchoose(N,N2) e <- lchoose(N,N1) lpn <- (a+b+c)-(d+e) return(lpn) } I have tested this function with shorter, intuitive numbers, and it works as expected. I use log probabilities to stay away from intractable numbers. Next, I use function lveech to obtain a vector ?lpvec? that gives me all the log probabilities of getting n=0,1,2,?,97 simultaneous observations of the two animals: lpvec <- rep(NA,J+1) for(i in 1:(J+1)) lpvec[i] <- lveech(N,N1,N2,nseq[i]) PROBLEM: the sum of the probabilities in lpvec should be <=1, but it is not. The sum is something on the order of 1.48e-13. That happens whether I sum exp(lpvec) or use a function for summing probabilities in log-prob space. In most applications of lveech, the sum of the probabilities is <=1, as it should be, but occasionally I get this problem. Why should this be? Does anyone know of any issues with lchoose that could explain this? I want to solve this because if I round up the sum to 1 I will not be able to quantify the attraction between animals and compare it with the attractions between other pairs of animals in my data. Thank you so much for reading the long question. G. I have two binary vectors, v1 and v2, both with length N=2178. The sum of vector v1 is N1=165 and the sum of vector v2 is N2=331. I need to compute the probability that
Richard M. Heiberger
2016-Jun-25 23:26 UTC
[R] strange behavior of lchoose in combinatorics problem
Floating point numbers are rounded to 53 significant bits. When you use logs of integers you are using floating point numbers.> .3 + .6 == .9[1] FALSE> (.3 + .6) - .9[1] -1.110223e-16 See FAQ 7.31 for more discussion. On Sat, Jun 25, 2016 at 10:13 AM, Gon?alo Ferraz <gferraz29 at gmail.com> wrote:> Hi, > > I am working on interactions between animals, studying whether animal 1 is attracted to animal 2 (or vice-versa). I looked for the two animals in N=2178 sampling occasions, finding animal 1 a total of N1=165 times, and animal 2 a total of N2=331 times. In J=97 occasions, I saw both animals at the same time. > > The more frequently I see the two animals in the same sampling occasion, the more I will believe that they are attracted to each other. Therefore, I want to calculate the probability of finding J<=97 when the two animals are moving around independently of each other. The higher this probability, the stronger the attraction. > > Following Veech (Journal of Biogeography 2014, 41: 1029-1035) I compute the log probability of obtaining a number n of encounters between animals as ?lpn? in the function lveech: > > lveech <- function(N,N1,N2,n) { > a <- lchoose(N,n) > b <- lchoose(N-n,N2-n) > c <- lchoose(N-N2,N1-n) > d <- lchoose(N,N2) > e <- lchoose(N,N1) > lpn <- (a+b+c)-(d+e) > return(lpn) > } > > I have tested this function with shorter, intuitive numbers, and it works as expected. I use log probabilities to stay away from intractable numbers. > > Next, I use function lveech to obtain a vector ?lpvec? that gives me all the log probabilities of getting n=0,1,2,?,97 simultaneous observations of the two animals: > > lpvec <- rep(NA,J+1) > for(i in 1:(J+1)) lpvec[i] <- lveech(N,N1,N2,nseq[i]) > > PROBLEM: the sum of the probabilities in lpvec should be <=1, but it is not. The sum is something on the order of 1.48e-13. That happens whether I sum exp(lpvec) or use a function for summing probabilities in log-prob space. In most applications of lveech, the sum of the probabilities is <=1, as it should be, but occasionally I get this problem. Why should this be? Does anyone know of any issues with lchoose that could explain this? > > I want to solve this because if I round up the sum to 1 I will not be able to quantify the attraction between animals and compare it with the attractions between other pairs of animals in my data. > > Thank you so much for reading the long question. > > G. > > > > > > > > > I have two binary vectors, v1 and v2, both with length N=2178. The sum of vector v1 is N1=165 and the sum of vector v2 is N2=331. I need to compute the probability that > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
peter dalgaard
2016-Jun-26 06:39 UTC
[R] strange behavior of lchoose in combinatorics problem
> On 25 Jun 2016, at 16:13 , Gon?alo Ferraz <gferraz29 at gmail.com> wrote: > > PROBLEM: the sum of the probabilities in lpvec should be <=1, but it is not. The sum is something on the order of 1.48e-13.Um, in which sense is 1.48e-13 not <=1 ??? -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Stefan Evert
2016-Jun-26 12:35 UTC
[R] strange behavior of lchoose in combinatorics problem
Why do you want to do this? Why not simply use Fisher's exact test? N <- 2178 N1 <- 165 N2 <- 331 J <- 97 ct <- rbind(c(J, N1-J), c(N2-J, N-N1-N2+J)) fisher.test(ct) Background explanation: - Your formula computes the log hypergeometric probability for a contingency table as ct above, but with k instead of J. - It does so in an unnecessarily complicated way: three terms would be enough (cf. the equation at http://www.collocations.de/AM/section3.html; with C1=N1, C2=N-C1, R1=N2). - If you want to test for a positive association between the two animals, you should be adding up the probabilities for k >= J to obtain a p-value, rather than k <= J (what would this sum of probabilities tell you?). - lpvec doesn't contain probabilities, but log probabilities. What sense would there be in adding those up? In any case, you should obtain a negative value because all the individual logs are negative. Best, Stefan> On 25 Jun 2016, at 16:13, Gon?alo Ferraz <gferraz29 at gmail.com> wrote: > > I am working on interactions between animals, studying whether animal 1 is attracted to animal 2 (or vice-versa). I looked for the two animals in N=2178 sampling occasions, finding animal 1 a total of N1=165 times, and animal 2 a total of N2=331 times. In J=97 occasions, I saw both animals at the same time. > > The more frequently I see the two animals in the same sampling occasion, the more I will believe that they are attracted to each other. Therefore, I want to calculate the probability of finding J<=97 when the two animals are moving around independently of each other. The higher this probability, the stronger the attraction. > > Following Veech (Journal of Biogeography 2014, 41: 1029-1035) I compute the log probability of obtaining a number n of encounters between animals as ?lpn? in the function lveech: >
Gonçalo Ferraz
2016-Jun-27 04:35 UTC
[R] strange behavior of lchoose in combinatorics problem
Stefan, I am sorry that I wasn?t more careful writing my question. I misrepresented my problem when I wrote ?the sum of the probabilities in lpvec should be <=1, but it is not. The sum is something on the order of 1.48e-13.? It is absolutely right that lpvec doesn?t contain probabilities, it contains log-probabilities that are negative numbers. Also, clearly, the value of 1.48e-13 is <=1! What I meant is that I wanted to add probabilities in the log-probability space and that I expected a negative value as a result. Instead, I am getting a positive value. You are also right that if I wanted to test for a positive association I should add the probabilities of k>=J to obtain a p-value. This is clear to me, but I want a metric of the strength of association between animals that takes higher values when the association is stronger. A final result in the log-probability space is useful because I want to use the strength of association to build networks of individuals. Finally, your idea of using Fisher?s exact test is very appealing for the simplicity and availability of a working function in R, but I am having trouble interpreting the result, or the result is different from that of my formula. Here?s a simple example: Say N=2, N1=1, N2=1, and J=1. In this case, if the individuals are completely independent of each other, I expect to see them together with a probability of 0.5. That is the output from my function, in probability space. If I run Fisher?s exact test on the corresponding contingency table, however, I get a p-value of 1. This is what I did: fisher.test(rbind(c(J,N1-J),c(N2-J,N-N1-N2+J))) Why should this be, when the Fisher?s exact test is giving me the probability of obtaining the observed arrangement under the null hypothesis of no association between the animals. Am I interpreting the output incorrectly. Thank you for answer, it is great to find an even simpler way of addressing the problem. Best, Gon?alo> On Jun 26, 2016, at 9:35 AM, Stefan Evert <stefanML at collocations.de> wrote: > > Why do you want to do this? Why not simply use Fisher's exact test? > > N <- 2178 > N1 <- 165 > N2 <- 331 > J <- 97 > ct <- rbind(c(J, N1-J), c(N2-J, N-N1-N2+J)) > fisher.test(ct) > > Background explanation: > > - Your formula computes the log hypergeometric probability for a contingency table as ct above, but with k instead of J. > > - It does so in an unnecessarily complicated way: three terms would be enough (cf. the equation at http://www.collocations.de/AM/section3.html; with C1=N1, C2=N-C1, R1=N2). > > - If you want to test for a positive association between the two animals, you should be adding up the probabilities for k >= J to obtain a p-value, rather than k <= J (what would this sum of probabilities tell you?). > > - lpvec doesn't contain probabilities, but log probabilities. What sense would there be in adding those up? In any case, you should obtain a negative value because all the individual logs are negative. > > Best, > Stefan > > > >> On 25 Jun 2016, at 16:13, Gon?alo Ferraz <gferraz29 at gmail.com> wrote: >> >> I am working on interactions between animals, studying whether animal 1 is attracted to animal 2 (or vice-versa). I looked for the two animals in N=2178 sampling occasions, finding animal 1 a total of N1=165 times, and animal 2 a total of N2=331 times. In J=97 occasions, I saw both animals at the same time. >> >> The more frequently I see the two animals in the same sampling occasion, the more I will believe that they are attracted to each other. Therefore, I want to calculate the probability of finding J<=97 when the two animals are moving around independently of each other. The higher this probability, the stronger the attraction. >> >> Following Veech (Journal of Biogeography 2014, 41: 1029-1035) I compute the log probability of obtaining a number n of encounters between animals as ?lpn? in the function lveech: >> >