Hello I want to calculate natural logarithm of sum of combinations as follow: (R code) { com_sum=choose(2000000,482)*choose(1000000,118)+choose(2000000,483)*choose(1000000,117)+...+choose(2000000,i)*choose(1000000,600-i)+...+choose(2000000,600)*choose(1000000,0) #calculate the sum result=log(com_sum) #calculate the log of the sum } But every element of the com_sum is Inf, so I can't get the result Thank you in advance Yours sincerely ZhaoXing Department of Health Statistics West China School of Public Health Sichuan University No.17 Section 3, South Renmin Road Chengdu, Sichuan 610041 P.R.China __________________________________________________ ?????????????????????????????
On 07-Jan-11 16:20:59, zhaoxing731 wrote:> Hello > I want to calculate natural logarithm of sum of combinations > as follow: (R code){com_sum=choose(2000000,482)*choose(1000000,118)+ choose(2000000,483)*choose(1000000,117)+...+ choose(2000000,i)*choose(1000000,600-i)+...+ choose(2000000,600)*choose(1000000,0) #calculate the sum result=log(com_sum) #calculate the log of the sum }> But every element of the com_sum is Inf, so I can't get the > result. Thank you in advance > Yours sincerely > ZhaoXingYou may find you can usefully adapt the technique I have described in: http://www.zen89632.zen.co.uk/R/FisherTest/extreme_fisher.pdf A somewhat different but closely related problem. Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <ted.harding at wlandres.net> Fax-to-email: +44 (0)870 094 0861 Date: 07-Jan-11 Time: 17:42:48 ------------------------------ XFMail ------------------------------
On Sat, Jan 08, 2011 at 12:20:59AM +0800, zhaoxing731 wrote:> Hello > > I want to calculate natural logarithm of sum of combinations as follow: (R code) > > { > com_sum=choose(2000000,482)*choose(1000000,118)+choose(2000000,483)*choose(1000000,117)+...+choose(2000000,i)*choose(1000000,600-i)+...+choose(2000000,600)*choose(1000000,0) #calculate the sum > result=log(com_sum) #calculate the log of the sum > } > > But every element of the com_sum is Inf, so I can't get the result > Thank you in advanceThe natural logarithm of choose() is lchoose(), so the natural logarithms of the products above are i <- 482:600 logx <- lchoose(2000000,i) + lchoose(1000000,600-i) maxlog <- max(logx) # [1] 5675.315 The sum of numbers, which have very different magnitudes, may be approximated by their maximum, so max(logx) is an approximation of the required logarithm. A more accurate calculation can be done, for example, as follows maxlog + log(sum(exp(logx - maxlog))) # [1] 5675.977 Petr Savicky.
On Sat, Jan 08, 2011 at 12:20:59AM +0800, zhaoxing731 wrote:> Hello > > I want to calculate natural logarithm of sum of combinations as follow: (R code) > > { > com_sum=choose(2000000,482)*choose(1000000,118)+choose(2000000,483)*choose(1000000,117)+...+choose(2000000,i)*choose(1000000,600-i)+...+choose(2000000,600)*choose(1000000,0) #calculate the sum > result=log(com_sum) #calculate the log of the sum > }Let me present a more detailed calculation. The natural logarithms of the products above are i <- 482:600 logx <- lchoose(2000000,i) + lchoose(1000000,600-i) The logarithm of the largest of the products is maxlog <- max(logx) maxlog # [1] 5675.315 The ratios of the products divided by the largest of them may be computed as exp(logx - maxlog) # [1] 1.000000e+00 4.885522e-01 2.361712e-01 1.129583e-01 5.345075e-02 # ... Only a few of the ratios are not negligible, so the sum of the products differs from the largest product by a small factor. This factor may be estimated as sum(exp(logx - maxlog)) # [1] 1.937370 The required logarithm differs from maxlog by the logarithm of this factor. So, the result is options(digits=15) maxlog + log(sum(exp(logx - maxlog))) # [1] 5675.97681976982 Computing the sum of products exactly using long integer arithmetic and computing its logarithm with large precision produced 5675.9768197698224041850302683544102271115821956713424401649... which matches well the result obtained using double precision in R. The above approach may be derived from the numerical part of the algorithm suggested by Ted Harding for computing extremely small p-values in the Fisher Exact test. There, we are computing a sum of small numbers, not large ones. However, the problem is similar, since we cannot represent the summands themselves, but we can represent their logarithms. In my opinion, P(a, b, c, d) is the largest of the summands or close to the largest, so we calculate the ratios of the summands to the largest of them or to a close approximation of the largest. Petr Savicky.
You could re-do part of your code to run with mpfr-class variables, and use this function: # mpfr choose(n,k) rmpfac<-function(n,k,prec=50) factorial(mpfr(n,prec))/factorial(mpfr(k,prec))/factorial(mpfr(n-k,prec)) Converting into and out of mpfr may not be worth it, but you will get all the precision you want without any nasty "Inf" results :-) Carl