I am having a slight problem with probabilities. To calculate the final probability of an event p(F), we can take the product of the chance that each independent event that makes p(F) will NOT occur. So... p(F) = 1- ( (1-p(A)) * (1-p(B)) * (1-p(C))...(1-p(x)) ) If the chance of an event within the product occurring remains the same, we can therefore raise this probability to a power of the number of times that event occurs. e.g. rolling a dice p(A) = 1/6 of getting a 1... p(F) = 1 - (1- (1/6))^z p(F) = 1 - (1-p(A))^z tells us the probabiltity of rolling a 1 'at least once' in z number of rolls. So then to R... if p(A) = 0.01; z = 4; p(F) = 0.039 obviously p(F) > p(A) however the problem arises when we use very small numbers e.g. p(B) = 1 * 10^-30 R understands this value However when you have 1-p(B) you get something very close to 1 as you expect...but R seems to think it is 1. So when I calculate p(F) = 1 - (1-p(B))^z = 1 to the power anything equals 1 so p(F) = 0 and not just close to zero BUT zero. It doesn't matter therefore if z = 1*10^1000, the answer is still zero !! Obviously therefore now p(F) < p(B) Is there any solution to my problem, e.g. - is it a problem with the sum (-) ? ie could I change the number of bits the number understands (however it seems strange that it can hold it as a value close to 0 but not close to 1) -or should I maybe use a function to calculate the exact answer ? -or something else Any help greatly appreciated Mark - _________________________________________________________________ [[alternative HTML version deleted]]
There are several arbitrary precision packages available: gmp (an interface to the GNU multi-precision library on CRAN) and bc (an R interface to the bc arbitrary precision calculator): http://r-bc.googlecode.com There are also packages providing R interfaces to two computer algebra systems and they both support not only arbitrary precision but also exact calculation: http://rsympy.googlecode.com http://ryacas.googlecode.com> library(bc) > 1 - (1-10^-75)^10[1] 0> bc("1 - (1-10^-75)^10")[1] ".0000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000" On Thu, May 21, 2009 at 10:15 AM, Mark Bilton <mcbilton at hotmail.com> wrote:> > I am having a slight problem with probabilities. > > To calculate the final probability of an event p(F), we can take the product of the chance that each independent event that makes p(F) will NOT occur. > So... > p(F) = 1- ( (1-p(A)) * (1-p(B)) * (1-p(C))...(1-p(x)) ) > > If the chance of an event within the product occurring remains the same, we can therefore raise this probability to a power of the number of times that event occurs. > e.g. rolling a dice p(A) = 1/6 of getting a 1... > p(F) = 1 - (1- (1/6))^z > p(F) = 1 - (1-p(A))^z tells us the probabiltity of rolling a 1 'at least once' in z number of rolls. > > So then to R... > > if p(A) = 0.01; z = 4; p(F) = 0.039 > > obviously p(F) > p(A) > > however the problem arises when we use very small numbers e.g. p(B) = 1 * 10^-30 > R understands this value > However when you have 1-p(B) you get something very close to 1 as you expect...but R seems to think it is 1. > So when I calculate p(F) = 1 - (1-p(B))^z = 1 to the power anything equals 1 so p(F) = 0 and not just close to zero BUT zero. > It doesn't matter therefore if z = 1*10^1000, the answer is still zero !! > > Obviously therefore now p(F) < p(B) > > Is there any solution to my problem, e.g. > - is it a problem with the sum (-) ? ie could I change the number of bits the number understands (however it seems strange that it can hold it as a value close to 0 but not close to 1) > -or should I maybe use a function to calculate the exact answer ? > -or something else > > Any help greatly appreciated > Mark > - > > > > > > _________________________________________________________________ > > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
On 21-May-09 14:15:08, Mark Bilton wrote:> I am having a slight problem with probabilities. > > To calculate the final probability of an event p(F), we can take the > product of the chance that each independent event that makes p(F) will > NOT occur. > So... > p(F) = 1- ( (1-p(A)) * (1-p(B)) * (1-p(C))...(1-p(x)) ) > > If the chance of an event within the product occurring remains the > same, we can therefore raise this probability to a power of the number > of times that event occurs. > e.g. rolling a dice p(A) = 1/6 of getting a 1... > p(F) = 1 - (1- (1/6))^z > p(F) = 1 - (1-p(A))^z tells us the probabiltity of rolling a 1 'at > least once' in z number of rolls. > > So then to R... > > if p(A) = 0.01; z = 4; p(F) = 0.039 > > obviously p(F) > p(A) > > however the problem arises when we use very small numbers e.g. p(B) = 1 > * 10^-30 > R understands this value > However when you have 1-p(B) you get something very close to 1 as you > expect...but R seems to think it is 1. > So when I calculate p(F) = 1 - (1-p(B))^z = 1 to the power anything > equals 1 so p(F) = 0 and not just close to zero BUT zero. > It doesn't matter therefore if z = 1*10^1000, the answer is still zero > !! > > Obviously therefore now p(F) < p(B) > > Is there any solution to my problem, e.g. > - is it a problem with the sum (-) ? ie could I change the number of > bits the number understands (however it seems strange that it can hold > it as a value close to 0 but not close to 1) > -or should I maybe use a function to calculate the exact answer ? > -or something else > > Any help greatly appreciated > MarkThe problem is not with sum(), but with the fact that R adopts the exact value 1 for (1-x) if (on my machine) x < 10^(-16). This is to do with the number of binary digits used to store a number. Your 10^(-30) is much smaller than that! x <- 10^(-30) x # [1] 1e-30 y <- (1-x) ; 1-y # [1] 0 x <- 10^(-15) y <- (1-x) ; 1-y # [1] 9.992007e-16 x <- 10^(-16) y <- (1-x) ; 1-y # [1] 1.110223e-16 x <- 10^(-17) y <- (1-x) ; 1-y # [1] 0 Read ?.Machine and, in particular, about: double.eps: the smallest positive floating-point number 'x' such that '1 + x != 1'. It equals 'base^ulp.digits' if either 'base' is 2 or 'rounding' is 0; otherwise, it is '(base^ulp.digits) / 2'. Normally '2.220446e-16'. There is no cure for this in R, but you could try to work round it (depending on the details of your problem) by using an approximation to the value of the expression. For instance, if x is very small (say 10^(-20)), and n is not very large, then to first order (1 - x)^n = 1 - n*x or, if X is a vector of very small numbers, prod((1 - X)) = 1 - sum(X) Either of these may still result in 1 if what is being subtracted is less than .Machine$double.eps. However, in the particular type of application you describe, 1 - (1 - x)^n = n*x to first order 1 - prod((1 - X)) = sum(X) to first order You will just have to work out what will give you a sensible answer. Basically, you are trying to operate beyond the limits of precision of R, and so will need to re-cast your question in alternative terms which will yield an adequately precise result. Hoping this helps, Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 21-May-09 Time: 15:59:47 ------------------------------ XFMail ------------------------------
On 5/21/2009 10:15 AM, Mark Bilton wrote:> I am having a slight problem with probabilities. > > To calculate the final probability of an event p(F), we can take the product of the chance that each independent event that makes p(F) will NOT occur. > So... > p(F) = 1- ( (1-p(A)) * (1-p(B)) * (1-p(C))...(1-p(x)) ) > > If the chance of an event within the product occurring remains the same, we can therefore raise this probability to a power of the number of times that event occurs. > e.g. rolling a dice p(A) = 1/6 of getting a 1... > p(F) = 1 - (1- (1/6))^z > p(F) = 1 - (1-p(A))^z tells us the probabiltity of rolling a 1 'at least once' in z number of rolls. > > So then to R... > > if p(A) = 0.01; z = 4; p(F) = 0.039 > > obviously p(F) > p(A) > > however the problem arises when we use very small numbers e.g. p(B) = 1 * 10^-30 > R understands this value > However when you have 1-p(B) you get something very close to 1 as you expect...but R seems to think it is 1. > So when I calculate p(F) = 1 - (1-p(B))^z = 1 to the power anything equals 1 so p(F) = 0 and not just close to zero BUT zero. > It doesn't matter therefore if z = 1*10^1000, the answer is still zero !! > > Obviously therefore now p(F) < p(B) > > Is there any solution to my problem, e.g. > - is it a problem with the sum (-) ? ie could I change the number of bits the number understands (however it seems strange that it can hold it as a value close to 0 but not close to 1) > -or should I maybe use a function to calculate the exact answer ?The problem is that R maintains about 15-16 decimal places of accuracy, and to that accuracy, 1- 1.e-30 is equal to 1. You need to find a different formula, e.g. (1-x)^z = exp(z * log(1-x)) = exp(z * log1p(-x)) So if you have z = 1.e20, and p(A) = 1.e-30, then 1 - (1-1.e-30)^1.e20 can be calculated in R as 1 - exp(1.e20 * log1p(-1.e-30)) which works out to 1.e-10. Duncan Murdoch> -or something else > > Any help greatly appreciated > Mark > - > > > > > > _________________________________________________________________ > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
> -----Original Message----- > From: r-help-bounces at r-project.org > [mailto:r-help-bounces at r-project.org] On Behalf Of Mark Bilton > Sent: Thursday, May 21, 2009 7:15 AM > To: r-help at r-project.org > Subject: [R] Product of 1 - probabilities > > > I am having a slight problem with probabilities. > > To calculate the final probability of an event p(F), we can > take the product of the chance that each independent event > that makes p(F) will NOT occur. > So... > p(F) = 1- ( (1-p(A)) * (1-p(B)) * (1-p(C))...(1-p(x)) ) > > If the chance of an event within the product occurring > remains the same, we can therefore raise this probability to > a power of the number of times that event occurs. > e.g. rolling a dice p(A) = 1/6 of getting a 1... > p(F) = 1 - (1- (1/6))^z > p(F) = 1 - (1-p(A))^z tells us the probabiltity of rolling a > 1 'at least once' in z number of rolls. > > So then to R... > > if p(A) = 0.01; z = 4; p(F) = 0.039 > > obviously p(F) > p(A) > > however the problem arises when we use very small numbers > e.g. p(B) = 1 * 10^-30 > R understands this value > However when you have 1-p(B) you get something very close to > 1 as you expect...but R seems to think it is 1. > So when I calculate p(F) = 1 - (1-p(B))^z = 1 to the power > anything equals 1 so p(F) = 0 and not just close to zero BUT zero. > It doesn't matter therefore if z = 1*10^1000, the answer is > still zero !! > > Obviously therefore now p(F) < p(B) > > Is there any solution to my problem, e.g. > - is it a problem with the sum (-) ? ie could I change the > number of bits the number understands (however it seems > strange that it can hold it as a value close to 0 but not close to 1) > -or should I maybe use a function to calculate the exact answer ? > -or something else > > Any help greatly appreciated > MarkR uses double precision arithmetic, which gives you 52 binary digits of precision (about 17 decimal digits), so 1.0-10.0^-30==1.0 even though 10.0^-30 is greater than 0.0 (you need to get down to 10^-323 to make it 0.0 because there are 11 binary digits for the exponent used to scale the value). To work around this you can work on a log scale. The R functions log1p(x) and expm1(x) compute the much better approximations to the real functions log(1+x) and exp(x)-1, respectively, than do the R expressions log(1+x) and exp(x)-1 when x is very close to 0. E.g., > pB <- 1e-30 # 1*10^-30 > exp(100 * log1p(-pB)) [1] 1 > exp(100 * log1p(-pB))-1 [1] 0 > expm1(100 * log1p(-pB)) [1] -1e-28 You have another problem: you want z=10^1000, which is infinite in double precision arithmetic. You can take logs again to deal with this or do some more analysis. Also, if you are computing your basic pB and pF from R's p<dist> family of functions, they all have arguments to let you get the upper tail probabilities and output on a log scale. Bill Dunlap TIBCO Software Inc - Spotfire Division wdunlap tibco.com