On Thu, Aug 16, 2012 at 01:41:17PM -0400, Jie wrote:> Dear All,
>
> I am evaluating the value of loglikelihood and it ends up with the sum of
> tiny numbers.
> Below is an example: suppose I would like to calculate sum_i (log (sum_j x
> [i, j] )), the index of log (x) is in the range, say (-2000, 0). I am aware
> that exp(-744.5) will be expressed as 0 in 32 bit R and exp
> Is there a way to improve the result?
>
> R example:
> powd <- sample(-2000:0, 100, replace=T) # the power of x [i, j]
> x <- matrix(exp(powd),10) # the value of x
> sum(log(rowSums(x))) # sum
Hi.
The numbers in a row of the matrix "x" are not only very small, but
also very different from each other. In this situation, their maximum
may be a good approximation of their sum. For example
2^-1000 + 2^-500 + 2^-100 \approx 2^-100
since
(2^-1000 + 2^-500 + 2^-100)/2^-100
is very close to 1.
If the numbers are not so different, then the following formula
for a vector "x", which may be a single row of a matrix, may be
helpful
log(sum(exp(x))) = max(x) + log(sum(exp(x - max(x))))
Note that exp() is not applied to the original numbers, but to their
differences and the main term is exp(0) due to the maximum element
of x. So, at least the main term is computed without underflow.
The accuracy may be further improved by eliminating exp(0) from the
sum in the right hand side and using function log1p(x).
set.seed(1234567)
x <- - runif(10)
A <- log(sum(exp(x)))
y <- x[-which.max(x)]
B <- max(x) + log1p(sum(exp(y - max(x))))
A
[1] 1.771461
abs(A - B)
[1] 2.220446e-16
See also
http://rwiki.sciviews.org/doku.php?id=misc:r_accuracy:computing_using_logarithms
Hope this helps.
Petr Savicky.