k.jewell at campden.co.uk
2008-Feb-15 16:47 UTC
[R] Behaviour of integrate (was 'Poisson-lognormal probability calcul ations')
Hi again, Adding further information to my own query, this function gets to the core of the problem, which I think lies in the behaviour of 'integrate'. ------------------------------------- function (x, meanlog = 0, sdlog = 1, ...) { require(stats) integrand <- function(t, x, meanlog, sdlog) dpois(x,t)*dlnorm(t, meanlog, sdlog) mapply(function(x, meanlog, sdlog, ...) # (1/gamma(x+1))* integrate(function(t, x, meanlog, sdlog) # gamma(x+1)* integrand(t, x, meanlog, sdlog), lower = 0, upper = Inf, x = x, meanlog = meanlog, sdlog sdlog, ...)$value, x, meanlog, sdlog, ... ) } ---------------------------------------- Mathematically, the presence or not of the two commented lines should make no difference; they multiply the integrand by a constant (with respect to the integration), then divide the result by the same constant. In practice they make a big difference! I guess they're altering the behaviour of the 'integrate'. I'd have thought the presence of the lines would worsen the behaviour. Without the lines the integrand is reasonably small, the integral is < 1. With the lines the limit on the integral is x!, leading to "non-finite function values" for x much > 170, even if we use logs to get around the limit on gamma(x). In fact with the lines the plot of function(x) v. x looks reasonable (but I don't know if the values are correct!!), but without the lines it looks silly, I just don't believe it! I thought the problem might relate to the note in ?integrate "If the function is approximately constant (in particular, zero) over nearly all its range it is possible that the result and error estimate may be seriously wrong.". I wouldn't really expect multiplication by a large constant to fix such errors (??), but the lognormal distribution is skew so it might be considered "approximately constant ... over nearly all its range". Even though ...> integrate(dlnorm, 0, Inf)1 with absolute error < 2.5e-07 ... suggests that this is not the source of the problem, I tried changing variables to integrate over a normally distributed variable: ---------------- function (x, meanlog = 0, sdlog = 1, ...) { require(stats) integrand <- function(t, x, meanlog, sdlog) dpois(x,exp(t))*dnorm(t, meanlog, sdlog) mapply(function(x, meanlog, sdlog, ...) # (1/gamma(x+1))* integrate(function(t, x, meanlog, sdlog) # gamma(x+1)* integrand(t, x, meanlog, sdlog), lower = -Inf, upper = Inf, x = x, meanlog = meanlog, sdlog sdlog, ...)$value, x, meanlog, sdlog, ... ) } ----------------------------- Still no better; with the constants the values look reasonably smooth (but are they correct??), without the constants the values are silly. I've tried reducing rel.tol and increasing subdivisions, they change the behaviour a little but don't "fix" it, and I still get the marked difference between the presence and absence of those lines (and I'm increasingly unsure whether either answer is correct!). I'm still trying. but I really think I'm going nowhere. Has anyone any ideas? Thanks in advance, Keith Jewell mailto:k.jewell at campden.co.uk telephone (direct) +44 (0)1386 842055> -----Original Message----- > From: Jewell, Keith > Sent: 15 February 2008 11:16 > To: 'r-help at r-project.org' > Subject: Poisson-lognormal probability calculations > > Hi, > > just for the record, although I don't think it's relevant (!) > ------------------------------------- > > sessionInfo() > R version 2.6.0 (2007-10-03) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United > Kingdom.1252;LC_MONETARY=English_United > Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 > > attached base packages: > [1] stats4 splines stats graphics grDevices utils > datasets > methods base > > other attached packages: > [1] VGAM_0.7-5 xlsReadWrite_1.3.2 > -------------------------------- > > I'm having some problems with a Poisson-lognormal density (mass?) > function. > > VGAM has the dpolono function, but that doesn't work for x-values over > 170, and I need to go to *much* bigger numbers. It fails first because > of gamma overflow, then because of non-finite integrand. > --------------------- > > VGAM::dpolono(170) > [1] 4.808781e-09 > > VGAM::dpolono(171) > [1] 0 > Warning message: > In VGAM::dpolono(171) : value out of range in 'gammafn' > > VGAM::dpolono(172) > [1] 0 > Warning message: > In VGAM::dpolono(172) : value out of range in 'gammafn' > > VGAM::dpolono(173) > [1] 0 > Warning message: > In VGAM::dpolono(173) : value out of range in 'gammafn' > > VGAM::dpolono(174) > Error in integrate(f = integrand, lower = -Inf, upper = Inf, x = x[i], > : > non-finite function value > ------------- > > I tidied up a little (to my eyes only, no offence intended to the > estimable authors of VGAM) and avoided the gamma overflow by using logs > - OK so far, it agrees almost perfectly with VGAM::dpolono for x up to > 170, and now extends the range up to 173 (wow!!). > ------------- > dApolono <- > function (x, meanlog = 0, sdlog = 1, ...) { > require(stats) > integrand <- function(t, x, meanlog, sdlog) exp(-t+x*log(t)- > log(sdlog*t*sqrt(2*pi))-0.5*((log(t)-meanlog)/sdlog)^2) > mapply(function(x, meanlog, sdlog, ...){ > temp <- try( > integrate(f = integrand, lower = 0, upper = Inf, x = x, > meanlog = meanlog, sdlog = sdlog, ...) > ) > ifelse(inherits(temp, "try-error"), NA, exp(log(temp$value)- > lgamma(x+1))) > } > , x, meanlog, sdlog, ... > ) > } > plot(log(dApolono(0:173))) > dApolono(174) > ----------------------------- > > Addressing the non-finite integrand, I noticed that the gamma(x+1) > divisor was outside the integrand so that as x gets bigger the > integrand gets bigger, only to be finally divided by an increasing > gamma(x+1). I reasoned that putting the divisor inside the integral > might incur a substantial performance hit, but would keep the integrand > at reasonable values. > > -------------------------------------- > dApolono <- > function (x, meanlog = 0, sdlog = 1, ...) { > require(stats) > integrand <- function(t, x, meanlog, sdlog) exp(-t+x*log(t)- > log(sdlog*t*sqrt(2*pi))-0.5*((log(t)-meanlog)/sdlog)^2-lgamma(x+1)) > mapply(function(x, meanlog, sdlog, ...){ > temp <- try( > integrate(f = integrand, lower = 0, upper = Inf, x = x, > meanlog = meanlog, sdlog = sdlog, ...) > ) > ifelse(inherits(temp, "try-error"), NA, temp$value) > } > , x, meanlog, sdlog, ... > ) > } > plot(log(dApolono(0:173))) > dApolono(174) > dApolono(1E3) > ------------------------- > > This avoids the non-finite integrand and gives me answers at much > higher x, but now at least some of the answers are wrong (not to say > silly), even in the range where the other versions worked. I've tried > other variations including changing the variable of integration to > log(t) and integrating dpois()*dlnorm(), but I can't fix it; if the > factorial (=gamma) is inside the integrand I get silly answers, if it's > outside I get non-finite integrand. > > I'm tearing my hair. Can anyone suggest where I may be going wrong? Any > suggestions at all will be appreciated. > > Thanks in advance, > > Keith Jewell > mailto:k.jewell at campden.co.uk telephone (direct) +44 (0)1386 842055 >_____________________________________________________________________ The information in this document and attachments is given after the exercise of all reasonable care and skill in its compilation, preparation and issue, but is provided without liability in its application or use. It may contain privileged information that is exempt from disclosure by law and may be confidential. If you are not the intended recipient you must not copy, distribute or take any action in reliance on it. If you have received this document in error please notify us and delete this message from your system immediately. Campden & Chorleywood Food Research Association Group; Campden & Chorleywood Food Research Association (company limited by guarantee),registered number 510618); CCFRA Group Services Ltd. (registered number 3841905); and CCFRA Technology Ltd (registered number 3836922), all registered in England and Wales with the registered office at Station Road, Chipping Campden, Gloucestershire, GL55 6LD. The Group may monitor e-mail traffic data and also the content of e-mail for the purposes of security and staff training. Any advice given is subject to our normal terms and conditions of trading, a copy of which is available on request. ________________________________________________________________________ This e-mail has been scanned for all viruses by MessageL...{{dropped:2}}