Deepankar Basu
2007-May-24 22:13 UTC
[R] Problem with numerical integration and optimization with BFGS
Hi R users, I have a couple of questions about some problems that I am facing with regard to numerical integration and optimization of likelihood functions. Let me provide a little background information: I am trying to do maximum likelihood estimation of an econometric model that I have developed recently. I estimate the parameters of the model using the monthly US unemployment rate series obtained from the Federal Reserve Bank of St. Louis. (The data is freely available from their web-based database called FRED-II). For my model, the likelihood function for each observation is the sum of three integrals. The integrand in each of these integrals is of the following form: A*exp(B+C*x-D*x^2) where A, B, C and D are constants, exp() is the exponential function and x is the variable of integration. The constants A and D are always positive; B is always negative, while there is no a priori knowledge about the sign of C. All the constants are finite. Of the three integrals, one has finite limits while the other two have the following limits: lower = -Inf upper = some finite number (details can be found in the code below) My problem is the following: when I try to maximize the log-likelihood function using "optim" with method "BFGS", I get the following error message (about the second integral):> out <- optim(alpha.start, LLK, gr=NULL, method="BFGS", y=urate$y)Error in integrate(f3, lower = -Inf, upper = upr2) : the integral is probably divergent Since I know that all the three integrals are convergent, I do not understand why I am getting this error message. My first question: can someone explain what mistake I am making? What is even more intriguing is that when I use the default method (Nelder-Mead) in "optim" instead of BFGS, I do not get any such error message. Since both methods (Nelder-Mead and BFGS) will need to evaluate the integrals, my second question is: why this difference? Below, I am providing the code that I use. Any help will be greatly appreciated. Deepankar ************ CODE START ******************* ############################# # COMPUTING THE LOGLIKELIHOOD # USING NUMERICAL INTEGRALS ############################# LLK <- function(alpha, y) { n <- length(y) lglik <- numeric(n) # TO BE SUMMED LATER TO GET THE LOGLIKELIHOOD lambda <- numeric(n-1) # GENERATING *lstar* for (i in 1:(n-1)) { # TO USE IN THE lambda[i] <- y[i+1]/y[i] # RE-PARAMETRIZATION BELOW } lstar <- (min(lambda)-0.01) # NOTE RE-PARAMETRIZATION # THESE RESTRICTIONS EMERGE FROM THE MODEL muep <- alpha[1] # NO RESTRICTION sigep <- 0.01 + exp(alpha[2]) # greater than 0.01 sigeta <- 0.01 + exp(alpha[3]) # greater than 0.01 rho2 <- 0.8*sin(alpha[4]) # between -0.8 and 0.8 rho1 <- lstar*abs(alpha[5])/(1+abs(alpha[5])) # between 0 and lstar delta <- 0.01 + exp(alpha[6]) # greater than 0.01 ########################################## # THE THREE FUNCTIONS TO INTEGRATE # FOR COMPUTING THE LOGLIKELIHOOD ########################################## denom <- 2*pi*sigep*sigeta*(sqrt(1-rho2^2)) # A CONSTANT TO BE USED # FOR DEFINING THE # THREE FUNCTIONS f1 <- function(z1) { # FIRST FUNCTION b11 <- ((z1-muep)^2)/((-2)*(1-rho2^2)*(sigep^2)) b12 <- (rho2*(z1-muep)*(y[i]-y[i-1]+delta))/((1-rho2^2)*sigep*sigeta) b13 <- ((y[i]-y[i-1]+delta)^2)/((-2)*(1-rho2^2)*(sigeta^2)) return((exp(b11+b12+b13))/denom) } f3 <- function(z3) { # SECOND FUNCTION b31 <- ((y[i]-rho1*y[i-1]-muep)^2)/((-2)*(1-rho2^2)*(sigep^2)) b32 <- (rho2*(y[i]-rho1*y[i-1]-muep)*z3)/((1-rho2^2)*sigep*sigeta) b33 <- ((z3)^2)/((-2)*(1-rho2^2)*(sigeta^2)) return((exp(b31+b32+b33))/denom) } f5 <- function(z5) { # THIRD FUNCTION b51 <- ((-y[i]+rho1*y[i-1]-muep)^2)/((-2)*(1-rho2^2)*sigep^2) b52 <- (rho2*(-y[i]+rho1*y[i-1]-muep)*(z5))/((1-rho2^2)*sigep*sigeta) b53 <- ((z5)^2)/((-2)*(1-rho2^2)*(sigeta^2)) return((exp(b51+b52+b53))/denom) } for (i in 2:n) { # START FOR LOOP upr1 <- (y[i]-rho1*y[i-1]) upr2 <- (y[i]-y[i-1]+delta) # INTEGRATING THE THREE FUNCTIONS out1 <- integrate(f1, lower = (-1)*upr1, upper = upr1) out3 <- integrate(f3, lower = -Inf, upper = upr2) out5 <- integrate(f5, lower= -Inf, upper = upr2) pdf <- (out1$val + out3$val + out5$val) lglik[i] <- log(pdf) # LOGLIKELIHOOD FOR OBSERVATION i } # END FOR LOOP return(-sum(lglik)) # RETURNING NEGATIVE OF THE LOGLIKELIHOOD # BECAUSE optim DOES MINIMIZATION BY DEFAULT } ***************** CODE ENDS ********************************* Then I use:> urate <- read.table("~/Desktop/UNRATE1.txt", header=TRUE) # DATA > alpha.start <- c(0.5, -1, -1, 0, 1, -1) # STARTING VALUES > out <- optim(alpha.start, LLK, gr=NULL, y=urate$y) # THIS GIVES NOERROR or> out <- optim(alpha.start, LLK, gr=NULL, method="BFGS", y=urate$y)Error in integrate(f3, lower = -Inf, upper = upr2) : the integral is probably divergent
Ben Bolker
2007-May-24 22:44 UTC
[R] Problem with numerical integration and optimization with BFGS
Deepankar Basu <basu.15 <at> osu.edu> writes:> > > For my model, the likelihood function for each observation is the sum of > three integrals. The integrand in each of these integrals is of the > following form: > > A*exp(B+C*x-D*x^2) >(where D is positive) Being very lazy, I tried Mathematica's online integrator (http://integrals.wolfram.com/index.jsp), which says that 2 Integrate[Exp[b + c x - d x ], x] == 2 b + c /(4 d) -c + 2 d x E Sqrt[Pi] Erf[----------] 2 Sqrt[d] -------------------------------------- 2 Sqrt[d] R knows "Erf" (the error function) as pnorm -- can you use this to avoid numerical integration entirely ??
Prof Brian Ripley
2007-May-25 03:22 UTC
[R] Problem with numerical integration and optimization with BFGS
You are trying to use a derivative-based optimization method without supplying derivatives. This will use numerical approoximations to the derivatives, and your objective function will not be suitable as it is internally using adaptive numerical quadrature and hence is probably not close enough to a differentiable function (it may well have steps). I believe you can integrate analytically (the answer will involve pnorm), and that you can also find analytical derivatives. Using (each of) numerical optimization and integration is a craft, and it seems you need to know more about it. The references on ?optim are too advanced I guess, so you could start with Chapter 16 of MASS and its references. On Thu, 24 May 2007, Deepankar Basu wrote:> Hi R users, > > I have a couple of questions about some problems that I am facing with > regard to numerical integration and optimization of likelihood > functions. Let me provide a little background information: I am trying > to do maximum likelihood estimation of an econometric model that I have > developed recently. I estimate the parameters of the model using the > monthly US unemployment rate series obtained from the Federal Reserve > Bank of St. Louis. (The data is freely available from their web-based > database called FRED-II). > > For my model, the likelihood function for each observation is the sum of > three integrals. The integrand in each of these integrals is of the > following form: > > A*exp(B+C*x-D*x^2) > > where A, B, C and D are constants, exp() is the exponential function and > x is the variable of integration. The constants A and D are always > positive; B is always negative, while there is no a priori knowledge > about the sign of C. All the constants are finite. > > Of the three integrals, one has finite limits while the other two have > the following limits: > > lower = -Inf > upper = some finite number (details can be found in the code below)Try integrating that analytically by change of variable to a normal CDF.> My problem is the following: when I try to maximize the log-likelihood > function using "optim" with method "BFGS", I get the following error > message (about the second integral): > >> out <- optim(alpha.start, LLK, gr=NULL, method="BFGS", y=urate$y) > Error in integrate(f3, lower = -Inf, upper = upr2) : > the integral is probably divergent > > Since I know that all the three integrals are convergent, I do not > understand why I am getting this error message. My first question: can > someone explain what mistake I am making? > > What is even more intriguing is that when I use the default method > (Nelder-Mead) in "optim" instead of BFGS, I do not get any such error > message. Since both methods (Nelder-Mead and BFGS) will need to evaluate > the integrals, my second question is: why this difference? > > Below, I am providing the code that I use. Any help will be greatly > appreciated. > > > Deepankar > > > ************ CODE START ******************* > > > > ############################# > # COMPUTING THE LOGLIKELIHOOD > # USING NUMERICAL INTEGRALS > ############################# > > LLK <- function(alpha, y) { > > n <- length(y) > lglik <- numeric(n) # TO BE SUMMED LATER TO GET THE LOGLIKELIHOOD > > lambda <- numeric(n-1) # GENERATING *lstar* > for (i in 1:(n-1)) { # TO USE IN THE > lambda[i] <- y[i+1]/y[i] # RE-PARAMETRIZATION BELOW > } > lstar <- (min(lambda)-0.01) > > > # NOTE RE-PARAMETRIZATION > # THESE RESTRICTIONS EMERGE FROM THE MODEL > > muep <- alpha[1] # NO RESTRICTION > sigep <- 0.01 + exp(alpha[2]) # greater than > 0.01 > sigeta <- 0.01 + exp(alpha[3]) # greater than > 0.01 > rho2 <- 0.8*sin(alpha[4]) # between -0.8 > and 0.8 > rho1 <- lstar*abs(alpha[5])/(1+abs(alpha[5])) # between 0 and > lstar > delta <- 0.01 + exp(alpha[6]) # greater than > 0.01 > > > ########################################## > # THE THREE FUNCTIONS TO INTEGRATE > # FOR COMPUTING THE LOGLIKELIHOOD > ########################################## > > denom <- 2*pi*sigep*sigeta*(sqrt(1-rho2^2)) # A CONSTANT TO BE USED > # FOR DEFINING THE > # THREE FUNCTIONS > > > f1 <- function(z1) { # FIRST FUNCTION > > b11 <- ((z1-muep)^2)/((-2)*(1-rho2^2)*(sigep^2)) > b12 <- > (rho2*(z1-muep)*(y[i]-y[i-1]+delta))/((1-rho2^2)*sigep*sigeta) > b13 <- ((y[i]-y[i-1]+delta)^2)/((-2)*(1-rho2^2)*(sigeta^2)) > > return((exp(b11+b12+b13))/denom) > } > > f3 <- function(z3) { # SECOND FUNCTION > > b31 <- ((y[i]-rho1*y[i-1]-muep)^2)/((-2)*(1-rho2^2)*(sigep^2)) > b32 <- > (rho2*(y[i]-rho1*y[i-1]-muep)*z3)/((1-rho2^2)*sigep*sigeta) > b33 <- ((z3)^2)/((-2)*(1-rho2^2)*(sigeta^2)) > > return((exp(b31+b32+b33))/denom) > } > > f5 <- function(z5) { # THIRD FUNCTION > > b51 <- ((-y[i]+rho1*y[i-1]-muep)^2)/((-2)*(1-rho2^2)*sigep^2) > b52 <- > (rho2*(-y[i]+rho1*y[i-1]-muep)*(z5))/((1-rho2^2)*sigep*sigeta) > b53 <- ((z5)^2)/((-2)*(1-rho2^2)*(sigeta^2)) > > return((exp(b51+b52+b53))/denom) > } > > > for (i in 2:n) { # START FOR LOOP > > upr1 <- (y[i]-rho1*y[i-1]) > upr2 <- (y[i]-y[i-1]+delta) > > # INTEGRATING THE THREE FUNCTIONS > out1 <- integrate(f1, lower = (-1)*upr1, upper = upr1) > out3 <- integrate(f3, lower = -Inf, upper = upr2) > out5 <- integrate(f5, lower= -Inf, upper = upr2) > > pdf <- (out1$val + out3$val + out5$val) > > lglik[i] <- log(pdf) # LOGLIKELIHOOD FOR OBSERVATION i > > } # END FOR LOOP > > return(-sum(lglik)) # RETURNING NEGATIVE OF THE LOGLIKELIHOOD > # BECAUSE optim DOES MINIMIZATION BY DEFAULT > } > > ***************** CODE ENDS ********************************* > > Then I use: > >> urate <- read.table("~/Desktop/UNRATE1.txt", header=TRUE) # DATA >> alpha.start <- c(0.5, -1, -1, 0, 1, -1) # STARTING VALUES >> out <- optim(alpha.start, LLK, gr=NULL, y=urate$y) # THIS GIVES NO > ERROR > > or > >> out <- optim(alpha.start, LLK, gr=NULL, method="BFGS", y=urate$y) > Error in integrate(f3, lower = -Inf, upper = upr2) : > the integral is probably divergent > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Deepankar Basu
2007-May-25 04:01 UTC
[R] Problem with numerical integration and optimization with BFGS
Prof. Ripley, The code that I provided with my question of course does not contain code for the derivatives; but I am supplying analytical derivatives in my full program. I did not include that code with my question because that would have added about 200 more lines of code without adding any new information relevant for my question. The problem that I had pointed to occurs whether I provide analytical derivatives or not to the optimization routine. And the problem was that when I use the "BFGS" method in optim, I get an error message saying that the integrals are probably divergent; I know, on the other hand, that the integrals are convergent. The same problem does not arise when I instead use the Nelder-Mead method in optim. Your suggestion that the expression can be analytically integrated (which will involve "pnorm") might be correct though I do not see how to do that. The integrands are the bivariate normal density functions with one variable replaced by known quantities while I integrate over the second. For instance, the first integral is as follows: the integrand is the bivariate normal density function (with general covariance matrix) where the second variable has been replaced by y[i] - rho1*y[i-1] + delta and I integrate over the first variable; the range of integration is lower=-y[i]+rho1*y[i-1] upper=y[i]-rho1*y[i-1] The other two integrals are very similar. It would be of great help if you could point out how to integrate the expressions analytically using "pnorm". Thanks. Deepankar On Fri, 2007-05-25 at 04:22 +0100, Prof Brian Ripley wrote:> You are trying to use a derivative-based optimization method without > supplying derivatives. This will use numerical approoximations to the > derivatives, and your objective function will not be suitable as it is > internally using adaptive numerical quadrature and hence is probably not > close enough to a differentiable function (it may well have steps). > > I believe you can integrate analytically (the answer will involve pnorm), > and that you can also find analytical derivatives. > > Using (each of) numerical optimization and integration is a craft, and it > seems you need to know more about it. The references on ?optim are too > advanced I guess, so you could start with Chapter 16 of MASS and its > references. > > On Thu, 24 May 2007, Deepankar Basu wrote: > > > Hi R users, > > > > I have a couple of questions about some problems that I am facing with > > regard to numerical integration and optimization of likelihood > > functions. Let me provide a little background information: I am trying > > to do maximum likelihood estimation of an econometric model that I have > > developed recently. I estimate the parameters of the model using the > > monthly US unemployment rate series obtained from the Federal Reserve > > Bank of St. Louis. (The data is freely available from their web-based > > database called FRED-II). > > > > For my model, the likelihood function for each observation is the sum of > > three integrals. The integrand in each of these integrals is of the > > following form: > > > > A*exp(B+C*x-D*x^2) > > > > where A, B, C and D are constants, exp() is the exponential function and > > x is the variable of integration. The constants A and D are always > > positive; B is always negative, while there is no a priori knowledge > > about the sign of C. All the constants are finite. > > > > Of the three integrals, one has finite limits while the other two have > > the following limits: > > > > lower = -Inf > > upper = some finite number (details can be found in the code below) > > Try integrating that analytically by change of variable to a normal CDF. > > > > My problem is the following: when I try to maximize the log-likelihood > > function using "optim" with method "BFGS", I get the following error > > message (about the second integral): > > > >> out <- optim(alpha.start, LLK, gr=NULL, method="BFGS", y=urate$y) > > Error in integrate(f3, lower = -Inf, upper = upr2) : > > the integral is probably divergent > > > > Since I know that all the three integrals are convergent, I do not > > understand why I am getting this error message. My first question: can > > someone explain what mistake I am making? > > > > What is even more intriguing is that when I use the default method > > (Nelder-Mead) in "optim" instead of BFGS, I do not get any such error > > message. Since both methods (Nelder-Mead and BFGS) will need to evaluate > > the integrals, my second question is: why this difference? > > > > Below, I am providing the code that I use. Any help will be greatly > > appreciated. > > > > > > Deepankar > > > > > > ************ CODE START ******************* > > > > > > > > ############################# > > # COMPUTING THE LOGLIKELIHOOD > > # USING NUMERICAL INTEGRALS > > ############################# > > > > LLK <- function(alpha, y) { > > > > n <- length(y) > > lglik <- numeric(n) # TO BE SUMMED LATER TO GET THE LOGLIKELIHOOD > > > > lambda <- numeric(n-1) # GENERATING *lstar* > > for (i in 1:(n-1)) { # TO USE IN THE > > lambda[i] <- y[i+1]/y[i] # RE-PARAMETRIZATION BELOW > > } > > lstar <- (min(lambda)-0.01) > > > > > > # NOTE RE-PARAMETRIZATION > > # THESE RESTRICTIONS EMERGE FROM THE MODEL > > > > muep <- alpha[1] # NO RESTRICTION > > sigep <- 0.01 + exp(alpha[2]) # greater than > > 0.01 > > sigeta <- 0.01 + exp(alpha[3]) # greater than > > 0.01 > > rho2 <- 0.8*sin(alpha[4]) # between -0.8 > > and 0.8 > > rho1 <- lstar*abs(alpha[5])/(1+abs(alpha[5])) # between 0 and > > lstar > > delta <- 0.01 + exp(alpha[6]) # greater than > > 0.01 > > > > > > ########################################## > > # THE THREE FUNCTIONS TO INTEGRATE > > # FOR COMPUTING THE LOGLIKELIHOOD > > ########################################## > > > > denom <- 2*pi*sigep*sigeta*(sqrt(1-rho2^2)) # A CONSTANT TO BE USED > > # FOR DEFINING THE > > # THREE FUNCTIONS > > > > > > f1 <- function(z1) { # FIRST FUNCTION > > > > b11 <- ((z1-muep)^2)/((-2)*(1-rho2^2)*(sigep^2)) > > b12 <- > > (rho2*(z1-muep)*(y[i]-y[i-1]+delta))/((1-rho2^2)*sigep*sigeta) > > b13 <- ((y[i]-y[i-1]+delta)^2)/((-2)*(1-rho2^2)*(sigeta^2)) > > > > return((exp(b11+b12+b13))/denom) > > } > > > > f3 <- function(z3) { # SECOND FUNCTION > > > > b31 <- ((y[i]-rho1*y[i-1]-muep)^2)/((-2)*(1-rho2^2)*(sigep^2)) > > b32 <- > > (rho2*(y[i]-rho1*y[i-1]-muep)*z3)/((1-rho2^2)*sigep*sigeta) > > b33 <- ((z3)^2)/((-2)*(1-rho2^2)*(sigeta^2)) > > > > return((exp(b31+b32+b33))/denom) > > } > > > > f5 <- function(z5) { # THIRD FUNCTION > > > > b51 <- ((-y[i]+rho1*y[i-1]-muep)^2)/((-2)*(1-rho2^2)*sigep^2) > > b52 <- > > (rho2*(-y[i]+rho1*y[i-1]-muep)*(z5))/((1-rho2^2)*sigep*sigeta) > > b53 <- ((z5)^2)/((-2)*(1-rho2^2)*(sigeta^2)) > > > > return((exp(b51+b52+b53))/denom) > > } > > > > > > for (i in 2:n) { # START FOR LOOP > > > > upr1 <- (y[i]-rho1*y[i-1]) > > upr2 <- (y[i]-y[i-1]+delta) > > > > # INTEGRATING THE THREE FUNCTIONS > > out1 <- integrate(f1, lower = (-1)*upr1, upper = upr1) > > out3 <- integrate(f3, lower = -Inf, upper = upr2) > > out5 <- integrate(f5, lower= -Inf, upper = upr2) > > > > pdf <- (out1$val + out3$val + out5$val) > > > > lglik[i] <- log(pdf) # LOGLIKELIHOOD FOR OBSERVATION i > > > > } # END FOR LOOP > > > > return(-sum(lglik)) # RETURNING NEGATIVE OF THE LOGLIKELIHOOD > > # BECAUSE optim DOES MINIMIZATION BY DEFAULT > > } > > > > ***************** CODE ENDS ********************************* > > > > Then I use: > > > >> urate <- read.table("~/Desktop/UNRATE1.txt", header=TRUE) # DATA > >> alpha.start <- c(0.5, -1, -1, 0, 1, -1) # STARTING VALUES > >> out <- optim(alpha.start, LLK, gr=NULL, y=urate$y) # THIS GIVES NO > > ERROR > > > > or > > > >> out <- optim(alpha.start, LLK, gr=NULL, method="BFGS", y=urate$y) > > Error in integrate(f3, lower = -Inf, upper = upr2) : > > the integral is probably divergent > > > > ______________________________________________ > > R-help at stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > >
Seemingly Similar Threads
- Newbie on functions
- Lattice graphics and strip function
- correlation significance testing with multiple factor levels
- how do I build panel data/longitudinal data models with AR terms using the plm package or any other package
- User specified correlation structure (e.g., 2-banded Toeplitz)