Adam Zeilinger
2012-Sep-27  23:48 UTC
[R] problems with mle2 convergence and with writing gradient function
Dear R help,
I am trying solve an MLE convergence problem: I would like to estimate 
four parameters, p1, p2, mu1, mu2, which relate to the probabilities, 
P1, P2, P3, of a multinomial (trinomial) distribution.  I am using the 
mle2() function and feeding it a time series dataset composed of four 
columns: time point, number of successes in category 1, number of 
successes in category 2, and number of success in category 3.  The 
column headers are: t, n1, n2, and n3.
The mle2() function converges occasionally, and I need to improve the 
rate of convergence when used in a stochastic simulation, with multiple 
stochastically generated datasets.  When mle2() does not converge, it 
returns an error: "Error in optim(par = c(2, 2, 0.001, 0.001), fn = 
function (p) : L-BFGS-B needs finite values of 'fn'."  I am using
the
L-BFGS-B optimization method with a lower box constraint of zero for all 
four parameters.  Negative parameter values make no sense because they 
are rates.  While I do not know any theoretical upper limit(s) to the 
parameter values, using empirical data, I have not seen any parameter 
estimates above 2.  It seems that when I start with certain 'true' 
parameter values, the rate of convergence is very high, whereas other 
"true" parameter values are very difficult to estimate.  For example, 
the true parameter values p1 = 2, p2 = 2, mu1 = 0.001, mu2 = 0.001 
causes convergence problems, but when using the parameter values p1 = 
0.3, p2 = 0.3, mu1 = 0.08, mu2 = 0.08, mle2() converges nearly 100% of 
the time.
First question: Does anyone have suggestions on how to improve the rate 
of convergence and avoid the "finite values of 'fn'" error?
Here is reproducible and relevant code from my stochastic simulation:
library(bbmle)
library(combinat)
# define multinomial distribution
dmnom2 <- function(x,prob,log=FALSE) {
   r <- lgamma(sum(x) + 1) + sum(x * log(prob) - lgamma(x + 1))
   if (log) r else exp(r)
}
# vector of time points
tv <- 1:20
# Negative log likelihood function
NLL.func <- function(p1, p2, mu1, mu2, y){
   t <- y$tv
   n1 <- y$n1
   n2 <- y$n2
   n3 <- y$n3
   P1 <- (p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
     4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) +
     mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 -
     4*(mu2*p1 + mu1*(mu2 + p2))) -
     exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
     mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
     2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
     4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu2*
     sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
     exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
     4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
     sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))
   P2 <- (p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
     4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 +
     mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + mu2 + p1 + p2)^2 -
     4*(mu2*p1 + mu1*(mu2 + p2))) -
     exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
     mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
     2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
     4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu1*
     sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
     exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
     4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
     sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))
   P3 <- 1 - P1 - P2
   p.all <- c(P1, P2, P3)
   -sum(dmnom2(c(n1, n2, n3), prob = p.all, log = TRUE))
}
## Generate simulated data
# Model probability equations as expressions,
P1 <- expression((p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) +
   mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2))) -
   exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
   mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
   2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu2*
   sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
   exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
   sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))
P2 <- expression((p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 +
   mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2))) -
   exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
   mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
   2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu1*
   sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
   exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
   sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))
# True parameter values
p1t = 2; p2t = 2; mu1t = 0.001; mu2t = 0.001
# Function to calculate model probabilities from true parameter values
psim <- function(x){
   params <- list(p1 = p1t, p2 = p2t, mu1 = mu1t, mu2 = mu2t, t = x)
   eval.P1 <- eval(P1, params)
   eval.P2 <- eval(P2, params)
   P3 <- 1 - eval.P1 - eval.P2
   c(x, matrix(c(eval.P1, eval.P2, P3), ncol = 3))
}
pdat <- sapply(tv, psim, simplify = TRUE)
Pdat <- as.data.frame(t(pdat))
names(Pdat) <- c("time", "P1", "P2",
"P3")
# Generate simulated data set from model probabilities
n = rep(20, length(tv))
p = as.matrix(Pdat[,2:4])
y <- as.data.frame(rmultinomial(n,p))
yt <- cbind(tv, y)
names(yt) <- c("tv", "n1", "n2",
"n3")
# mle2 call
mle.fit <- mle2(NLL.func, data = list(y = yt),
                 start = list(p1 = p1t, p2 = p2t, mu1 = mu1t, mu2 = mu2t),
                 control = list(maxit = 5000, lmm = 17, trace = 5),
                 method = "L-BFGS-B", skip.hessian = TRUE,
                 lower = list(p1 = 0, p2 = 0, mu1 = 0, mu2 = 0))
Given the type of error message, it seems that a gradient function might 
help.  I attempted to write a gradient function but ran into trouble 
there as well.
Second question: Do you think a gradient function would be the best way 
to improve my convergence problem?  If so, do you have any suggestions 
on improving my gradient function, which follows?
I derived the gradient function by taking the derivative of my NLL 
equation with respect to each parameter.  My NLL equation is the 
probability mass function of the trinomial distribution.  Thus the 
gradient equation for the parameter p1 would be:
gr.p1 <- deriv(log(P1^n1), p1) + deriv(log(P2^n2), p1) + 
deriv(log(P3^n3), p1)
This produces a very large equation, which I won't reproduce here. Let's
say that the four gradient equations for the four parameters are defined 
as gr.p1, gr.p2, gr.mu1, gr.mu2.  These gradient equations are functions 
of p1, p2, mu1, mu2, t, n1, n2, and n3.  My current gradient function is:
grr <- function(p1, p2, mu1, mu2, y){
   t <- y[,1]
   n1 <- y[,2]
   n2 <- y[,3]
   n3 <- y[,4]
   c(gr.p1, gr.p2, gr.mu1, gr.mu2)
}
The problem is that I need to supply values for t, n1, n2, and n3 to the 
gradient function, which are from the dataset yt, above.  This function 
produces a vector of length 4*nrow(yt) = 80, whereas mle2() requires a 
vector of length 4.  How do I write my gradient function to work in mle2()?
I hope this makes sense.
Sincerely,
Adam Zeilinger
-- 
Adam Zeilinger
Post Doctoral Scholar
Department of Entomology
University of California Riverside
www.linkedin.com/in/adamzeilinger
	[[alternative HTML version deleted]]
Apparently Analagous Threads
- problem with convergence in mle2/optim function
- model selection with spg and AIC (or, convert list to fitted model object)
- simultaneously maximizing two independent log likelihood functions using mle2
- help with combination problem
- How to avoid the NaN errors in dnbinom?
