Adam Zeilinger
2012-Sep-27 23:48 UTC
[R] problems with mle2 convergence and with writing gradient function
Dear R help,
I am trying solve an MLE convergence problem: I would like to estimate
four parameters, p1, p2, mu1, mu2, which relate to the probabilities,
P1, P2, P3, of a multinomial (trinomial) distribution. I am using the
mle2() function and feeding it a time series dataset composed of four
columns: time point, number of successes in category 1, number of
successes in category 2, and number of success in category 3. The
column headers are: t, n1, n2, and n3.
The mle2() function converges occasionally, and I need to improve the
rate of convergence when used in a stochastic simulation, with multiple
stochastically generated datasets. When mle2() does not converge, it
returns an error: "Error in optim(par = c(2, 2, 0.001, 0.001), fn =
function (p) : L-BFGS-B needs finite values of 'fn'." I am using
the
L-BFGS-B optimization method with a lower box constraint of zero for all
four parameters. Negative parameter values make no sense because they
are rates. While I do not know any theoretical upper limit(s) to the
parameter values, using empirical data, I have not seen any parameter
estimates above 2. It seems that when I start with certain 'true'
parameter values, the rate of convergence is very high, whereas other
"true" parameter values are very difficult to estimate. For example,
the true parameter values p1 = 2, p2 = 2, mu1 = 0.001, mu2 = 0.001
causes convergence problems, but when using the parameter values p1 =
0.3, p2 = 0.3, mu1 = 0.08, mu2 = 0.08, mle2() converges nearly 100% of
the time.
First question: Does anyone have suggestions on how to improve the rate
of convergence and avoid the "finite values of 'fn'" error?
Here is reproducible and relevant code from my stochastic simulation:
library(bbmle)
library(combinat)
# define multinomial distribution
dmnom2 <- function(x,prob,log=FALSE) {
r <- lgamma(sum(x) + 1) + sum(x * log(prob) - lgamma(x + 1))
if (log) r else exp(r)
}
# vector of time points
tv <- 1:20
# Negative log likelihood function
NLL.func <- function(p1, p2, mu1, mu2, y){
t <- y$tv
n1 <- y$n1
n2 <- y$n2
n3 <- y$n3
P1 <- (p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) +
mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2))) -
exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu2*
sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))
P2 <- (p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 +
mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2))) -
exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu1*
sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))
P3 <- 1 - P1 - P2
p.all <- c(P1, P2, P3)
-sum(dmnom2(c(n1, n2, n3), prob = p.all, log = TRUE))
}
## Generate simulated data
# Model probability equations as expressions,
P1 <- expression((p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) +
mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2))) -
exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu2*
sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))
P2 <- expression((p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 +
mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2))) -
exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu1*
sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))
# True parameter values
p1t = 2; p2t = 2; mu1t = 0.001; mu2t = 0.001
# Function to calculate model probabilities from true parameter values
psim <- function(x){
params <- list(p1 = p1t, p2 = p2t, mu1 = mu1t, mu2 = mu2t, t = x)
eval.P1 <- eval(P1, params)
eval.P2 <- eval(P2, params)
P3 <- 1 - eval.P1 - eval.P2
c(x, matrix(c(eval.P1, eval.P2, P3), ncol = 3))
}
pdat <- sapply(tv, psim, simplify = TRUE)
Pdat <- as.data.frame(t(pdat))
names(Pdat) <- c("time", "P1", "P2",
"P3")
# Generate simulated data set from model probabilities
n = rep(20, length(tv))
p = as.matrix(Pdat[,2:4])
y <- as.data.frame(rmultinomial(n,p))
yt <- cbind(tv, y)
names(yt) <- c("tv", "n1", "n2",
"n3")
# mle2 call
mle.fit <- mle2(NLL.func, data = list(y = yt),
start = list(p1 = p1t, p2 = p2t, mu1 = mu1t, mu2 = mu2t),
control = list(maxit = 5000, lmm = 17, trace = 5),
method = "L-BFGS-B", skip.hessian = TRUE,
lower = list(p1 = 0, p2 = 0, mu1 = 0, mu2 = 0))
Given the type of error message, it seems that a gradient function might
help. I attempted to write a gradient function but ran into trouble
there as well.
Second question: Do you think a gradient function would be the best way
to improve my convergence problem? If so, do you have any suggestions
on improving my gradient function, which follows?
I derived the gradient function by taking the derivative of my NLL
equation with respect to each parameter. My NLL equation is the
probability mass function of the trinomial distribution. Thus the
gradient equation for the parameter p1 would be:
gr.p1 <- deriv(log(P1^n1), p1) + deriv(log(P2^n2), p1) +
deriv(log(P3^n3), p1)
This produces a very large equation, which I won't reproduce here. Let's
say that the four gradient equations for the four parameters are defined
as gr.p1, gr.p2, gr.mu1, gr.mu2. These gradient equations are functions
of p1, p2, mu1, mu2, t, n1, n2, and n3. My current gradient function is:
grr <- function(p1, p2, mu1, mu2, y){
t <- y[,1]
n1 <- y[,2]
n2 <- y[,3]
n3 <- y[,4]
c(gr.p1, gr.p2, gr.mu1, gr.mu2)
}
The problem is that I need to supply values for t, n1, n2, and n3 to the
gradient function, which are from the dataset yt, above. This function
produces a vector of length 4*nrow(yt) = 80, whereas mle2() requires a
vector of length 4. How do I write my gradient function to work in mle2()?
I hope this makes sense.
Sincerely,
Adam Zeilinger
--
Adam Zeilinger
Post Doctoral Scholar
Department of Entomology
University of California Riverside
www.linkedin.com/in/adamzeilinger
[[alternative HTML version deleted]]
Apparently Analagous Threads
- problem with convergence in mle2/optim function
- model selection with spg and AIC (or, convert list to fitted model object)
- simultaneously maximizing two independent log likelihood functions using mle2
- help with combination problem
- How to avoid the NaN errors in dnbinom?
