Adam Zeilinger
2012-Sep-27 23:48 UTC
[R] problems with mle2 convergence and with writing gradient function
Dear R help, I am trying solve an MLE convergence problem: I would like to estimate four parameters, p1, p2, mu1, mu2, which relate to the probabilities, P1, P2, P3, of a multinomial (trinomial) distribution. I am using the mle2() function and feeding it a time series dataset composed of four columns: time point, number of successes in category 1, number of successes in category 2, and number of success in category 3. The column headers are: t, n1, n2, and n3. The mle2() function converges occasionally, and I need to improve the rate of convergence when used in a stochastic simulation, with multiple stochastically generated datasets. When mle2() does not converge, it returns an error: "Error in optim(par = c(2, 2, 0.001, 0.001), fn = function (p) : L-BFGS-B needs finite values of 'fn'." I am using the L-BFGS-B optimization method with a lower box constraint of zero for all four parameters. Negative parameter values make no sense because they are rates. While I do not know any theoretical upper limit(s) to the parameter values, using empirical data, I have not seen any parameter estimates above 2. It seems that when I start with certain 'true' parameter values, the rate of convergence is very high, whereas other "true" parameter values are very difficult to estimate. For example, the true parameter values p1 = 2, p2 = 2, mu1 = 0.001, mu2 = 0.001 causes convergence problems, but when using the parameter values p1 = 0.3, p2 = 0.3, mu1 = 0.08, mu2 = 0.08, mle2() converges nearly 100% of the time. First question: Does anyone have suggestions on how to improve the rate of convergence and avoid the "finite values of 'fn'" error? Here is reproducible and relevant code from my stochastic simulation: library(bbmle) library(combinat) # define multinomial distribution dmnom2 <- function(x,prob,log=FALSE) { r <- lgamma(sum(x) + 1) + sum(x * log(prob) - lgamma(x + 1)) if (log) r else exp(r) } # vector of time points tv <- 1:20 # Negative log likelihood function NLL.func <- function(p1, p2, mu1, mu2, y){ t <- y$tv n1 <- y$n1 n2 <- y$n2 n3 <- y$n3 P1 <- (p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) + mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) - exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)* mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) + 2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu2* sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/ exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))* sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))) P2 <- (p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 + mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) - exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)* mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) + 2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu1* sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/ exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))* sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))) P3 <- 1 - P1 - P2 p.all <- c(P1, P2, P3) -sum(dmnom2(c(n1, n2, n3), prob = p.all, log = TRUE)) } ## Generate simulated data # Model probability equations as expressions, P1 <- expression((p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) + mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) - exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)* mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) + 2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu2* sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/ exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))* sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))) P2 <- expression((p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 + mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) - exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)* mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) + 2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu1* sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/ exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))* sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))) # True parameter values p1t = 2; p2t = 2; mu1t = 0.001; mu2t = 0.001 # Function to calculate model probabilities from true parameter values psim <- function(x){ params <- list(p1 = p1t, p2 = p2t, mu1 = mu1t, mu2 = mu2t, t = x) eval.P1 <- eval(P1, params) eval.P2 <- eval(P2, params) P3 <- 1 - eval.P1 - eval.P2 c(x, matrix(c(eval.P1, eval.P2, P3), ncol = 3)) } pdat <- sapply(tv, psim, simplify = TRUE) Pdat <- as.data.frame(t(pdat)) names(Pdat) <- c("time", "P1", "P2", "P3") # Generate simulated data set from model probabilities n = rep(20, length(tv)) p = as.matrix(Pdat[,2:4]) y <- as.data.frame(rmultinomial(n,p)) yt <- cbind(tv, y) names(yt) <- c("tv", "n1", "n2", "n3") # mle2 call mle.fit <- mle2(NLL.func, data = list(y = yt), start = list(p1 = p1t, p2 = p2t, mu1 = mu1t, mu2 = mu2t), control = list(maxit = 5000, lmm = 17, trace = 5), method = "L-BFGS-B", skip.hessian = TRUE, lower = list(p1 = 0, p2 = 0, mu1 = 0, mu2 = 0)) Given the type of error message, it seems that a gradient function might help. I attempted to write a gradient function but ran into trouble there as well. Second question: Do you think a gradient function would be the best way to improve my convergence problem? If so, do you have any suggestions on improving my gradient function, which follows? I derived the gradient function by taking the derivative of my NLL equation with respect to each parameter. My NLL equation is the probability mass function of the trinomial distribution. Thus the gradient equation for the parameter p1 would be: gr.p1 <- deriv(log(P1^n1), p1) + deriv(log(P2^n2), p1) + deriv(log(P3^n3), p1) This produces a very large equation, which I won't reproduce here. Let's say that the four gradient equations for the four parameters are defined as gr.p1, gr.p2, gr.mu1, gr.mu2. These gradient equations are functions of p1, p2, mu1, mu2, t, n1, n2, and n3. My current gradient function is: grr <- function(p1, p2, mu1, mu2, y){ t <- y[,1] n1 <- y[,2] n2 <- y[,3] n3 <- y[,4] c(gr.p1, gr.p2, gr.mu1, gr.mu2) } The problem is that I need to supply values for t, n1, n2, and n3 to the gradient function, which are from the dataset yt, above. This function produces a vector of length 4*nrow(yt) = 80, whereas mle2() requires a vector of length 4. How do I write my gradient function to work in mle2()? I hope this makes sense. Sincerely, Adam Zeilinger -- Adam Zeilinger Post Doctoral Scholar Department of Entomology University of California Riverside www.linkedin.com/in/adamzeilinger [[alternative HTML version deleted]]
Reasonably Related Threads
- problem with convergence in mle2/optim function
- model selection with spg and AIC (or, convert list to fitted model object)
- simultaneously maximizing two independent log likelihood functions using mle2
- help with combination problem
- How to avoid the NaN errors in dnbinom?