Dear All,

I am running a simulation to estimate the coverage probability of Wald-type
confidence intervals for the parameter d in a two-parameter model (mu, d).

I obtain the MLE with optim(), method "L-BFGS-B", and then invert the Hessian
matrix to get standard errors for the two parameter estimates. However, the
Hessian sometimes becomes non-invertible (it is no longer positive definite),
and I get the following error message:

  Error in solve.default(ac$hessian) : system is computationally singular:
  reciprocal condition number = 6.89585e-21

Below is the code I am running. I would really appreciate your comments and
suggestions. Thank you.

#Start Code
#option to trace/recover errors
#options(error = recover)

#sample size and true (mu, size) of the negative binomial
n <- 30
mu <- 5
size <- 2

#true value of the dispersion parameter d = variance/mean
d.true <- 1 + mu/size
d.true

#true value of the zero-inflation index phi = 1 + log(d)/(1 - d)
z.true <- 1 + (log(d.true)/(1 - d.true))
z.true

#allocate space for the simulation vectors and set the counters
counter <- 0
iter <- 10000
lower.d <- numeric(iter)
upper.d <- numeric(iter)

#set.seed(987654321)

#beginning of simulation loop
for (i in 1:iter) {
  r.NB <- rnbinom(n, mu = mu, size = size)
  y <- sort(r.NB)
  print(y)
  print(i)

  #sample moments
  xbar <- mean(y)
  variance <- sum((y - xbar)^2)/length(y)
  dbar <- variance/xbar
  #sample proportion of zeros (for the zero-inflation index)
  pbar <- length(y[y == 0])/length(y)

  #negative log-likelihood in the (mu, d) parameterization
  NegBin <- function(th) {
    mu <- th[1]
    d <- th[2]
    n <- length(y)
    #guard the logs against non-positive arguments
    arg1 <- n*mean(y)*ifelse(mu >= 0, log(mu), 0)
    arg2 <- n*ifelse(d >= 0, log(d), 0) *
      (mean(y) + mu/ifelse((d - 1) >= 0, d - 1, 1e-7))
    aa <- numeric(max(y))
    a <- numeric(n)
    for (k in 1:n) {
      if (y[k] > 0) {
        for (j in 1:y[k]) {
          aa[j] <- ifelse((j - 1)*(d - 1)/mu > 0,
                          log(1 + (j - 1)*(d - 1)/mu), 0)
        }
        a[k] <- sum(aa[1:y[k]])
      }
    }
    arg3 <- sum(a)
    llh <- arg1 + arg2 + arg3
    if (!is.finite(llh)) llh <- 1e+20
    -llh
  }

  ac <- optim(par = c(xbar, dbar), fn = NegBin, method = "L-BFGS-B",
              hessian = TRUE, lower = c(0, 1))
  print(ac$hessian)
  muhat <- ac$par[1]
  dhat <- ac$par[2]
  zhat <- 1 + (log(dhat)/(1 - dhat))

  #observed information and Wald standard errors
  infor <- solve(ac$hessian)
  var.dhat <- infor[2, 2]
  se.dhat <- sqrt(var.dhat)
  var.muhat <- infor[1, 1]
  se.muhat <- sqrt(var.muhat)

  #delta method for the variance function mu*d: gradient is (d, mu)
  var.func <- dhat*muhat
  d.prime <- cbind(dhat, muhat)
  se.var.func <- sqrt(d.prime %*% infor %*% t(d.prime))

  lower.d[i] <- dhat - 1.96*se.dhat
  upper.d[i] <- dhat + 1.96*se.dhat
  if (lower.d[i] <= d.true & d.true <= upper.d[i])
    counter <- counter + 1
}
counter
covg.prob <- counter/iter
covg.prob
#End Code
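A hedged alternative for the likelihood itself (a sketch, not the poster's
method): since d = variance/mean = 1 + mu/size implies size = mu/(d - 1),
one can reparameterize to log(mu) and log(d - 1), which makes the box
constraints and the ad-hoc log() guards unnecessary and lets dnbinom() do
the likelihood arithmetic.

#A reparameterized sketch (an assumption: d = 1 + mu/size)
negll <- function(lth, y) {
  mu <- exp(lth[1])
  d  <- 1 + exp(lth[2])                   #d > 1 by construction
  -sum(dnbinom(y, mu = mu, size = mu/(d - 1), log = TRUE))
}

set.seed(1)
y   <- rnbinom(30, mu = 5, size = 2)      #one simulated sample
st  <- c(log(mean(y)), log(max(var(y)/mean(y) - 1, 0.1)))  #moment starts
fit <- optim(st, negll, y = y, method = "BFGS", hessian = TRUE)
fit$par                                   #estimates on the log scale
sqrt(diag(solve(fit$hessian)))            #SEs on the log scale; use the
                                          #delta method to get SEs for d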
I have not really looked into the details of the lengthy and almost
unreadable code below. In any case, there are good reasons why numerical
software typically uses Fisher scoring / IWLS to fit GLMs. And if your
matrix is that "singular", even the common numerical tricks may not save the
day anymore: a reciprocal condition number of 7e-21 is very close to exact
singularity!

Uwe Ligges

On 02.09.2011 21:33, tzaihra at alcor.concordia.ca wrote:
> [original message and code snipped; see above]
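For instance, a minimal sketch of that IWLS route (assuming an
intercept-only negative binomial fit is what is wanted): MASS::glm.nb
estimates the mean by iteratively reweighted least squares and the shape
theta (= size) by maximum likelihood, and reports standard errors for both.

#A minimal IWLS sketch, assuming an intercept-only model
library(MASS)
set.seed(1)
y <- rnbinom(30, mu = 5, size = 2)
fit <- glm.nb(y ~ 1)
summary(fit)$coefficients                  #log(mu) estimate and its SE
c(theta = fit$theta, SE.theta = fit$SE.theta)
#d = 1 + mu/theta then follows by the delta method from (mu, theta)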
Unless you are supplying analytic Hessian code, you are almost certainly
getting an approximation. Worse, if you do not provide gradients, the
approximate Hessian is the result of two levels of differencing, so you
should expect some loss of precision.

Moreover, if your estimate of the optimum is a little bit off, or the
optimizer has terminated (algorithms converge; programs terminate) at a
point that is not an optimum, there is no reason the Hessian should be
positive definite.

The optimx package uses the Jacobian of the gradient when an analytic
gradient is available, which drops the differencing to one level. Even
better is to code the Hessian analytically, but that is messy and tedious in
most cases.

Best, JN

On 09/03/2011 06:00 AM, r-help-request at r-project.org wrote:
> [digest copy of the original message and code snipped; see above]
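A sketch of that one-level-of-differencing idea with numDeriv (my
illustration, applied to the standard (mu, size) negative binomial rather
than the poster's parameterization): given an analytic gradient, the Hessian
is the Jacobian of that gradient.

#One level of differencing instead of two (an illustration, not the
#poster's model): nll is the negative log-likelihood in (mu, k = size),
#ngr its analytic gradient.
library(numDeriv)
set.seed(1)
y <- rnbinom(30, mu = 5, size = 2)
nll <- function(p) -sum(dnbinom(y, mu = p[1], size = p[2], log = TRUE))
ngr <- function(p) {
  mu <- p[1]; k <- p[2]
  -c(sum(y/mu - (k + y)/(k + mu)),
     sum(digamma(y + k) - digamma(k) + log(k) + 1 -
         log(k + mu) - (k + y)/(k + mu)))
}
fit <- optim(c(mean(y), 2), nll, ngr, method = "L-BFGS-B",
             lower = c(1e-8, 1e-8))
H1 <- jacobian(ngr, fit$par)   #one level of differencing
H2 <- hessian(nll, fit$par)    #two levels, from the function alone
sqrt(diag(solve(H1)))          #standard errors of (mu, size)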
I wonder if your code is correct? I ran your script until an error was
reported. The data set of 30 observations was

  0 0 1 3 3 3 4 4 4 4 5 5 5 5 5 7 7 7 7 7 7 8 9 10 11 12 12 12 15 16

I created a tiny AD Model Builder program to do MLE on it:

DATA_SECTION
  init_int nobs
  init_vector y(1,nobs)
PARAMETER_SECTION
  init_number log_mu
  init_number log_alpha
  sdreport_number mu
  sdreport_number tau
  objective_function_value f
PROCEDURE_SECTION
  mu = exp(log_mu);
  tau = 1.0 + exp(log_alpha);
  for (int i=1; i<=nobs; i++)
  {
    f -= log_negbinomial_density(y(i), mu, tau);
  }

It converged quickly. The eigenvalues of the Hessian were 4.711089774 and
78.27632341, and the estimates and standard deviations of the parameters mu
and tau (where tau is the variance divided by the mean) were

  index  name  value       std dev
      3  mu    6.6000e+00  7.7318e-01
      4  tau   2.7173e+00  7.8944e-01

This was all so simple that I suspect your (rather difficult to read) R code
is wrong; otherwise R must really suck at this kind of problem.

Dave
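For comparison, a hedged cross-check in R on the same 30 observations (my
addition, not part of Dave's post): MASS::fitdistr fits the (size, mu)
parameterization by maximum likelihood and reports standard errors, and
Dave's tau equals 1 + mu/size.

#Cross-check in R on the same data
library(MASS)
y <- c(0, 0, 1, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5,
       7, 7, 7, 7, 7, 7, 8, 9, 10, 11, 12, 12, 12, 15, 16)
fit <- fitdistr(y, "negative binomial")
fit                                        #estimates with standard errors
with(as.list(fit$estimate), 1 + mu/size)   #Dave's tau = variance/mean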
Regarding the numerical calculation of the Hessian matrix: I have found
numDeriv::hessian to often be more accurate than the Hessian returned by
optim.

Best,
Giovanni Petris

On Sat, 2011-09-03 at 13:39 -0400, John C Nash wrote:
> [John's reply and the quoted original message snipped; see above]
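A sketch of that substitution in the poster's loop (assuming the objects
NegBin, xbar, and dbar from the original post are in scope):

#Recompute the Hessian at optim's solution with numDeriv, then invert
library(numDeriv)
ac <- optim(par = c(xbar, dbar), fn = NegBin, method = "L-BFGS-B",
            hessian = FALSE, lower = c(0, 1))
H <- hessian(NegBin, ac$par)   #usually more accurate than ac$hessian
infor <- solve(H)
se <- sqrt(diag(infor))        #standard errors of (muhat, dhat)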
Yes, numDeriv::hessian is very accurate. So I normally take the output from
the optimizer, *if it is a local optimum*, apply numDeriv::hessian to it,
and then compute the standard errors from that.

However, it is important to know that you have actually obtained a local
optimum. You need the gradient and Hessian to verify this: the first- and
second-order KKT conditions. This is tricky in *constrained* optimization
problems. If you have constraints, and at least one of them is active at the
best parameter estimate, then the gradient of the original objective
function need not be close to zero, and its Hessian is also incorrect. If
you employ an augmented Lagrangian approach (see alabama::auglag), you can
obtain the correct gradient and Hessian at the best parameter estimate.
These correspond to the modified objective function, which includes a
Lagrangian term augmented by a quadratic penalty term, and they can be used
to check the KKT conditions.

Here is a simple example:

require(alabama)

fr <- function(x) {   #Rosenbrock banana function
  x1 <- x[1]
  x2 <- x[2]
  100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}

grr <- function(x) {  #gradient of 'fr'
  x1 <- x[1]
  x2 <- x[2]
  c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
    200 * (x2 - x1 * x1))
}

p0 <- c(0, 0)

#bound-constrained fit: the upper bound x1 <= 0.9 is active at the solution
ans1 <- optim(par = p0, fn = fr, gr = grr, upper = c(0.9, Inf),
              method = "L-BFGS-B", hessian = TRUE)
grr(ans1$par)   #this will not be close to zero
ans1$hessian

#using an augmented Lagrangian optimizer with the same constraint
hcon <- function(x) 0.9 - x[1]
hcon.jac <- function(x) matrix(c(-1, 0), 1, 2)
ans2 <- auglag(par = p0, fn = fr, gr = grr, hin = hcon, hin.jac = hcon.jac)
ans2$gradient   #this will be close to zero
ans2$hessian

However, it is not known whether the standard errors obtained from this
Hessian are asymptotically valid.

Best,
Ravi.

-------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvaradhan@jhmi.edu
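Relatedly, a short sketch (my addition, reusing fr and grr from the example
above): for an unconstrained or interior optimum, the optimx package that
John mentioned can report the KKT checks directly, with kkt1 indicating a
near-zero gradient and kkt2 a positive-definite Hessian.

#KKT diagnostics at an interior optimum via optimx
library(optimx)
res <- optimx(par = c(-1.2, 1), fn = fr, gr = grr, method = "BFGS",
              control = list(kkt = TRUE))
res[, c("kkt1", "kkt2")]   #both should be TRUE at a genuine local minimum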