Dear All,
I am running a simulation to obtain the coverage probability of Wald-type
confidence intervals for the parameter d in a two-parameter model (mu, d).
I am maximizing the likelihood with optim's "L-BFGS-B" method to obtain the
MLE, since I want to invert the Hessian matrix to get standard errors of the
two parameter estimates. However, the Hessian at times becomes
non-invertible, that is, it is no longer positive definite, and I get the
following error message:
"Error in solve.default(ac$hessian) : system is computationally singular:
reciprocal condition number = 6.89585e-21"
Below is the code I am running; I would really appreciate your comments
and suggestions. Thank you.
#Start Code
#option to trace /recover error
#options(error = recover)
#Sample Size
n<-30
mu<-5
size<- 2
#true value of the parameter d
d.true<-1+mu/size
d.true
#true value of zero inflation index phi= 1+log(d)/(1-d)
z.true<-1+(log(d.true)/(1-d.true))
z.true
# Allocating space for simulation vectors and setting counters for simulation
counter<-0
iter<-10000
lower.d<-numeric(iter)
upper.d<-numeric(iter)
#set.seed(987654321)
#beginning of simulation loop ########
for (i in 1:iter){
r.NB<-rnbinom(n, mu = mu, size = size)
y<-sort(r.NB)
iter.num<-i
print(y)
print(iter.num)
#empirical estimates or sample moments
xbar<-mean(y)
variance<-(sum((y-xbar)^2))/length(y)
dbar<-variance/xbar
#sample estimate of proportion of zeros and zero inflation index
pbar<-length(y[y==0])/length(y)
### Simplified function #############################################
NegBin<-function(th){
mu<-th[1]
d<-th[2]
n<-length(y)
arg1<-n*mean(y)*ifelse(mu >= 0, log(mu),0)
#arg1<-n*mean(y)*log(mu)
#arg2<-n*log(d)*((mean(y))+mu/(d-1))
arg2<-n*ifelse(d>=0, log(d), 0)*((mean(y))+mu/ifelse((d-1)>=0, (d-1), 0.0000001))
aa<-numeric(max(y))   # workspace for the per-count log terms
a<-numeric(n)
for (i in 1:n)
{
aa[]<-0               # reset so terms from earlier observations are not re-summed
for (j in seq_len(y[i])){   # seq_len() handles y[i]==0 (1:y[i] would give 1:0)
aa[j]<-ifelse(((j-1)*(d-1))/mu >0,log(1+((j-1)*(d-1))/mu),0)
#aa[j]<-log(1+((j-1)*(d-1))/mu)
}
a[i]<-sum(aa)
}
arg3<-sum(a)
llh<-arg1+arg2+arg3
if(! is.finite(llh))
llh<-1e+20
-llh
}
ac<-optim(par=c(xbar,dbar), fn=NegBin, method="L-BFGS-B", hessian=TRUE,
lower=c(0,1))
ac
print(ac$hessian)
muhat<-ac$par[1]
dhat<-ac$par[2]
zhat<- 1+(log(dhat)/(1-dhat))
infor<-solve(ac$hessian)
var.dhat<-infor[2,2]
se.dhat<-sqrt(var.dhat)
var.muhat<-infor[1,1]
se.muhat<-sqrt(var.muhat)
#delta-method variance for var.func = muhat*dhat
#(the gradient of mu*d with respect to (mu,d) is (d,mu))
var.func<-dhat*muhat
var.func
d.prime<-cbind(dhat,muhat)
var.var.func<-d.prime%*%infor%*%t(d.prime)   # a variance, not a standard error
var.var.func
lower.d[i]<-dhat-1.96*se.dhat
upper.d[i]<-dhat+1.96*se.dhat
if(lower.d[i] <= d.true & d.true<= upper.d[i])
counter <-counter+1
}
counter
covg.prob<-counter/iter
covg.prob
I have not really looked into the details of the lengthy and almost
unreadable code above. In any case, there are good reasons why numerical
software typically uses Fisher scoring / IWLS in order to fit GLMs. And if
your matrix is that "singular", even the common numerical tricks may not
save the day anymore: 7e-21 is very close to exact singularity!

Uwe Ligges
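As a minimal illustration of the Fisher-scoring/IWLS route, assuming MASS is
available and the simulated counts are in y, an intercept-only negative
binomial fit is one line:

library(MASS)
fit <- glm.nb(y ~ 1)        # IWLS for the mean, alternating with ML for theta
exp(coef(fit))              # muhat (log link)
c(fit$theta, fit$SE.theta)  # size parameter and its standard error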
Unless you are supplying analytic Hessian code, you are almost certainly
getting an approximation. Worse, if you do not provide gradients, these are
the result of two levels of differencing, so you should expect some loss of
precision in the approximate Hessian.

Moreover, if your estimate of the optimum is a little bit off, or the
optimizer has terminated (algorithms converge, programs terminate) at a
point that is not an optimum, there is no reason the Hessian should be
positive definite.

Package optimx uses the Jacobian of the gradient if the analytic gradient is
available, which drops the differencing to one level. Even better is to code
the Hessian, but that is messy and tedious in most cases.

Best, JN
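A sketch of the one-level-of-differencing idea, assuming an analytic
gradient ngrad of the negative log-likelihood and the optim result ac (both
hypothetical names here):

library(numDeriv)
H <- jacobian(ngrad, ac$par)  # Hessian as the Jacobian of the analytic gradient
H <- 0.5 * (H + t(H))         # symmetrize before inverting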
I wonder if your code is correct?
I ran your script until an error was reported. The data set of 30 obs was

 [1]  0  0  1  3  3  3  4  4  4  4  5  5  5  5  5  7  7  7  7  7  7  8  9 10 11
[26] 12 12 12 15 16
I created a tiny AD Model Builder program to do MLE on it.
DATA_SECTION
  init_int nobs
  init_vector y(1,nobs)
PARAMETER_SECTION
  init_number log_mu           // log of the NB mean
  init_number log_alpha
  sdreport_number mu
  sdreport_number tau          // overdispersion: variance divided by mean
  objective_function_value f
PROCEDURE_SECTION
  mu=exp(log_mu);
  tau=1.0+exp(log_alpha);      // keeps tau > 1
  for (int i=1;i<=nobs;i++)
  {
    f-=log_negbinomial_density(y(i),mu,tau);
  }
It converged quickly. The eigenvalues of the Hessian were

  4.711089774  78.27632341

and the estimates and std devs of the parameters mu and tau were

  index  name  value       std dev
      3  mu    6.6000e+00  7.7318e-01
      4  tau   2.7173e+00  7.8944e-01

where tau is the variance divided by the mean.
This was all so simple that I suspect your (rather difficult to read)
R code is wrong, otherwise R must really suck at this kind of problem.
Dave
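For comparison, the same data can be fit directly in R with the built-in
negative binomial density; a sketch, reparameterizing on the log scale so
the optimizer is unconstrained:

y <- c(0,0,1,3,3,3,4,4,4,4,5,5,5,5,5,7,7,7,7,7,7,8,9,10,11,12,12,12,15,16)
nll <- function(th) -sum(dnbinom(y, mu = exp(th[1]), size = exp(th[2]), log = TRUE))
fit <- optim(c(log(mean(y)), 0), nll, method = "BFGS", hessian = TRUE)
exp(fit$par)                    # muhat (the sample mean, 6.6) and sizehat
sqrt(diag(solve(fit$hessian)))  # standard errors on the log scale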
About the numerical calculation of the Hessian matrix: I have found
numDeriv::hessian to be often more accurate than the Hessian returned
by optim.

Best,
Giovanni Petris
Yes, numDeriv::hessian is very accurate. So I normally take the output from
the optimizer, *if it is a local optimum*, and then apply numDeriv::hessian
to it; I then compute the standard errors from that.
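A sketch of that recipe, assuming NegBin is the negative log-likelihood and
ac the converged optim result from the original post:

library(numDeriv)
h  <- hessian(NegBin, ac$par)  # recompute the Hessian at the optimum
se <- sqrt(diag(solve(h)))     # Wald standard errors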
However, it is important to know that you have actually obtained a local
optimum. In fact, you need the gradient and Hessian to verify this via the
first- and second-order KKT conditions. This is tricky in *constrained*
optimization problems: if you have constraints, and at least one of them is
active at the best parameter estimate, then the gradient of the original
objective function need not be close to zero, and the Hessian is also
incorrect.
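For an unconstrained (interior) optimum, both conditions can be checked
numerically; a sketch, again assuming NegBin and ac as in the original post:

library(numDeriv)
max(abs(grad(NegBin, ac$par)))          # first-order: should be near zero
eigen(hessian(NegBin, ac$par))$values   # second-order: should all be positive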
If you employ an augmented Lagrangian approach (see alabama::auglag), you
can obtain the correct gradient and Hessian at the best parameter estimate.
These correspond to the modified objective function, which includes a
Lagrangian term augmented by a quadratic penalty term, and they can be used
to check the KKT conditions.
Here is a simple example:
require(alabama)

fr <- function(x) { ## Rosenbrock Banana function
  x1 <- x[1]
  x2 <- x[2]
  100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}

grr <- function(x) { ## Gradient of 'fr'
  x1 <- x[1]
  x2 <- x[2]
  c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
    200 * (x2 - x1 * x1))
}

p0 <- c(0, 0)
ans1 <- optim(par=p0, fn=fr, gr=grr, upper=c(0.9, Inf),
              method="L-BFGS-B", hessian=TRUE)
grr(ans1$par) # not close to zero: the bound x1 <= 0.9 is active at the solution
ans1$hessian

# Using an augmented Lagrangian optimizer
hcon <- function(x) 0.9 - x[1]                 # inequality constraint x1 <= 0.9
hcon.jac <- function(x) matrix(c(-1, 0), 1, 2) # its Jacobian
ans2 <- auglag(par=p0, fn=fr, gr=grr, hin=hcon, hin.jac=hcon.jac)
ans2$gradient # this will be close to zero
ans2$hessian
However, it is not known whether the standard errors obtained from this Hessian
are asymptotically valid.
Best,
Ravi.
-------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology, School of Medicine,
Johns Hopkins University
Ph. (410) 502-2619
email: rvaradhan@jhmi.edu<mailto:rvaradhan@jhmi.edu>