Dear R helpers,

I need to estimate a probit model with box constraints placed on several of the model parameters. I have the following two questions:

1) How are the standard errors calculated in glm (family=binomial(link="probit"))? I ran a typical probit model using the glm probit link and, separately, the nlminb function with my own coding of the log-likelihood. As nlminb does not produce the Hessian matrix, I used hessian() (from numDeriv) to calculate it. However, the standard errors calculated from that Hessian are quite different from the ones generated by glm, although the parameter estimates are very close. I was wondering what makes this difference in the estimation of the standard errors and how this computation is carried out in glm.

2) Does anyone know how to estimate a constrained probit model in R (to be specific, I need to restrict the range of three parameters to [-1,1])? Among the optimization functions, so far nlminb and spg work for my problem, but neither produces a Hessian matrix. As I mentioned above, if I use the hessian function and calculate standard errors manually, the standard errors seem not right.

Many thanks in advance for your kind help.

Maomao
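For reference, the usual way to recover Wald standard errors from a hand-coded likelihood is to invert the Hessian of the *negative* log-likelihood at the optimum: SE = sqrt(diag(solve(H))). Two common sources of the discrepancy described above are a sign error at this step, and the fact that glm()'s summary uses the expected (Fisher) information from its final IRLS weights, which for a non-canonical link such as probit differs in finite samples from the observed Hessian that numerical differentiation gives. A minimal sketch on toy data, using base R's optimHess() as a stand-in for numDeriv::hessian():

```r
set.seed(101)
x <- runif(100)
y <- rbinom(100, size = 1, prob = pnorm(1 + 3 * x))

## negative log-likelihood of a univariate probit model
nll <- function(b) {
  -sum(dbinom(y, size = 1, prob = pnorm(b[1] + b[2] * x), log = TRUE))
}

fit <- nlminb(c(0, 0), nll)    # box constraints would go in lower=/upper=
H <- optimHess(fit$par, nll)   # Hessian of the *negative* log-likelihood
se <- sqrt(diag(solve(H)))     # Wald standard errors (observed information)
cbind(estimate = fit$par, std.error = se)
```

These standard errors should be close to, but not exactly equal to, the ones reported by glm(y ~ x, family=binomial(link="probit")) on the same data.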

Sally Luo <shali623 <at> gmail.com> writes:

> [original question about standard errors in glm(probit) and estimating a
>  box-constrained probit model -- quoted in full above, snipped here]

I'm a little biased, but I think the bbmle package is the easiest way to get this done -- it provides convenient wrappers for a range of optimizers, including nlminb. I would warn, however, that you should be very careful interpreting the meaning of the Hessian matrix if some of your parameters lie on the boundary of the feasible space ...
set.seed(101)
x <- runif(100)
p <- pnorm(1+3*x)
y <- rbinom(100,p,size=1)
d <- data.frame(x,y)
glmfit <- glm(y~x,family=binomial(link="probit"),data=d)
coef(summary(glmfit))

library(bbmle)
mle2fit <- mle2(y~dbinom(pnorm(pprobit),size=1),
                parameters=list(pprobit~x),
                start=list(pprobit=0),
                data=d)
coef(summary(mle2fit))

## now add constraints
mle2fit_constr <- mle2(y~dbinom(pnorm(pprobit),size=1),
                       parameters=list(pprobit~x),
                       start=list(pprobit=0),
                       optimizer="nlminb",
                       lower=c(2,0.15),
                       data=d)
coef(summary(mle2fit_constr))

[cc'ing back to r-help]

On 12-02-02 01:56 PM, Sally Luo wrote:
> I tried to adapt your code to my model and got the results as below. I
> don't know how to fix the warning messages. It says "rearrange the lower
> (or upper) bounds to match 'start'".

The warning is overly conservative in this case. I should work on engineering the package so that it handles this better. You can disregard them.

In answer to your previous questions:

* "size" refers to the number of trials per observation (1, if you have binary data)
* you've got the form of the lower and upper bounds right.
* you've got the formula in 'parameters' right -- this builds a linear model (using R's model.matrix) on the probit scale based on the 8 parameters

> And two of the estimates for my restricted parameters are on the boundary.
> The warning message says the variance-covariance calculations may be
> unreliable. Those parameters are the ones of interest to my study. Can I
> still make inferences using the p-values reported by mle2 in this case?

That's quite tricky, unfortunately, and it isn't a problem that's specific to the mle2 package. The basic issue is that the whole derivation of the multivariate normal sampling distribution of the maximum likelihood estimator depends on the maximum likelihood being an interior local maximum (and hence having a negative-definite Hessian, or a positive-definite information matrix), which is untrue on the boundary -- the Wikipedia article on maximum likelihood mentions this issue, for example: http://en.wikipedia.org/wiki/Maximum_likelihood

Perhaps someone here can suggest an approach (although it gets outside the scope of "R help"), or you can ask on http://stats.stackexchange.com ...

> Thanks for your help.
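To make the bound-ordering point above concrete: because the 'parameters' formula is expanded through model.matrix(), the coefficient order is "(Intercept)" followed by the predictors in formula order, and the unnamed lower/upper vectors are matched to that order. A small sketch with three illustrative predictors x1..x3:

```r
## How the 'parameters' formula maps to bound positions. The linear
## predictor is built with model.matrix(), so coefficients are ordered
## (Intercept), x1, x2, x3 -- lower=/upper= must follow that order.
## (Toy data; variable names are illustrative.)
set.seed(1)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
X <- model.matrix(~ x1 + x2 + x3, data = d)
colnames(X)   # "(Intercept)" "x1" "x2" "x3"

## to constrain the coefficients of x1..x3 to [-1, 1], leaving the
## intercept free, one bound per model.matrix column:
lower <- c(-Inf, -1, -1, -1)
upper <- c( Inf,  1,  1,  1)
```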
---- Sally

> mle2fit <- mle2(y~dbinom(pnorm(pprobit),size=1),
> +               parameters=list(pprobit~x1+x2+x3+x4+x5+x6+x7+x8),
> +               start=list(pprobit=0),
> +               optimizer="nlminb",
> +               lower=c(-Inf,-1,-1,-1,-Inf,-Inf,-Inf,-Inf,-Inf),
> +               upper=c(Inf,1,1,1,Inf,Inf,Inf,Inf,Inf),
> +               data=d)
>
> Warning messages:
> 1: In fix_order(call$lower, "lower bounds", -Inf) :
>   lower bounds not named: rearranging to match 'start'
> 2: In fix_order(call$upper, "upper bounds", Inf) :
>   upper bounds not named: rearranging to match 'start'
> 3: In mle2(y ~ dbinom(pnorm(pprobit), size = 1), parameters = list(pprobit ~ :
>   some parameters are on the boundary: variance-covariance calculations
>   based on Hessian may be unreliable
>
> On Wed, Feb 1, 2012 at 11:16 PM, Sally Luo <shali623 at gmail.com> wrote:
>
>> Prof. Bolker,
>>
>> Thanks a lot for your reply.
>>
>> In my model, I have 9 explanatory variables and I need to restrict the
>> range of parameters 2-4 to (-1,1). I tried to modify the univariate
>> probit example you gave in your reply; however, I could not get through.
>>
>> Specifically, I am not sure what 'pprobit' represents in your code. How
>> should I code this part if I have more than one variable?
>>
>> Also, does "size" refer to the number of parameters?
>>
>> Since only 3 parameters need to be restricted in my model, should I write
>> lower=c(-Inf, -1,-1,-1, -Inf, -Inf, -Inf, -Inf, -Inf) and
>> upper=c(Inf, 1,1,1, Inf, Inf, Inf, Inf, Inf)?
>>
>> Thanks again for your kind help.
>>
>> Best,
>>
>> Sally
>>
>> On Wed, Feb 1, 2012 at 7:19 AM, Ben Bolker <bbolker at gmail.com> wrote:
>>
>>> [original question and bbmle example quoted in full above -- snipped]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

[cc'ing back to r-help again -- I do this so the answers can be archived and viewed by others]

On 12-02-02 02:41 PM, Sally Luo wrote:
> Prof. Bolker,
>
> Thanks for your quick reply and detailed explanation.
>
> I also ran the unrestricted model using glmfit <-
> glm(y~x1+x2+x3+x4+x5+x6+x7+x8, family=binomial(link="probit"),data=d).
> However, the results I got from glm and mle2 (both for the unrestricted
> model) are not very similar (please see below). In your earlier example,
> both glm and mle2 produce almost the same estimation results. I just hope
> to figure out what might cause the discrepancy in the estimation results
> I've got.
>
> > coef(summary(glmfit))
>                 Estimate   Std. Error    z value     Pr(>|z|)
> (Intercept) -0.853900059 0.2464179864 -3.4652505 5.297377e-04
> x1           1.627125691 0.3076174699  5.2894450 1.226881e-07
> x2          -0.092716326 0.5229866504 -0.1772824 8.592866e-01
> x3          -3.301509522 0.9169991843 -3.6003407 3.178004e-04
> x4           7.187483436 2.2135961171  3.2469715 1.166401e-03
> x5          -0.002544181 0.0112740324 -0.2256673 8.214602e-01
> x6           6.978374268 2.2347939216  3.1226030 1.792594e-03
> x7          -0.009832379 0.0113807583 -0.8639476 3.876167e-01
> x8          -0.001252075 0.0002304789 -5.4324941 5.557178e-08
>
> > coef(summary(mle2fit))
>                         Estimate  Std. Error     z value        Pr(z)
> pprobit.(Intercept) -0.603492668 0.230117071  -2.6225463 8.727541e-03
> pprobit.x1           1.645984346 0.288479906   5.7057158 1.158552e-08
> pprobit.x2          -0.157361533 0.523048376  -0.3008546 7.635253e-01
> pprobit.x3          -3.935203692 0.932692587  -4.2191862 2.451857e-05
> pprobit.x4           7.512701611 0.062911076 119.4177885 0.000000e+00
> pprobit.x5          -0.001475556 0.011525137  -0.1280293 8.981258e-01
> pprobit.x6           7.399355063 0.018372749 402.7353318 0.000000e+00
> pprobit.x7          -0.010113008 0.011647725  -0.8682388 3.852636e-01
> pprobit.x8          -0.001650021 0.000244997  -6.7348622 1.640854e-11

My best guess is that you are running into optimization problems.
The big advantage of glm() is that it uses a special-purpose optimization method (iteratively reweighted least squares) that is generally much more robust/reliable than general-purpose nonlinear optimizers such as nlminb. If there is indeed a GLM fitting routine coded in R, somewhere, that someone has adapted to work with box constraints, it will probably perform better than mle2.

Some general suggestions for troubleshooting this:

* Check the log-likelihoods returned by the two methods. If they are very close (say within 0.01 likelihood units), then the issue is that you just have a very flat goodness-of-fit surface, and the two sets of coefficients are in practice very similar to each other.

* If possible, try starting each approach (glm(), mle2()) from the solution found by the other (it's a little bit of a pain to get the syntax right here) and see whether each gets stuck right where it is, or whether both find that one answer or the other is right.

* If you were using one of the optimizing methods from optim() (rather than nlminb), e.g. L-BFGS-B, I would suggest trying parscale to rescale the parameters to have approximately equal magnitudes near the solution. This apparently isn't possible with nlminb, but you could try optimizer="optim" (the default) with method="L-BFGS-B" and see how you do (although L-BFGS-B is often a bit finicky). Alternatively, you can try optimizer="optimx", in which case you have a larger variety of unconstrained optimizers to choose from (you have to install the optimx package and take a look at its documentation). Alternatively, you can scale your input variables (e.g. use scale() on your input matrix to get zero-centered, sd-1 variables), although you would then have to adjust your lower and upper bounds accordingly.

* It's a bit more work, but you may be able to unpack this a bit and provide analytical derivatives. That would help a lot.

In short: you are entering the quagmire of numerical optimization methods.
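The first suggestion above can be sketched on toy data. Here glm() is compared against a hand-coded probit log-likelihood maximized with nlminb(); with bbmle, logLik(mle2fit) (or -mle2fit@min) plays the role of the nlminb objective:

```r
## Compare the maximized log-likelihoods of two fits of the same model.
## If they agree to well within ~0.01 likelihood units, differences in
## the coefficients just reflect a flat goodness-of-fit surface.
set.seed(101)
x <- runif(100)
y <- rbinom(100, size = 1, prob = pnorm(1 + 3 * x))
glmfit <- glm(y ~ x, family = binomial(link = "probit"))

nll <- function(b) -sum(dbinom(y, 1, pnorm(b[1] + b[2] * x), log = TRUE))
fit <- nlminb(c(0, 0), nll)

c(glm = as.numeric(logLik(glmfit)), nlminb = -fit$objective)
```

A much larger gap would instead suggest that one of the optimizers stopped short of the maximum.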
I have learned most of this stuff by trial and error -- can anyone on the list suggest a good/friendly introduction? (Press et al.'s Numerical Recipes? Givens and Hoeting's Computational Statistics book looks good, although I haven't read it ...)

  Ben Bolker

> [remainder of the thread quoted in full above -- snipped]