Steven LeBlanc
2013-Oct-20 22:01 UTC
[R] nlminb() - how do I constrain the parameter vector properly?
Greets,
I'm trying to use nlminb() to estimate the parameters of a bivariate normal
sample and during one of the iterations it passes a parameter vector to the
likelihood function resulting in an invalid covariance matrix that causes
dmvnorm() to throw an error. Thus, it seems I need to somehow communicate to
nlminb() that the final three parameters in my parameter vector are used to
construct a positive semi-definite matrix, but I can't see how to achieve
this using the constraint mechanism provided. Additional details are provided in
the code below.
Suggestions?
Best Regards,
Steven
Generate the data set I'm using:
mu<-c(1,5)
sigma<-c(1,2,2,6)
dim(sigma)<-c(2,2)
set.seed(83165026)
sample.full<-sample.deleted<-mvrnorm(50,mu,sigma)
delete.one<-c(1,5,13,20)
delete.two<-c(3,11,17,31,40,41)
sample.deleted[delete.one,1]<-NA
sample.deleted[delete.two,2]<-NA
missing<-c(delete.one,delete.two)
complete<-sample.deleted[!(is.na(sample.deleted[,1]) |
is.na(sample.deleted[,2])),]
deleted<-sample.deleted[missing,]
Try to estimate the parameters of the data set less the deleted values:
exact<-function(theta,complete,deleted){
one.only<-deleted[!(is.na(deleted[,1])),1]
two.only<-deleted[!(is.na(deleted[,2])),2]
u<-c(theta[1],theta[2])
sigma<-c(theta[3],theta[5],theta[5],theta[4])
dim(sigma)<-c(2,2)
-sum(log(dmvnorm(x=complete,mean=u,sigma=sigma)))-
sum(log(dnorm(one.only,u[1],sigma[1,1])))-
sum(log(dnorm(two.only,u[2],sigma[2,2])))
}
nlminb(start=c(0,0,1,1,0),objective=exact,complete=complete,deleted=deleted,control=list(trace=1))
Escape and it stops at Iteration 9 on my machine.
David Winsemius
2013-Oct-21 00:19 UTC
[R] nlminb() - how do I constrain the parameter vector properly?
On Oct 20, 2013, at 3:01 PM, Steven LeBlanc wrote:> Greets, > > I'm trying to use nlminb() to estimate the parameters of a bivariate normal sample and during one of the iterations it passes a parameter vector to the likelihood function resulting in an invalid covariance matrix that causes dmvnorm() to throw an error. Thus, it seems I need to somehow communicate to nlminb() that the final three parameters in my parameter vector are used to construct a positive semi-definite matrix, but I can't see how to achieve this using the constraint mechanism provided. Additional details are provided in the code below. >I recently noticed that teh dmvnorn function in package mixtools doe not throw an error when given a non-positive definite varaince-covariance matrix. Whether it should be doing so might be up for debate. Peter Dalgaard seemed to think that any self respecting such function _should_ be throwing NaN's back at you. The code it was using is rather compact:> mixtools::dmvnormfunction (y, mu = NULL, sigma = NULL) { exp(logdmvnorm(y, mu = mu, sigma = sigma)) } <environment: namespace:mixtools>> mixtools::logdmvnormfunction (y, mu = NULL, sigma = NULL) { if (is.vector(y)) y <- matrix(y, nrow = 1) d <- ncol(y) if (!is.null(mu)) y <- sweep(y, 2, mu, "-") if (is.null(sigma)) sigma <- diag(d) k <- d * 1.83787706640935 a <- qr(sigma) logdet <- sum(log(abs(diag(a$qr)))) if (nrow(y) == 1) mahaldist <- as.vector(y %*% qr.solve(a, t(y))) else mahaldist <- rowSums((y %*% qr.solve(a)) * y) -0.5 * (mahaldist + logdet + k) } I noticed that you were logging the dmvnorm values and so I wondered also if going directly to that function might save some time. (Note that I am way out of my mathematical dept here. Caveat emptor, maxima caveat emptor). -- David.> Suggestions? > > Best Regards, > Steven > > Generate the data set I'm using: > > mu<-c(1,5) > sigma<-c(1,2,2,6) > dim(sigma)<-c(2,2) > set.seed(83165026) > sample.full<-sample.deleted<-mvrnorm(50,mu,sigma) > delete.one<-c(1,5,13,20) > delete.two<-c(3,11,17,31,40,41) > sample.deleted[delete.one,1]<-NA > sample.deleted[delete.two,2]<-NA > missing<-c(delete.one,delete.two) > complete<-sample.deleted[!(is.na(sample.deleted[,1]) | is.na(sample.deleted[,2])),] > deleted<-sample.deleted[missing,] > > Try to estimate the parameters of the data set less the deleted values: > > exact<-function(theta,complete,deleted){ > one.only<-deleted[!(is.na(deleted[,1])),1] > two.only<-deleted[!(is.na(deleted[,2])),2] > u<-c(theta[1],theta[2]) > sigma<-c(theta[3],theta[5],theta[5],theta[4]) > dim(sigma)<-c(2,2) > -sum(log(dmvnorm(x=complete,mean=u,sigma=sigma)))- > sum(log(dnorm(one.only,u[1],sigma[1,1])))- > sum(log(dnorm(two.only,u[2],sigma[2,2]))) > } > nlminb(start=c(0,0,1,1,0),objective=exact,complete=complete,deleted=deleted,control=list(trace=1)) > > Escape and it stops at Iteration 9 on my machine. > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius Alameda, CA, USA
William Dunlap
2013-Oct-21 01:41 UTC
[R] nlminb() - how do I constrain the parameter vector properly?
Do you mean that your objective function (given to nlminb) parameterized a positive definite matrix, P, as the elements in its upper (or lower) triangle? If so, you could reparameterize it by the non-zero (upper triangular) elements of the Choleski decomposition, C, of P. Compute P as crossprod(C), compute the initial estimate of C as chol(P). Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com> -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf > Of Steven LeBlanc > Sent: Sunday, October 20, 2013 3:02 PM > To: r-help at R-project.org > Subject: [R] nlminb() - how do I constrain the parameter vector properly? > > Greets, > > I'm trying to use nlminb() to estimate the parameters of a bivariate normal sample and > during one of the iterations it passes a parameter vector to the likelihood function > resulting in an invalid covariance matrix that causes dmvnorm() to throw an error. Thus, > it seems I need to somehow communicate to nlminb() that the final three parameters in > my parameter vector are used to construct a positive semi-definite matrix, but I can't see > how to achieve this using the constraint mechanism provided. Additional details are > provided in the code below. > > Suggestions? > > Best Regards, > Steven > > Generate the data set I'm using: > > mu<-c(1,5) > sigma<-c(1,2,2,6) > dim(sigma)<-c(2,2) > set.seed(83165026) > sample.full<-sample.deleted<-mvrnorm(50,mu,sigma) > delete.one<-c(1,5,13,20) > delete.two<-c(3,11,17,31,40,41) > sample.deleted[delete.one,1]<-NA > sample.deleted[delete.two,2]<-NA > missing<-c(delete.one,delete.two) > complete<-sample.deleted[!(is.na(sample.deleted[,1]) | is.na(sample.deleted[,2])),] > deleted<-sample.deleted[missing,] > > Try to estimate the parameters of the data set less the deleted values: > > exact<-function(theta,complete,deleted){ > one.only<-deleted[!(is.na(deleted[,1])),1] > two.only<-deleted[!(is.na(deleted[,2])),2] > u<-c(theta[1],theta[2]) > sigma<-c(theta[3],theta[5],theta[5],theta[4]) > dim(sigma)<-c(2,2) > -sum(log(dmvnorm(x=complete,mean=u,sigma=sigma)))- > sum(log(dnorm(one.only,u[1],sigma[1,1])))- > sum(log(dnorm(two.only,u[2],sigma[2,2]))) > } > nlminb(start=c(0,0,1,1,0),objective=exact,complete=complete,deleted=deleted,control=l > ist(trace=1)) > > Escape and it stops at Iteration 9 on my machine. > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Prof J C Nash (U30A)
2013-Oct-21 12:47 UTC
[R] nlminb() - how do I constrain the parameter vector properly?
This is one area where more internal communication between the objective function (inadmissible inputs) and optimizer (e.g., Quasi-Newton) is needed. This is NOT done at the moment in R, nor in most software. An area for R&D. In Nash and Walker-Smith (1987) we did some of this in BASIC back in mid-80s. Still trying to redo that for R, but it won't be done quickly due to the much bigger infrastructure of R. The "trick" with using a large value or Inf (which sometimes causes other errors) usually slows the optimization, whereas communicating that the objective is inadmissible in a line search can often be simply a shortening of the step size. JN On 13-10-21 06:00 AM, r-help-request at r-project.org wrote:> Message: 34 > Date: Mon, 21 Oct 2013 05:56:45 -0400 > From: Mark Leeds<markleeds2 at gmail.com> > To: Steven LeBlanc<oreslag at gmail.com> > Cc:"r-help at R-project.org" <r-help at r-project.org> > Subject: Re: [R] nlminb() - how do I constrain the parameter vector > properly? > Message-ID: > <CAHz+bWYEtvZjiCCauGVxstGcqmF6ENw0N3MMb7jfa3OkYkBTKA at mail.gmail.com> > Content-Type: text/plain > > my mistake. since nlminb is minimizing, it should be +Inf ( so that the > likelihood > is large ) as you pointed out. Note that this approach is a heuristic and > may not work all the time.
William Dunlap
2013-Oct-21 14:52 UTC
[R] nlminb() - how do I constrain the parameter vector properly?
Try defining the function
theta345toSigma <- function(theta) {
cholSigma <- cbind(c(theta[3], 0), c(theta[4], theta[5]))
crossprod(cholSigma) # t(cholSigma) %*% cholSigma)
}
This creates a positive definite matrix for any theta (and
any positive definite matrix has such a representation, generally
more than one). It is like using the square root of a quantity
in the optimizer when you know the quantity must be non-negative.
Then change your
sigma <- c(theta[3],theta[5],theta[5],theta[4])
dim(sigma) <- c(2, 2)
to
sigma <- theta345toSigma(theta)
If one of your variances is near 0 the optimizer may run into
trouble at saddlepoints. Others may be able to help better
with that issue.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: Steven LeBlanc [mailto:oreslag at gmail.com]
> Sent: Sunday, October 20, 2013 9:35 PM
> To: William Dunlap
> Subject: Re: [R] nlminb() - how do I constrain the parameter vector
properly?
>
>
> On Oct 20, 2013, at 6:41 PM, William Dunlap <wdunlap at tibco.com>
wrote:
>
> > Do you mean that your objective function (given to nlminb)
parameterized
> > a positive definite matrix, P, as the elements in its upper (or lower)
triangle?
> > If so, you could reparameterize it by the non-zero (upper triangular)
elements
> > of the Choleski decomposition, C, of P. Compute P as crossprod(C),
compute
> > the initial estimate of C as chol(P).
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
>
> Hi Bill,
>
> I've clipped out the superfluous code to leave the objective function
and call to nlminb()
> below.
>
> I'm using the first two parameters to construct the vector of means
'u' for a bivariate
> normal, and the final three parameters to construct the corresponding
covariance matrix
> 'sigma'. Both are required by dmvnorm(). In summary, nlminb()
required a vector of
> parameters, so I supplied the number of parameters I needed nlminb() to
optimize and
> simply built the required formats within the function.
>
> It didn't occur to me until I saw the error that nlminb() would have no
way of knowing the
> proper boundaries of the parameter space unless there is some way to
communicate the
> constraints. nlminb() implements simple box constraints, but I don't
see a way to
> communicate "parameters 3, 4, and 5 must satisfy 3*4 - 5^2 > 0.
>
> Regarding your suggestion, I don't think I understand. Might you
elaborate?
>
> Thanks & Best Regards,
> Steven
>
> > exact<-function(theta,complete,deleted){
> >>
> >> u<-c(theta[1],theta[2])
> >> sigma<-c(theta[3],theta[5],theta[5],theta[4])
> >> dim(sigma)<-c(2,2)
> >> -sum(log(dmvnorm(x=complete,mean=u,sigma=sigma)))-
> >> sum(log(dnorm(one.only,u[1],sigma[1,1])))-
> >> sum(log(dnorm(two.only,u[2],sigma[2,2])))
> >> }
> >>
>
nlminb(start=c(0,0,1,1,0),objective=exact,complete=complete,deleted=deleted,control=l
> >> ist(trace=1))
William Dunlap
2013-Oct-21 16:28 UTC
[R] nlminb() - how do I constrain the parameter vector properly?
[I added r-help to the cc again. Please keep the replies on the list as there are others who can contribute to or learn from them.]> I also learned you have to be very careful with the starting value, as the simple identity > matrix becomes singular under the transformation.That is why I suggested using the non-zero entries in chol(sigmaStart), where sigmaStart is your initial estimate of the variance matrix, as the initial values of theta[3:5]. I should have said that crossProd(x) gives you a positive semidefinite matrix, not positive definite. When I said that variances near 0 would cause problems I meant that a singular covariance matrix would cause problems. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com> -----Original Message----- > From: Steven LeBlanc [mailto:oreslag at gmail.com] > Sent: Monday, October 21, 2013 9:21 AM > To: William Dunlap > Subject: Re: [R] nlminb() - how do I constrain the parameter vector properly? > > > On Oct 21, 2013, at 7:52 AM, William Dunlap <wdunlap at tibco.com> wrote: > > > Try defining the function > > theta345toSigma <- function(theta) { > > cholSigma <- cbind(c(theta[3], 0), c(theta[4], theta[5])) > > crossprod(cholSigma) # t(cholSigma) %*% cholSigma) > > } > > This creates a positive definite matrix for any theta (and > > any positive definite matrix has such a representation, generally > > more than one). It is like using the square root of a quantity > > in the optimizer when you know the quantity must be non-negative. > > > > Then change your > > sigma <- c(theta[3],theta[5],theta[5],theta[4]) > > dim(sigma) <- c(2, 2) > > to > > sigma <- theta345toSigma(theta) > > > > If one of your variances is near 0 the optimizer may run into > > trouble at saddlepoints. Others may be able to help better > > with that issue. > > > > Bill Dunlap > > Spotfire, TIBCO Software > > wdunlap tibco.com > > Hi Bill, > > I tried your suggestion and the optimizer produces a result, but it seems substantially far > from the anticipated result and from the result obtained when I use Inf as a return value > for an invalid covariance matrix. Perhaps it would work if I made other adjustments to > account for the 'bias' (using this term very loosely) induced by changing an invalid > parameter vector into something valid but incorrect? > > I also learned you have to be very careful with the starting value, as the simple identity > matrix becomes singular under the transformation. > > > theta<-c(0,0,1,1,0) > > cholSigma<-cbind(c(theta[3], 0), c(theta[4], theta[5])) > > sigma<-crossprod(cholSigma) > > sigma > [,1] [,2] > [1,] 1 1 > [2,] 1 1 > > In any case, the Inf trick works for now. I was asking about 'adjustments' strictly out of > curiosity. Code and results are below. > > Best Regards, > Steven > > > exact<-function(theta,complete,deleted){ > + one.only<-deleted[!(is.na(deleted[,1])),1] > + two.only<-deleted[!(is.na(deleted[,2])),2] > + u<-c(theta[1],theta[2]) > + sigma<-c(theta[3],theta[5],theta[5],theta[4]) > + dim(sigma)<-c(2,2) > + if(!(is.positive.semi.definite(sigma))){return(Inf)} > + -sum(log(dmvnorm(x=complete,mean=u,sigma=sigma)))- > + sum(log(dnorm(one.only,u[1],sigma[1,1])))- > + sum(log(dnorm(two.only,u[2],sigma[2,2]))) > + } > > exact2<-function(theta,complete,deleted){ > + one.only<-deleted[!(is.na(deleted[,1])),1] > + two.only<-deleted[!(is.na(deleted[,2])),2] > + u<-c(theta[1],theta[2]) > + cholSigma<-cbind(c(theta[3], 0), c(theta[4], theta[5])) > + sigma<-crossprod(cholSigma) > + -sum(log(dmvnorm(x=complete,mean=u,sigma=sigma)))- > + sum(log(dnorm(one.only,u[1],sigma[1,1])))- > + sum(log(dnorm(two.only,u[2],sigma[2,2]))) > + } > > nlminb(start=theta.hat.em,objective=exact,complete=complete,deleted=deleted)$par > [1] 1.2289422 5.4995271 0.9395155 4.8069068 1.8009213 > > nlminb(start=theta.hat.em,objective=exact2,complete=complete,deleted=deleted)$par > [1] 1.2289421 5.4995265 0.9692861 1.8579876 1.1639544