Steven LeBlanc
2013-Oct-20 22:01 UTC
[R] nlminb() - how do I constrain the parameter vector properly?
Greets, I'm trying to use nlminb() to estimate the parameters of a bivariate normal sample and during one of the iterations it passes a parameter vector to the likelihood function resulting in an invalid covariance matrix that causes dmvnorm() to throw an error. Thus, it seems I need to somehow communicate to nlminb() that the final three parameters in my parameter vector are used to construct a positive semi-definite matrix, but I can't see how to achieve this using the constraint mechanism provided. Additional details are provided in the code below. Suggestions? Best Regards, Steven Generate the data set I'm using: mu<-c(1,5) sigma<-c(1,2,2,6) dim(sigma)<-c(2,2) set.seed(83165026) sample.full<-sample.deleted<-mvrnorm(50,mu,sigma) delete.one<-c(1,5,13,20) delete.two<-c(3,11,17,31,40,41) sample.deleted[delete.one,1]<-NA sample.deleted[delete.two,2]<-NA missing<-c(delete.one,delete.two) complete<-sample.deleted[!(is.na(sample.deleted[,1]) | is.na(sample.deleted[,2])),] deleted<-sample.deleted[missing,] Try to estimate the parameters of the data set less the deleted values: exact<-function(theta,complete,deleted){ one.only<-deleted[!(is.na(deleted[,1])),1] two.only<-deleted[!(is.na(deleted[,2])),2] u<-c(theta[1],theta[2]) sigma<-c(theta[3],theta[5],theta[5],theta[4]) dim(sigma)<-c(2,2) -sum(log(dmvnorm(x=complete,mean=u,sigma=sigma)))- sum(log(dnorm(one.only,u[1],sigma[1,1])))- sum(log(dnorm(two.only,u[2],sigma[2,2]))) } nlminb(start=c(0,0,1,1,0),objective=exact,complete=complete,deleted=deleted,control=list(trace=1)) Escape and it stops at Iteration 9 on my machine.
David Winsemius
2013-Oct-21 00:19 UTC
[R] nlminb() - how do I constrain the parameter vector properly?
On Oct 20, 2013, at 3:01 PM, Steven LeBlanc wrote:> Greets, > > I'm trying to use nlminb() to estimate the parameters of a bivariate normal sample and during one of the iterations it passes a parameter vector to the likelihood function resulting in an invalid covariance matrix that causes dmvnorm() to throw an error. Thus, it seems I need to somehow communicate to nlminb() that the final three parameters in my parameter vector are used to construct a positive semi-definite matrix, but I can't see how to achieve this using the constraint mechanism provided. Additional details are provided in the code below. >I recently noticed that teh dmvnorn function in package mixtools doe not throw an error when given a non-positive definite varaince-covariance matrix. Whether it should be doing so might be up for debate. Peter Dalgaard seemed to think that any self respecting such function _should_ be throwing NaN's back at you. The code it was using is rather compact:> mixtools::dmvnormfunction (y, mu = NULL, sigma = NULL) { exp(logdmvnorm(y, mu = mu, sigma = sigma)) } <environment: namespace:mixtools>> mixtools::logdmvnormfunction (y, mu = NULL, sigma = NULL) { if (is.vector(y)) y <- matrix(y, nrow = 1) d <- ncol(y) if (!is.null(mu)) y <- sweep(y, 2, mu, "-") if (is.null(sigma)) sigma <- diag(d) k <- d * 1.83787706640935 a <- qr(sigma) logdet <- sum(log(abs(diag(a$qr)))) if (nrow(y) == 1) mahaldist <- as.vector(y %*% qr.solve(a, t(y))) else mahaldist <- rowSums((y %*% qr.solve(a)) * y) -0.5 * (mahaldist + logdet + k) } I noticed that you were logging the dmvnorm values and so I wondered also if going directly to that function might save some time. (Note that I am way out of my mathematical dept here. Caveat emptor, maxima caveat emptor). -- David.> Suggestions? > > Best Regards, > Steven > > Generate the data set I'm using: > > mu<-c(1,5) > sigma<-c(1,2,2,6) > dim(sigma)<-c(2,2) > set.seed(83165026) > sample.full<-sample.deleted<-mvrnorm(50,mu,sigma) > delete.one<-c(1,5,13,20) > delete.two<-c(3,11,17,31,40,41) > sample.deleted[delete.one,1]<-NA > sample.deleted[delete.two,2]<-NA > missing<-c(delete.one,delete.two) > complete<-sample.deleted[!(is.na(sample.deleted[,1]) | is.na(sample.deleted[,2])),] > deleted<-sample.deleted[missing,] > > Try to estimate the parameters of the data set less the deleted values: > > exact<-function(theta,complete,deleted){ > one.only<-deleted[!(is.na(deleted[,1])),1] > two.only<-deleted[!(is.na(deleted[,2])),2] > u<-c(theta[1],theta[2]) > sigma<-c(theta[3],theta[5],theta[5],theta[4]) > dim(sigma)<-c(2,2) > -sum(log(dmvnorm(x=complete,mean=u,sigma=sigma)))- > sum(log(dnorm(one.only,u[1],sigma[1,1])))- > sum(log(dnorm(two.only,u[2],sigma[2,2]))) > } > nlminb(start=c(0,0,1,1,0),objective=exact,complete=complete,deleted=deleted,control=list(trace=1)) > > Escape and it stops at Iteration 9 on my machine. > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius Alameda, CA, USA
William Dunlap
2013-Oct-21 01:41 UTC
[R] nlminb() - how do I constrain the parameter vector properly?
Do you mean that your objective function (given to nlminb) parameterized a positive definite matrix, P, as the elements in its upper (or lower) triangle? If so, you could reparameterize it by the non-zero (upper triangular) elements of the Choleski decomposition, C, of P. Compute P as crossprod(C), compute the initial estimate of C as chol(P). Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com> -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf > Of Steven LeBlanc > Sent: Sunday, October 20, 2013 3:02 PM > To: r-help at R-project.org > Subject: [R] nlminb() - how do I constrain the parameter vector properly? > > Greets, > > I'm trying to use nlminb() to estimate the parameters of a bivariate normal sample and > during one of the iterations it passes a parameter vector to the likelihood function > resulting in an invalid covariance matrix that causes dmvnorm() to throw an error. Thus, > it seems I need to somehow communicate to nlminb() that the final three parameters in > my parameter vector are used to construct a positive semi-definite matrix, but I can't see > how to achieve this using the constraint mechanism provided. Additional details are > provided in the code below. > > Suggestions? > > Best Regards, > Steven > > Generate the data set I'm using: > > mu<-c(1,5) > sigma<-c(1,2,2,6) > dim(sigma)<-c(2,2) > set.seed(83165026) > sample.full<-sample.deleted<-mvrnorm(50,mu,sigma) > delete.one<-c(1,5,13,20) > delete.two<-c(3,11,17,31,40,41) > sample.deleted[delete.one,1]<-NA > sample.deleted[delete.two,2]<-NA > missing<-c(delete.one,delete.two) > complete<-sample.deleted[!(is.na(sample.deleted[,1]) | is.na(sample.deleted[,2])),] > deleted<-sample.deleted[missing,] > > Try to estimate the parameters of the data set less the deleted values: > > exact<-function(theta,complete,deleted){ > one.only<-deleted[!(is.na(deleted[,1])),1] > two.only<-deleted[!(is.na(deleted[,2])),2] > u<-c(theta[1],theta[2]) > sigma<-c(theta[3],theta[5],theta[5],theta[4]) > dim(sigma)<-c(2,2) > -sum(log(dmvnorm(x=complete,mean=u,sigma=sigma)))- > sum(log(dnorm(one.only,u[1],sigma[1,1])))- > sum(log(dnorm(two.only,u[2],sigma[2,2]))) > } > nlminb(start=c(0,0,1,1,0),objective=exact,complete=complete,deleted=deleted,control=l > ist(trace=1)) > > Escape and it stops at Iteration 9 on my machine. > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Prof J C Nash (U30A)
2013-Oct-21 12:47 UTC
[R] nlminb() - how do I constrain the parameter vector properly?
This is one area where more internal communication between the objective function (inadmissible inputs) and optimizer (e.g., Quasi-Newton) is needed. This is NOT done at the moment in R, nor in most software. An area for R&D. In Nash and Walker-Smith (1987) we did some of this in BASIC back in mid-80s. Still trying to redo that for R, but it won't be done quickly due to the much bigger infrastructure of R. The "trick" with using a large value or Inf (which sometimes causes other errors) usually slows the optimization, whereas communicating that the objective is inadmissible in a line search can often be simply a shortening of the step size. JN On 13-10-21 06:00 AM, r-help-request at r-project.org wrote:> Message: 34 > Date: Mon, 21 Oct 2013 05:56:45 -0400 > From: Mark Leeds<markleeds2 at gmail.com> > To: Steven LeBlanc<oreslag at gmail.com> > Cc:"r-help at R-project.org" <r-help at r-project.org> > Subject: Re: [R] nlminb() - how do I constrain the parameter vector > properly? > Message-ID: > <CAHz+bWYEtvZjiCCauGVxstGcqmF6ENw0N3MMb7jfa3OkYkBTKA at mail.gmail.com> > Content-Type: text/plain > > my mistake. since nlminb is minimizing, it should be +Inf ( so that the > likelihood > is large ) as you pointed out. Note that this approach is a heuristic and > may not work all the time.
William Dunlap
2013-Oct-21 14:52 UTC
[R] nlminb() - how do I constrain the parameter vector properly?
Try defining the function theta345toSigma <- function(theta) { cholSigma <- cbind(c(theta[3], 0), c(theta[4], theta[5])) crossprod(cholSigma) # t(cholSigma) %*% cholSigma) } This creates a positive definite matrix for any theta (and any positive definite matrix has such a representation, generally more than one). It is like using the square root of a quantity in the optimizer when you know the quantity must be non-negative. Then change your sigma <- c(theta[3],theta[5],theta[5],theta[4]) dim(sigma) <- c(2, 2) to sigma <- theta345toSigma(theta) If one of your variances is near 0 the optimizer may run into trouble at saddlepoints. Others may be able to help better with that issue. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com> -----Original Message----- > From: Steven LeBlanc [mailto:oreslag at gmail.com] > Sent: Sunday, October 20, 2013 9:35 PM > To: William Dunlap > Subject: Re: [R] nlminb() - how do I constrain the parameter vector properly? > > > On Oct 20, 2013, at 6:41 PM, William Dunlap <wdunlap at tibco.com> wrote: > > > Do you mean that your objective function (given to nlminb) parameterized > > a positive definite matrix, P, as the elements in its upper (or lower) triangle? > > If so, you could reparameterize it by the non-zero (upper triangular) elements > > of the Choleski decomposition, C, of P. Compute P as crossprod(C), compute > > the initial estimate of C as chol(P). > > > > Bill Dunlap > > Spotfire, TIBCO Software > > wdunlap tibco.com > > Hi Bill, > > I've clipped out the superfluous code to leave the objective function and call to nlminb() > below. > > I'm using the first two parameters to construct the vector of means 'u' for a bivariate > normal, and the final three parameters to construct the corresponding covariance matrix > 'sigma'. Both are required by dmvnorm(). In summary, nlminb() required a vector of > parameters, so I supplied the number of parameters I needed nlminb() to optimize and > simply built the required formats within the function. > > It didn't occur to me until I saw the error that nlminb() would have no way of knowing the > proper boundaries of the parameter space unless there is some way to communicate the > constraints. nlminb() implements simple box constraints, but I don't see a way to > communicate "parameters 3, 4, and 5 must satisfy 3*4 - 5^2 > 0. > > Regarding your suggestion, I don't think I understand. Might you elaborate? > > Thanks & Best Regards, > Steven > > > exact<-function(theta,complete,deleted){ > >> > >> u<-c(theta[1],theta[2]) > >> sigma<-c(theta[3],theta[5],theta[5],theta[4]) > >> dim(sigma)<-c(2,2) > >> -sum(log(dmvnorm(x=complete,mean=u,sigma=sigma)))- > >> sum(log(dnorm(one.only,u[1],sigma[1,1])))- > >> sum(log(dnorm(two.only,u[2],sigma[2,2]))) > >> } > >> > nlminb(start=c(0,0,1,1,0),objective=exact,complete=complete,deleted=deleted,control=l > >> ist(trace=1))
William Dunlap
2013-Oct-21 16:28 UTC
[R] nlminb() - how do I constrain the parameter vector properly?
[I added r-help to the cc again. Please keep the replies on the list as there are others who can contribute to or learn from them.]> I also learned you have to be very careful with the starting value, as the simple identity > matrix becomes singular under the transformation.That is why I suggested using the non-zero entries in chol(sigmaStart), where sigmaStart is your initial estimate of the variance matrix, as the initial values of theta[3:5]. I should have said that crossProd(x) gives you a positive semidefinite matrix, not positive definite. When I said that variances near 0 would cause problems I meant that a singular covariance matrix would cause problems. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com> -----Original Message----- > From: Steven LeBlanc [mailto:oreslag at gmail.com] > Sent: Monday, October 21, 2013 9:21 AM > To: William Dunlap > Subject: Re: [R] nlminb() - how do I constrain the parameter vector properly? > > > On Oct 21, 2013, at 7:52 AM, William Dunlap <wdunlap at tibco.com> wrote: > > > Try defining the function > > theta345toSigma <- function(theta) { > > cholSigma <- cbind(c(theta[3], 0), c(theta[4], theta[5])) > > crossprod(cholSigma) # t(cholSigma) %*% cholSigma) > > } > > This creates a positive definite matrix for any theta (and > > any positive definite matrix has such a representation, generally > > more than one). It is like using the square root of a quantity > > in the optimizer when you know the quantity must be non-negative. > > > > Then change your > > sigma <- c(theta[3],theta[5],theta[5],theta[4]) > > dim(sigma) <- c(2, 2) > > to > > sigma <- theta345toSigma(theta) > > > > If one of your variances is near 0 the optimizer may run into > > trouble at saddlepoints. Others may be able to help better > > with that issue. > > > > Bill Dunlap > > Spotfire, TIBCO Software > > wdunlap tibco.com > > Hi Bill, > > I tried your suggestion and the optimizer produces a result, but it seems substantially far > from the anticipated result and from the result obtained when I use Inf as a return value > for an invalid covariance matrix. Perhaps it would work if I made other adjustments to > account for the 'bias' (using this term very loosely) induced by changing an invalid > parameter vector into something valid but incorrect? > > I also learned you have to be very careful with the starting value, as the simple identity > matrix becomes singular under the transformation. > > > theta<-c(0,0,1,1,0) > > cholSigma<-cbind(c(theta[3], 0), c(theta[4], theta[5])) > > sigma<-crossprod(cholSigma) > > sigma > [,1] [,2] > [1,] 1 1 > [2,] 1 1 > > In any case, the Inf trick works for now. I was asking about 'adjustments' strictly out of > curiosity. Code and results are below. > > Best Regards, > Steven > > > exact<-function(theta,complete,deleted){ > + one.only<-deleted[!(is.na(deleted[,1])),1] > + two.only<-deleted[!(is.na(deleted[,2])),2] > + u<-c(theta[1],theta[2]) > + sigma<-c(theta[3],theta[5],theta[5],theta[4]) > + dim(sigma)<-c(2,2) > + if(!(is.positive.semi.definite(sigma))){return(Inf)} > + -sum(log(dmvnorm(x=complete,mean=u,sigma=sigma)))- > + sum(log(dnorm(one.only,u[1],sigma[1,1])))- > + sum(log(dnorm(two.only,u[2],sigma[2,2]))) > + } > > exact2<-function(theta,complete,deleted){ > + one.only<-deleted[!(is.na(deleted[,1])),1] > + two.only<-deleted[!(is.na(deleted[,2])),2] > + u<-c(theta[1],theta[2]) > + cholSigma<-cbind(c(theta[3], 0), c(theta[4], theta[5])) > + sigma<-crossprod(cholSigma) > + -sum(log(dmvnorm(x=complete,mean=u,sigma=sigma)))- > + sum(log(dnorm(one.only,u[1],sigma[1,1])))- > + sum(log(dnorm(two.only,u[2],sigma[2,2]))) > + } > > nlminb(start=theta.hat.em,objective=exact,complete=complete,deleted=deleted)$par > [1] 1.2289422 5.4995271 0.9395155 4.8069068 1.8009213 > > nlminb(start=theta.hat.em,objective=exact2,complete=complete,deleted=deleted)$par > [1] 1.2289421 5.4995265 0.9692861 1.8579876 1.1639544