Lucas Merrill Brown
2011-Oct-22 23:02 UTC
[R] Using optim with parameters that are factors (instead of continuous parameters)
I've been programming maximum likelihood estimation models using the function "optim." My current research requires modeling a particular parameter as a categorical variable (what R calls a "factor"), not as a continuous parameter.

(The research question is, at what level of X does a subject in our experiment choose Y=1 instead of Y=0? So this is a "light switch" problem -- the subjects only switch from Y=0 to Y=1 after a particular threshold. And X only comes as a categorical variable, with integer values of 0,1,2,3,4, or 5.)

So whenever optim tries to find the proper parameter for the threshold of X, it tries different threshold values such as 4.5, 4.7, 4.9 -- none of which make any difference because that wouldn't actually change the realizations of whether the threshold has been crossed. And then it says the element of the Hessian matrix for that parameter is zero, because changing it doesn't seem to affect the log-likelihood.

Is there a way to tell optim that I'd like it to choose between only a limited number of factor values for the parameter?

I've spent a lot of time on Google and in ?optim searching for the answer, but haven't made progress so far. Thank you so much for your help. Apologies for any confusing statements remaining in this message -- for me at least, it's been a difficult problem to describe succinctly.
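To make the symptom concrete, here is a minimal sketch on simulated data. The data, the fixed probabilities 0.1 and 0.9, and the helper name nll_threshold are illustrative assumptions, not the poster's actual model. Because X only takes integer values, every threshold strictly between 4 and 5 classifies the observations identically, so the log-likelihood is exactly flat there and finite-difference derivatives come out as zero.

```r
# Simulated stand-in data; the real experiment's data are not shown in the thread.
set.seed(1)
x <- sample(0:5, 200, replace = TRUE)
y <- rbinom(200, size = 1, prob = ifelse(x >= 3, 0.9, 0.1))

# Hypothetical negative log-likelihood with a hard threshold:
# P(Y = 1) is 0.9 at or above the threshold and 0.1 below it.
nll_threshold <- function(threshold, x, y) {
  p <- ifelse(x >= threshold, 0.9, 0.1)
  -sum(dbinom(y, size = 1, prob = p, log = TRUE))
}

# Thresholds an optimizer might try between two integer levels of X
trial <- c(4.5, 4.7, 4.9)
nll_values <- sapply(trial, function(th) nll_threshold(th, x, y))

# The three values are identical, so a numerical gradient (and the
# matching Hessian element) for the threshold is zero in this region.
print(nll_values)
```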
Ben Bolker
2011-Oct-23 00:03 UTC
[R] Using optim with parameters that are factors (instead of continuous parameters)
Lucas Merrill Brown <lucas.merrill.brown <at> gmail.com> writes:

> [...]
> Is there a way to tell optim that I'd like it to choose between only a
> limited number of factor values for the parameter?
> [...]

optim() is not really set up for discrete programming. You have a few options:

* use method="SANN" (simulated annealing); you can specify a rule for choosing a new candidate solution.
* make the likelihood surface slightly continuous -- i.e. use a steep logistic function that is "almost" stepwise.
* probably the easiest approach (if you only have a single discrete parameter) is to compute a profile likelihood along that parameter -- i.e. solve the optimization problem for each value from 0 to 5, and compare the results. See pp. 25-27 of http://www.math.mcmaster.ca/~bolker/emdbook/chap7A.pdf

More generally see http://cran.r-project.org/web/views/Optimization.html , but I think the profile likelihood is going to work best for you ...
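The profile-likelihood suggestion could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the model (one probability of Y=1 below the threshold and another at or above it) is a guess at the simplest version of the "light switch" setup, and the names nll_fixed, fits, and profile are hypothetical. The discrete threshold is held fixed while optim() fits the continuous parameters, and the fits are then compared across the six candidate values.

```r
# Simulated stand-in data with a true threshold of 3 (an assumption).
set.seed(1)
n <- 500
x <- sample(0:5, n, replace = TRUE)
y <- rbinom(n, size = 1, prob = ifelse(x >= 3, 0.9, 0.1))

# Negative log-likelihood for a FIXED threshold.
# par[1], par[2]: logit of P(Y = 1) below / at-or-above the threshold.
nll_fixed <- function(par, threshold, x, y) {
  eta <- ifelse(x >= threshold, par[2], par[1])
  # log(p) and log(1 - p) computed on the log scale for numerical stability
  -sum(y * plogis(eta, log.p = TRUE) + (1 - y) * plogis(-eta, log.p = TRUE))
}

# Solve the continuous problem once per candidate threshold value.
candidates <- 0:5
fits <- lapply(candidates, function(th) {
  optim(c(0, 0), nll_fixed, threshold = th, x = x, y = y, method = "BFGS")
})

# Collect the minimized negative log-likelihoods and pick the smallest.
profile <- data.frame(threshold = candidates,
                      nll = sapply(fits, function(f) f$value))
best <- profile$threshold[which.min(profile$nll)]
print(profile)
print(best)
```

With a threshold of 0 every observation falls at or above it, so par[1] does not affect the fit for that candidate; the comparison across candidates is still valid because only the minimized value is used.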
Lucas Merrill Brown
2011-Oct-27 15:50 UTC
[R] Using optim with parameters that are factors (instead of continuous parameters)
Ben,

Thank you for the incredibly helpful suggestions and links. I've been exploring each over the past few days, and for anyone else's future reference, here's what I've found.

(1) I was able to use SANN to specify how to choose new candidate solutions, but I wasn't able to easily use SANN for a model that includes both discrete and continuous parameters. That would require designating two separate rules for choosing new candidate solutions -- one rule for the continuous parameters and one rule for the discrete parameters.

(2) Your second suggestion ended up solving the problem best for the needs of this data. I wrote a continuous function that looks a lot like a discrete pulse, and optim was able to find its way towards the specification with the maximum likelihood. A function of the general form f(x) = 1/(k + (c - x)^n) does the trick, where c represents the location of the discrete jump. I then optimized over potential values of c.

(3) Generating log-likelihoods for each separate value of the parameter works well, especially for a parameter with few potential values. Since I'm also running a specification with individual-specific thresholds, however, re-running the regression five times for each individual is a little unwieldy. So it made the most sense to use solution #2.

Thanks again for your prompt and productive response!

Lucas
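For future readers, the smoothing idea in (2) might look something like the sketch below. It uses the steep logistic switch from Ben's suggestion rather than the pulse function described above, whose exact parameterization is not given in the thread. The simulated data, the fixed steepness of 4, the bounds, and the names nll_smooth and c_hat are all assumptions made for illustration.

```r
# Simulated stand-in data with the jump between X = 2 and X = 3 (an assumption).
set.seed(1)
n <- 500
x <- sample(0:5, n, replace = TRUE)
y <- rbinom(n, size = 1, prob = ifelse(x >= 3, 0.9, 0.1))

steepness <- 4  # assumed and held fixed; larger values approach a hard step

# par = (logit of P(Y=1) below the jump, logit of P(Y=1) above it, location c)
nll_smooth <- function(par, x, y) {
  p_lo <- plogis(par[1])
  p_hi <- plogis(par[2])
  w <- plogis(steepness * (x - par[3]))  # smooth switch from 0 to 1 around c
  p <- p_lo + (p_hi - p_lo) * w
  -sum(dbinom(y, size = 1, prob = p, log = TRUE))
}

# Bounds keep the probabilities away from 0 and 1 and keep c inside the range of X.
fit <- optim(c(-1, 1, mean(x)), nll_smooth, x = x, y = y,
             method = "L-BFGS-B",
             lower = c(-10, -10, 0), upper = c(10, 10, 5),
             hessian = TRUE)

c_hat <- fit$par[3]          # continuous estimate of the jump location
threshold <- ceiling(c_hat)  # first integer level of X above the jump
print(c_hat)
print(threshold)
```

Because the switch is smooth, moving c changes the log-likelihood and the Hessian element for c is no longer forced to zero. The trade-off is that the steepness has to be chosen: too shallow distorts the step, too steep brings the flat regions back.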