Gregg Powell
2025-May-04 20:27 UTC
[R] Estimating regression with constraints in model coefficients - Follow-up on Constrained Ordinal Model — Optimized via COBYLA
Hello Christofer, Just writing with a detailed follow-up. Attached is a script I was able to get running with a bit of work. I did not include the script in the ext of this email. It is only attached. Optimization Progress We were initially aiming to solve the dual-slope constrained ordinal model using nloptr's SLSQP algorithm (NLOPT_LD_SLSQP), since it supports: ? Box constraints (per-? lower/upper bounds) ? Equality constraints (sum(?) = 1.60 for each ? set) However, SLSQP requires explicit gradient definitions for both the objective function and constraint Jacobian. While the documentation suggests that eval_grad_f = NULL should trigger finite-difference approximation, this did not work in my initial script ? I received repeated errors: ?A gradient for the objective function is needed by algorithm NLOPT_LD_SLSQP but was not supplied.? Even when using wrapper functions with NA values for gradient placeholders, the optimizer failed to recognize them. After multiple diagnostic attempts and some research, I shifted to: NLOPT_LN_COBYLA ? Derivative-Free Optimization This algorithm supports: ? Equality constraints (sum of ?? and ?? = 1.60) ? Bound constraints (per-element ? limits) Also - this does not require gradients. This made it the best practical option. Results The optimizer successfully converged (status: NLOPT_XTOL_REACHED) after 202 iterations. Estimated coefficients: ? ?? (for P(Y ? 1)): c(1.0, -0.4, 1.0) ? ?? (for P(Y ? 2)): c(1.0, 0.6, 3.25e-19) Interpretation: ? Both ? vectors satisfy their respective sum constraint (? 1.60). ? All elements are within their specified bounds. ? The third coefficient for ?? (gpa) hits its lower bound (0), suggesting minimal influence on the second cumulative probability cutoff. ? The model does not estimate cutpoints/intercepts explicitly. Instead, the two separate slope vectors approximate this ordinal structure. This confirms your structure ? estimating distinct slopes per class boundary without intercepts ? is now functioning with full constraints enforced. ________________________________________ Also, we could take this a step further - Let me know if you'd be interested in: ? A function to plot fitted probabilities per class (based on the estimated ?? and ??), ? A summary write-up of this implementation stage (as a formal hand-off or documentation), ? A switch to alabama::augLag(), which allows both equality and bound constraints and uses augmented Lagrangian methods ? possibly offering improved precision with gradient-based control. Happy to assist with any of those, as time permits - perhaps next week. r/ Gregg On Thursday, May 1st, 2025 at 1:01 PM, Gregg Powell via R-help <r-help at r-project.org> wrote:>>> Hello Christofer, >> That?s good information. Thanks for the precision. > Think I can use this now to work on a script: >> Based on what you provided: > 1. Two distinct sets of coefficients, ?? and ??, each associated with the logits for: > ? P(Y?1)P(Y ? 1)P(Y?1) > ? P(Y?2)P(Y ? 2)P(Y?2) >> 2. Separate sum constraints: > ? ??(1)=1.60 \ sum ?^{(1)} = 1.60??(1)=1.60 > ? ??(2)=1.60 \ sum ?^{(2)} = 1.60??(2)=1.60 >> 3. Element-wise bounds applied to each ? set: > ? lower = c(1, -1, 0) > ? upper = c(2, 1, 1) >> Planning to wrap this into an updated nloptr optimization routine. I?ll work on a .R script as time permits ? if all goes well, you should be able to run it as provided. >> r/ > Gregg >>>>>> On Wednesday, April 30th, 2025 at 3:23 AM, Christofer Bogaso bogaso.christofer at gmail.com wrote: >> > Hi Gregg, >> > Below I try to address >> > 1) The sum constraint would apply for each set ?? and ?? i.e. sum(??) > > = sum(??) = 1.60 > > 2) Just like 1) the lower and upper bounds will be applied for > > individual set i.e. individual elements of ?? are subject to lower > > c(1, -1, 0) and upper = c(2, 1, 1) and individual elements of ?? are > > subject to lower = c(1, -1, 0) and upper = c(2, 1, 1) >> > I hope that I am able to answer your questions. Please let me know if > > further information is required. >> > Thanks and regards, >> > On Wed, Apr 30, 2025 at 4:22?AM Gregg Powell g.a.powell at protonmail.com wrote: >> > > Hello again Christofer, > > > This clarification changes the model structure somewhat significantly -it shifts us from a standard cumulative logit model with proportional odds to a non-parallel cumulative logit model, where each threshold has its own set of ? coefficients. At least, that is now my understanding. > > > So, instead of a single ? vector shared across all class boundaries, you're now specifying: > > > ? One set of coefficients ?(1)?^{(1)}?(1) for the logit of P(Y?1)P(Y ? 1)P(Y?1), > > > ? A second, distinct set ?(2)?^{(2)}?(2) for P(Y?2)P(Y ? 2)P(Y?2), > > > ? And no intercepts, meaning the threshold-specific slope vectors must carry all the signal. >> > > So, we can adjust the log-likelihood accordingly: > > > P(Y=1)=logistic(X?(1))P(Y=2)=logistic(X?(2))?logistic(X?(1))P(Y=3)=1?logistic(X?(2))P(Y = 1) = logistic(X?^{(1)}) P(Y = 2) = logistic(X?^{(2)}) - logistic(X?^{(1)}) P(Y = 3) = 1 - logistic(X?^{(2)}) P(Y=1)=logistic(X?(1))P(Y=2)=logistic(X?(2))?logistic(X?(1))P(Y=3)=1?logistic(X?(2)) >> > > Before I attempt a revised script, can you confirm: > > > 1. Should the sum constraint (e.g., sum(?) = 1.60) apply to: > > > ? Only ??? > > > ? Only ??? > > > ? Or the sum of all 6 coefficients (?? and ?? combined)? >> > > 2. Do you want to apply separate lower/upper bounds to each of the six ? coefficients (and if so, what are they for each)? >> > > Once I understand this last part better, I?ll see about working on a version that fits this updated structure and constraint logic. > > > As always ? no promises. > > > r/ > > > Gregg Powell > > > Sierra Vista, AZ >> > > On Tuesday, April 29th, 2025 at 1:51 PM, Christofer Bogaso bogaso.christofer at gmail.com wrote: >> > > > Hi Gregg, >> > > > I am just wondering if you get any time to think about this problem. >> > > > As I understand, typically for this type of problem, we have different > > > > intercepts for different classes, while slope (beta) coefficients > > > > remain the same across classes. >> > > > But in my case, since we do not have any intercept term, the slope > > > > coefficients need to be estimated separately for different classes. > > > > Therefore, since we have 3 classes in the response variable (i.e. > > > > 'apply'), there will be 3 different sets of beta-coefficients for the > > > > independent variables. >> > > > Under this situation, I wonder how I can create the likelihood > > > > function subject to all applicable constraints. >> > > > Any insight would be highly appreciated. >> > > > Thanks and regards, >> > > > On Fri, Apr 25, 2025 at 12:31?AM Gregg Powell g.a.powell at protonmail.com wrote: >> > > > > Christofer, > > > > > That was a detailed follow-up ? you clarified the requirements precisely enough providing a position to proceed from... >> > > > > To implement this, a constrained optimization procedure to estimate an ordinal logistic regression model is needed (cumulative logit), based on: >> > > > > 1. Estimated Cutpoints > > > > > ? No intercept in the linear predictor > > > > > ? Cutpoints (thresholds) will be estimated directly from the data >> > > > > 2. Coefficient Constraints > > > > > ? Box constraints on each coefficient > > > > > ? For now: > > > > > lower = c(1, -1, 0) > > > > > upper = c(2, 1, 1) > > > > > ? These apply to: pared, public, gpa (in that order) >> > > > > 3. Sum Constraint > > > > > ? The sum of coefficients must equal 1.60 >> > > > > 4. Data to use... > > > > > ? Use the IDRE ologit.dta dataset from UCLA (for now) >> > > > > Technical Approach >> > > > > ? Attempt to write a custom negative log-likelihood function using the cumulative logit formulation: >> > > > > P(Y?k?X)=11+exp?[?(?k?X?)]P(Y \leq k \mid X) = \frac{1}{1 + \exp[-(\theta_k - X\beta)]} >> > > > > and derive P(Y=k)P(Y = k) from adjacent differences of these. >> > > > > ? Cutpoints ?k\theta_k will be estimated as separate parameters, with constraints to ensure they?re strictly increasing for identifiability. >> > > > > ? The optimization will use nloptr::nloptr(), which supports: > > > > > - Lower/upper bounds (box constraints) > > > > > - Equality constraints (for sum of ?) > > > > > - Nonlinear inequality constraints (to keep cutpoints ordered) >> > > > > I?ll have some time later - in the next day or two to attempt a script with: > > > > > - Custom negative log-likelihood > > > > > - Parameter vector split into ? and cutpoints > > > > > - Constraint functions: sum(?) = 1.60 and increasing cutpoints > > > > > - Optimization via nloptr() >> > > > > Hopefully, you?ll be able to run it locally with only the VGAM, foreign, and nloptr packages. >> > > > > I?ll send the .R file along with the next email. A best attempt, anyway. >> > > > > r/ > > > > > Gregg >> > > > > ?Oh, what fun it is!? > > > > > ?Alice, Alice?s Adventures in Wonderland by Lewis Carroll >> > > > > On Thursday, April 24th, 2025 at 1:56 AM, Christofer Bogaso bogaso.christofer at gmail.com wrote: >> > > > > > Hi Gregg, >> > > > > > Many thanks for your detail feedback, those are really super helpful. >> > > > > > I have ordered a copy of Agresti from our local library, however it > > > > > > appears that I would need to wait for a few days. >> > > > > > Regrading my queries, it would be super helpful if we can build a > > > > > > optimization algo based on below criterias: >> > > > > > 1. Whether you want the cutpoints fixed (and to what values), or if > > > > > > you want them estimated separately (with identifiability managed some > > > > > > other way); I would like to have cut-points directly estimated from > > > > > > the data > > > > > > 2. What your bounds on the coefficients are (lower/upper vectors), For > > > > > > this question, I am having different bounds for each of the > > > > > > coefficients. Therefore I would have a vector of lower points and > > > > > > other vector of upper points. However to start with let consider lower > > > > > > and upper bounds as lower = c(1, -1, 0) and upper = c(2, 1, 1) for the > > > > > > variables pared, public, and gpa. In my model, there would not be any > > > > > > Intercept, hence no question on its bounds > > > > > > 3. What value the sum of coefficients should equal (e.g., sum(?) = 1, > > > > > > or something else); Let the sum be 1.60 > > > > > > 4. And whether you're working with the IDRE example data, or something > > > > > > else. My original data is actually residing in a computer without any > > > > > > access to the internet (for data security.) However we can start with > > > > > > any suitable data for this experiment, so one such data may be > > > > > > https://stats.idre.ucla.edu/stat/data/ologit.dta. However I am still > > > > > > exploring if there is any possibility to extract a randomized copy of > > > > > > that actual data, if I can - I will share once available >> > > > > > Again, many thanks for your time and insights. >> > > > > > Thanks and regards, >> > > > > > On Wed, Apr 23, 2025 at 9:54?PM Gregg Powell g.a.powell at protonmail.com wrote: >> > > > > > > Hello again Christofer, > > > > > > > Thanks for your thoughtful note ? I?m glad the outline was helpful. Apologies for the long delay getting back to you. Had to do a small bit of research? >> > > > > > > Recommended Text on Ordinal Log-Likelihoods: > > > > > > > You're right that most online sources jump straight to code or canned functions. For a solid theoretical treatment of ordinal models and their likelihoods, consider: > > > > > > > "Categorical Data Analysis" by Alan Agresti > > > > > > > ? Especially Chapters 7 and 8 on ordinal logistic regression. > > > > > > > ? Covers proportional odds models, cumulative logits, adjacent-category logits, and the derivation of likelihood functions. > > > > > > > ? Provides not only equations but also intuition behind the model structure. > > > > > > > It?s a standard reference in the field and explains how to build log-likelihoods from first principles ? including how the cumulative probabilities arise and why the difference of CDFs represents a category-specific probability. > > > > > > > Also helpful: > > > > > > > "An Introduction to Categorical Data Analysis" by Agresti (2nd ed) ? A bit more accessible, and still covers the basics of ordinal models. > > > > > > > ________________________________________ >> > > > > > > If You Want to Proceed Practically: > > > > > > > In parallel with theory, if you'd like a working R example coded from scratch ? with: > > > > > > > ? a custom likelihood for an ordinal (cumulative logit) model, > > > > > > > ? fixed thresholds / no intercept, > > > > > > > ? coefficient bounds, > > > > > > > ? and a sum constraint on ? >> > > > > > > I?d be happy to attempt that using nloptr() or constrOptim(). You?d be able to walk through it and tweak it as necessary (no guarantee that I will get it right ?) >> > > > > > > Just let me know: > > > > > > > 1. Whether you want the cutpoints fixed (and to what values), or if you want them estimated separately (with identifiability managed some other way); > > > > > > > 2. What your bounds on the coefficients are (lower/upper vectors), > > > > > > > 3. What value the sum of coefficients should equal (e.g., sum(?) = 1, or something else); > > > > > > > 4. And whether you're working with the IDRE example data, or something else. >> > > > > > > I can use the UCLA ologit.dta dataset as a basis if that's easiest to demo on, or if you have another dataset you?d prefer ? again, let me know. >> > > > > > > All the best! >> > > > > > > gregg >> > > > > > > On Monday, April 21st, 2025 at 11:25 AM, Christofer Bogaso bogaso.christofer at gmail.com wrote: >> > > > > > > > Hi Gregg, >> > > > > > > > I am sincerely thankful for this workout. >> > > > > > > > Could you please suggest any text book on how to create log-likelihood > > > > > > > > for an ordinal model like this? Most of my online search point me > > > > > > > > directly to some R function etc, but a theoretical discussion on this > > > > > > > > subject would be really helpful to construct the same. >> > > > > > > > Thanks and regards, >> > > > > > > > On Mon, Apr 21, 2025 at 9:55?PM Gregg Powell g.a.powell at protonmail.com wrote: >> > > > > > > > > Christofer, >> > > > > > > > > Given the constraints you mentioned?bounded parameters, no intercept, and a sum constraint?you're outside the capabilities of most off-the-shelf ordinal logistic regression functions in R like vglm or polr. >> > > > > > > > > The most flexible recommendation at this point is to implement custom likelihood optimization using constrOptim() or nloptr::nloptr() with a manually coded log-likelihood function for the cumulative logit model. >> > > > > > > > > Since you need: >> > > > > > > > > Coefficient bounds (e.g., lb ? ? ? ub), >> > > > > > > > > No intercept, >> > > > > > > > > And a constraint like sum(?) = C, >> > > > > > > > > ?you'll need to code your own objective function. Here's something of a high-level outline of the approach: >> > > > > > > > > A. Model Setup > > > > > > > > > Let your design matrix X be n x p, and let Y take ordered values 1, 2, ..., K. >> > > > > > > > > B. Parameters > > > > > > > > > Assume the thresholds (?_k) are fixed (or removed entirely), and you?re estimating only the coefficient vector ?. Your constraints are: >> > > > > > > > > Box constraints: lb ? ? ? ub >> > > > > > > > > Equality constraint: sum(?) = C >> > > > > > > > > C. Likelihood > > > > > > > > > The probability of category k is given by: >> > > > > > > > > P(Y = k) = logistic(?_k - X?) - logistic(?_{k-1} - X?) >> > > > > > > > > Without intercepts, this becomes more like: >> > > > > > > > > P(Y ? k) = 1 / (1 + exp(-(c_k - X?))) >> > > > > > > > > ?where c_k are fixed thresholds. >> > > > > > > > > Implementation using nloptr > > > > > > > > > This example shows a rough sketch using nloptr, which allows both equality and bound constraints: >> > > > > > > > > > library(nloptr) >> > > > > > > > > > # Custom negative log-likelihood function > > > > > > > > > > negLL <- function(beta, X, y, K, cutpoints) { > > > > > > > > > > eta <- X %*% beta > > > > > > > > > > loglik <- 0 > > > > > > > > > > for (k in 1:K) { > > > > > > > > > > pk <- plogis(cutpoints[k] - eta) - plogis(cutpoints[k - 1] - eta) > > > > > > > > > > loglik <- loglik + sum(log(pk[y == k])) > > > > > > > > > > } > > > > > > > > > > return(-loglik) > > > > > > > > > > } >> > > > > > > > > > # Constraint: sum(beta) = C > > > > > > > > > > sum_constraint <- function(beta, C) { > > > > > > > > > > return(sum(beta) - C) > > > > > > > > > > } >> > > > > > > > > > # Define objective and constraint wrapper > > > > > > > > > > objective <- function(beta) negLL(beta, X, y, K, cutpoints) > > > > > > > > > > eq_constraint <- function(beta) sum_constraint(beta, C = 2) # example >C >> > > > > > > > > > # Run optimization > > > > > > > > > > res <- nloptr( > > > > > > > > > > x0 = rep(0, ncol(X)), > > > > > > > > > > eval_f = objective, > > > > > > > > > > lb = lower_bounds, > > > > > > > > > > ub = upper_bounds, > > > > > > > > > > eval_g_eq = eq_constraint, > > > > > > > > > > opts = list(algorithm = "NLOPT_LD_SLSQP", xtol_rel = 1e-8) > > > > > > > > > > ) >> > > > > > > > > The next step would be writing the actual log-likelihood for your specific problem or verifying constraint implementation. >> > > > > > > > > Manually coding the log-likelihood for an ordinal model is nontrivial... so a bit of a challenge :) >> > > > > > > > > All the best, > > > > > > > > > gregg powell > > > > > > > > > Sierra Vista, AZ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide https://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 603 bytes Desc: OpenPGP digital signature URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20250504/908ae87c/attachment.sig>