thr3ads.net - R help - [R] Estimating regression with constraints in model coefficients - Follow-up on Constrained Ordinal Model

If this information is useful, please help other people find it:
Share via:
Gregg Powell
2025-May-04 20:27 UTC
[R] Estimating regression with constraints in model coefficients - Follow-up on Constrained Ordinal Model — Optimized via COBYLA

Hello Christofer,

Just writing with a detailed follow-up. Attached is a script I was able to get
running with a bit of work. I did not include the script in the ext of this
email. It is only attached.

Optimization Progress
We were initially aiming to solve the dual-slope constrained ordinal model using
nloptr's SLSQP algorithm (NLOPT_LD_SLSQP), since it supports:
? Box constraints (per-? lower/upper bounds)
? Equality constraints (sum(?) = 1.60 for each ? set)

However, SLSQP requires explicit gradient definitions for both the objective
function and constraint Jacobian. While the documentation suggests that
eval_grad_f = NULL should trigger finite-difference approximation, this did not
work in my initial script ? I received repeated errors:
?A gradient for the objective function is needed by algorithm NLOPT_LD_SLSQP but
was not supplied.?
Even when using wrapper functions with NA values for gradient placeholders, the
optimizer failed to recognize them.


After multiple diagnostic attempts and some research, I shifted to:
NLOPT_LN_COBYLA ? Derivative-Free Optimization
This algorithm supports:
? Equality constraints (sum of ?? and ?? = 1.60)
? Bound constraints (per-element ? limits)

Also - this does not require gradients. This made it the best practical option.

Results
The optimizer successfully converged (status: NLOPT_XTOL_REACHED) after 202
iterations.

Estimated coefficients:
? ?? (for P(Y ? 1)): c(1.0, -0.4, 1.0)
? ?? (for P(Y ? 2)): c(1.0, 0.6, 3.25e-19)

Interpretation:
? Both ? vectors satisfy their respective sum constraint (? 1.60).
? All elements are within their specified bounds.
? The third coefficient for ?? (gpa) hits its lower bound (0), suggesting
minimal influence on the second cumulative probability cutoff.
? The model does not estimate cutpoints/intercepts explicitly. Instead, the two
separate slope vectors approximate this ordinal structure.
This confirms your structure ? estimating distinct slopes per class boundary
without intercepts ? is now functioning with full constraints enforced.

________________________________________
Also, we could take this a step further - Let me know if you'd be interested
in:
? A function to plot fitted probabilities per class (based on the estimated ??
and ??),
? A summary write-up of this implementation stage (as a formal hand-off or
documentation),
? A switch to alabama::augLag(), which allows both equality and bound
constraints and uses augmented Lagrangian methods ? possibly offering improved
precision with gradient-based control.

Happy to assist with any of those, as time permits - perhaps next week.

r/
Gregg











On Thursday, May 1st, 2025 at 1:01 PM, Gregg Powell via R-help <r-help at
r-project.org> wrote:
> 
> 
> Hello Christofer,
> 
> That?s good information. Thanks for the precision.
> Think I can use this now to work on a script:
> 
> Based on what you provided:
> 1. Two distinct sets of coefficients, ?? and ??, each associated with the
logits for:
> ? P(Y?1)P(Y ? 1)P(Y?1)
> ? P(Y?2)P(Y ? 2)P(Y?2)
> 
> 2. Separate sum constraints:
> ? ??(1)=1.60 \ sum ?^{(1)} = 1.60??(1)=1.60
> ? ??(2)=1.60 \ sum ?^{(2)} = 1.60??(2)=1.60
> 
> 3. Element-wise bounds applied to each ? set:
> ? lower = c(1, -1, 0)
> ? upper = c(2, 1, 1)
> 
> Planning to wrap this into an updated nloptr optimization routine. I?ll
work on a .R script as time permits ? if all goes well, you should be able to
run it as provided.
> 
> r/
> Gregg
> 
> 
> 
> 
> 
> On Wednesday, April 30th, 2025 at 3:23 AM, Christofer Bogaso
bogaso.christofer at gmail.com wrote:
> 
> > Hi Gregg,
> 
> > Below I try to address
> 
> > 1) The sum constraint would apply for each set ?? and ?? i.e. sum(??)
> > = sum(??) = 1.60
> > 2) Just like 1) the lower and upper bounds will be applied for
> > individual set i.e. individual elements of ?? are subject to lower
> > c(1, -1, 0) and upper = c(2, 1, 1) and individual elements of ?? are
> > subject to lower = c(1, -1, 0) and upper = c(2, 1, 1)
> 
> > I hope that I am able to answer your questions. Please let me know if
> > further information is required.
> 
> > Thanks and regards,
> 
> > On Wed, Apr 30, 2025 at 4:22?AM Gregg Powell g.a.powell at
protonmail.com wrote:
> 
> > > Hello again Christofer,
> > > This clarification changes the model structure somewhat
significantly -it shifts us from a standard cumulative logit model with
proportional odds to a non-parallel cumulative logit model, where each threshold
has its own set of ? coefficients. At least, that is now my understanding.
> > > So, instead of a single ? vector shared across all class
boundaries, you're now specifying:
> > > ? One set of coefficients ?(1)?^{(1)}?(1) for the logit of
P(Y?1)P(Y ? 1)P(Y?1),
> > > ? A second, distinct set ?(2)?^{(2)}?(2) for P(Y?2)P(Y ?
2)P(Y?2),
> > > ? And no intercepts, meaning the threshold-specific slope vectors
must carry all the signal.
> 
> > > So, we can adjust the log-likelihood accordingly:
> > >
P(Y=1)=logistic(X?(1))P(Y=2)=logistic(X?(2))?logistic(X?(1))P(Y=3)=1?logistic(X?(2))P(Y
= 1) = logistic(X?^{(1)}) P(Y = 2) = logistic(X?^{(2)}) - logistic(X?^{(1)}) P(Y
= 3) = 1 - logistic(X?^{(2)})
P(Y=1)=logistic(X?(1))P(Y=2)=logistic(X?(2))?logistic(X?(1))P(Y=3)=1?logistic(X?(2))
> 
> > > Before I attempt a revised script, can you confirm:
> > > 1. Should the sum constraint (e.g., sum(?) = 1.60) apply to:
> > > ? Only ???
> > > ? Only ???
> > > ? Or the sum of all 6 coefficients (?? and ?? combined)?
> 
> > > 2. Do you want to apply separate lower/upper bounds to each of
the six ? coefficients (and if so, what are they for each)?
> 
> > > Once I understand this last part better, I?ll see about working
on a version that fits this updated structure and constraint logic.
> > > As always ? no promises.
> > > r/
> > > Gregg Powell
> > > Sierra Vista, AZ
> 
> > > On Tuesday, April 29th, 2025 at 1:51 PM, Christofer Bogaso
bogaso.christofer at gmail.com wrote:
> 
> > > > Hi Gregg,
> 
> > > > I am just wondering if you get any time to think about this
problem.
> 
> > > > As I understand, typically for this type of problem, we have
different
> > > > intercepts for different classes, while slope (beta)
coefficients
> > > > remain the same across classes.
> 
> > > > But in my case, since we do not have any intercept term, the
slope
> > > > coefficients need to be estimated separately for different
classes.
> > > > Therefore, since we have 3 classes in the response variable
(i.e.
> > > > 'apply'), there will be 3 different sets of
beta-coefficients for the
> > > > independent variables.
> 
> > > > Under this situation, I wonder how I can create the
likelihood
> > > > function subject to all applicable constraints.
> 
> > > > Any insight would be highly appreciated.
> 
> > > > Thanks and regards,
> 
> > > > On Fri, Apr 25, 2025 at 12:31?AM Gregg Powell g.a.powell at
protonmail.com wrote:
> 
> > > > > Christofer,
> > > > > That was a detailed follow-up ? you clarified the
requirements precisely enough providing a position to proceed from...
> 
> > > > > To implement this, a constrained optimization procedure
to estimate an ordinal logistic regression model is needed (cumulative logit),
based on:
> 
> > > > > 1. Estimated Cutpoints
> > > > > ? No intercept in the linear predictor
> > > > > ? Cutpoints (thresholds) will be estimated directly
from the data
> 
> > > > > 2. Coefficient Constraints
> > > > > ? Box constraints on each coefficient
> > > > > ? For now:
> > > > > lower = c(1, -1, 0)
> > > > > upper = c(2, 1, 1)
> > > > > ? These apply to: pared, public, gpa (in that order)
> 
> > > > > 3. Sum Constraint
> > > > > ? The sum of coefficients must equal 1.60
> 
> > > > > 4. Data to use...
> > > > > ? Use the IDRE ologit.dta dataset from UCLA (for now)
> 
> > > > > Technical Approach
> 
> > > > > ? Attempt to write a custom negative log-likelihood
function using the cumulative logit formulation:
> 
> > > > > P(Y?k?X)=11+exp?[?(?k?X?)]P(Y \leq k \mid X) =
\frac{1}{1 + \exp[-(\theta_k - X\beta)]}
> 
> > > > > and derive P(Y=k)P(Y = k) from adjacent differences of
these.
> 
> > > > > ? Cutpoints ?k\theta_k will be estimated as separate
parameters, with constraints to ensure they?re strictly increasing for
identifiability.
> 
> > > > > ? The optimization will use nloptr::nloptr(), which
supports:
> > > > > - Lower/upper bounds (box constraints)
> > > > > - Equality constraints (for sum of ?)
> > > > > - Nonlinear inequality constraints (to keep cutpoints
ordered)
> 
> > > > > I?ll have some time later - in the next day or two to
attempt a script with:
> > > > > - Custom negative log-likelihood
> > > > > - Parameter vector split into ? and cutpoints
> > > > > - Constraint functions: sum(?) = 1.60 and increasing
cutpoints
> > > > > - Optimization via nloptr()
> 
> > > > > Hopefully, you?ll be able to run it locally with only
the VGAM, foreign, and nloptr packages.
> 
> > > > > I?ll send the .R file along with the next email. A best
attempt, anyway.
> 
> > > > > r/
> > > > > Gregg
> 
> > > > > ?Oh, what fun it is!?
> > > > > ?Alice, Alice?s Adventures in Wonderland by Lewis
Carroll
> 
> > > > > On Thursday, April 24th, 2025 at 1:56 AM, Christofer
Bogaso bogaso.christofer at gmail.com wrote:
> 
> > > > > > Hi Gregg,
> 
> > > > > > Many thanks for your detail feedback, those are
really super helpful.
> 
> > > > > > I have ordered a copy of Agresti from our local
library, however it
> > > > > > appears that I would need to wait for a few days.
> 
> > > > > > Regrading my queries, it would be super helpful if
we can build a
> > > > > > optimization algo based on below criterias:
> 
> > > > > > 1. Whether you want the cutpoints fixed (and to
what values), or if
> > > > > > you want them estimated separately (with
identifiability managed some
> > > > > > other way); I would like to have cut-points
directly estimated from
> > > > > > the data
> > > > > > 2. What your bounds on the coefficients are
(lower/upper vectors), For
> > > > > > this question, I am having different bounds for
each of the
> > > > > > coefficients. Therefore I would have a vector of
lower points and
> > > > > > other vector of upper points. However to start
with let consider lower
> > > > > > and upper bounds as lower = c(1, -1, 0) and upper
= c(2, 1, 1) for the
> > > > > > variables pared, public, and gpa. In my model,
there would not be any
> > > > > > Intercept, hence no question on its bounds
> > > > > > 3. What value the sum of coefficients should equal
(e.g., sum(?) = 1,
> > > > > > or something else); Let the sum be 1.60
> > > > > > 4. And whether you're working with the IDRE
example data, or something
> > > > > > else. My original data is actually residing in a
computer without any
> > > > > > access to the internet (for data security.)
However we can start with
> > > > > > any suitable data for this experiment, so one such
data may be
> > > > > > https://stats.idre.ucla.edu/stat/data/ologit.dta.
However I am still
> > > > > > exploring if there is any possibility to extract a
randomized copy of
> > > > > > that actual data, if I can - I will share once
available
> 
> > > > > > Again, many thanks for your time and insights.
> 
> > > > > > Thanks and regards,
> 
> > > > > > On Wed, Apr 23, 2025 at 9:54?PM Gregg Powell
g.a.powell at protonmail.com wrote:
> 
> > > > > > > Hello again Christofer,
> > > > > > > Thanks for your thoughtful note ? I?m glad
the outline was helpful. Apologies for the long delay getting back to you. Had
to do a small bit of research?
> 
> > > > > > > Recommended Text on Ordinal Log-Likelihoods:
> > > > > > > You're right that most online sources
jump straight to code or canned functions. For a solid theoretical treatment of
ordinal models and their likelihoods, consider:
> > > > > > > "Categorical Data Analysis" by Alan
Agresti
> > > > > > > ? Especially Chapters 7 and 8 on ordinal
logistic regression.
> > > > > > > ? Covers proportional odds models, cumulative
logits, adjacent-category logits, and the derivation of likelihood functions.
> > > > > > > ? Provides not only equations but also
intuition behind the model structure.
> > > > > > > It?s a standard reference in the field and
explains how to build log-likelihoods from first principles ? including how the
cumulative probabilities arise and why the difference of CDFs represents a
category-specific probability.
> > > > > > > Also helpful:
> > > > > > > "An Introduction to Categorical Data
Analysis" by Agresti (2nd ed) ? A bit more accessible, and still covers the
basics of ordinal models.
> > > > > > > ________________________________________
> 
> > > > > > > If You Want to Proceed Practically:
> > > > > > > In parallel with theory, if you'd like a
working R example coded from scratch ? with:
> > > > > > > ? a custom likelihood for an ordinal
(cumulative logit) model,
> > > > > > > ? fixed thresholds / no intercept,
> > > > > > > ? coefficient bounds,
> > > > > > > ? and a sum constraint on ?
> 
> > > > > > > I?d be happy to attempt that using nloptr()
or constrOptim(). You?d be able to walk through it and tweak it as necessary (no
guarantee that I will get it right ?)
> 
> > > > > > > Just let me know:
> > > > > > > 1. Whether you want the cutpoints fixed (and
to what values), or if you want them estimated separately (with identifiability
managed some other way);
> > > > > > > 2. What your bounds on the coefficients are
(lower/upper vectors),
> > > > > > > 3. What value the sum of coefficients should
equal (e.g., sum(?) = 1, or something else);
> > > > > > > 4. And whether you're working with the
IDRE example data, or something else.
> 
> > > > > > > I can use the UCLA ologit.dta dataset as a
basis if that's easiest to demo on, or if you have another dataset you?d
prefer ? again, let me know.
> 
> > > > > > > All the best!
> 
> > > > > > > gregg
> 
> > > > > > > On Monday, April 21st, 2025 at 11:25 AM,
Christofer Bogaso bogaso.christofer at gmail.com wrote:
> 
> > > > > > > > Hi Gregg,
> 
> > > > > > > > I am sincerely thankful for this
workout.
> 
> > > > > > > > Could you please suggest any text book
on how to create log-likelihood
> > > > > > > > for an ordinal model like this? Most of
my online search point me
> > > > > > > > directly to some R function etc, but a
theoretical discussion on this
> > > > > > > > subject would be really helpful to
construct the same.
> 
> > > > > > > > Thanks and regards,
> 
> > > > > > > > On Mon, Apr 21, 2025 at 9:55?PM Gregg
Powell g.a.powell at protonmail.com wrote:
> 
> > > > > > > > > Christofer,
> 
> > > > > > > > > Given the constraints you
mentioned?bounded parameters, no intercept, and a sum constraint?you're
outside the capabilities of most off-the-shelf ordinal logistic regression
functions in R like vglm or polr.
> 
> > > > > > > > > The most flexible recommendation at
this point is to implement custom likelihood optimization using constrOptim() or
nloptr::nloptr() with a manually coded log-likelihood function for the
cumulative logit model.
> 
> > > > > > > > > Since you need:
> 
> > > > > > > > > Coefficient bounds (e.g., lb ? ? ?
ub),
> 
> > > > > > > > > No intercept,
> 
> > > > > > > > > And a constraint like sum(?) = C,
> 
> > > > > > > > > ?you'll need to code your own
objective function. Here's something of a high-level outline of the
approach:
> 
> > > > > > > > > A. Model Setup
> > > > > > > > > Let your design matrix X be n x p,
and let Y take ordered values 1, 2, ..., K.
> 
> > > > > > > > > B. Parameters
> > > > > > > > > Assume the thresholds (?_k) are
fixed (or removed entirely), and you?re estimating only the coefficient vector
?. Your constraints are:
> 
> > > > > > > > > Box constraints: lb ? ? ? ub
> 
> > > > > > > > > Equality constraint: sum(?) = C
> 
> > > > > > > > > C. Likelihood
> > > > > > > > > The probability of category k is
given by:
> 
> > > > > > > > > P(Y = k) = logistic(?_k - X?) -
logistic(?_{k-1} - X?)
> 
> > > > > > > > > Without intercepts, this becomes
more like:
> 
> > > > > > > > > P(Y ? k) = 1 / (1 + exp(-(c_k -
X?)))
> 
> > > > > > > > > ?where c_k are fixed thresholds.
> 
> > > > > > > > > Implementation using nloptr
> > > > > > > > > This example shows a rough sketch
using nloptr, which allows both equality and bound constraints:
> 
> > > > > > > > > > library(nloptr)
> 
> > > > > > > > > > # Custom negative
log-likelihood function
> > > > > > > > > > negLL <- function(beta, X,
y, K, cutpoints) {
> > > > > > > > > > eta <- X %*% beta
> > > > > > > > > > loglik <- 0
> > > > > > > > > > for (k in 1:K) {
> > > > > > > > > > pk <- plogis(cutpoints[k] -
eta) - plogis(cutpoints[k - 1] - eta)
> > > > > > > > > > loglik <- loglik +
sum(log(pk[y == k]))
> > > > > > > > > > }
> > > > > > > > > > return(-loglik)
> > > > > > > > > > }
> 
> > > > > > > > > > # Constraint: sum(beta) = C
> > > > > > > > > > sum_constraint <-
function(beta, C) {
> > > > > > > > > > return(sum(beta) - C)
> > > > > > > > > > }
> 
> > > > > > > > > > # Define objective and
constraint wrapper
> > > > > > > > > > objective <- function(beta)
negLL(beta, X, y, K, cutpoints)
> > > > > > > > > > eq_constraint <-
function(beta) sum_constraint(beta, C = 2) # example >C
> 
> > > > > > > > > > # Run optimization
> > > > > > > > > > res <- nloptr(
> > > > > > > > > > x0 = rep(0, ncol(X)),
> > > > > > > > > > eval_f = objective,
> > > > > > > > > > lb = lower_bounds,
> > > > > > > > > > ub = upper_bounds,
> > > > > > > > > > eval_g_eq = eq_constraint,
> > > > > > > > > > opts = list(algorithm =
"NLOPT_LD_SLSQP", xtol_rel = 1e-8)
> > > > > > > > > > )
> 
> > > > > > > > > The next step would be writing the
actual log-likelihood for your specific problem or verifying constraint
implementation.
> 
> > > > > > > > > Manually coding the log-likelihood
for an ordinal model is nontrivial... so a bit of a challenge :)
> 
> > > > > > > > > All the best,
> > > > > > > > > gregg powell
> > > > > > > > > Sierra Vista,
AZ______________________________________________
> 
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
https://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 603 bytes
Desc: OpenPGP digital signature
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20250504/908ae87c/attachment.sig>
Possibly Parallel Threads

Search for more possibly parallel threads
R help - May 2025 - Estimating regression with constraints in model coefficients - Follow-up on Constrained Ordinal Model — Optimized via COBYLA

[R] Estimating regression with constraints in model coefficients - Follow-up on Constrained Ordinal Model — Optimized via COBYLA

Possibly Parallel Threads