
Routine to identify the generalised Cauchy point

Open johannahaffner opened this issue 3 months ago • 5 comments

This adds a small module with a routine that identifies the generalised Cauchy point: the first local minimiser along the piecewise-linear path obtained by projecting the steepest-descent direction onto the box defined by the bound constraints.

This is an essential ingredient in BFGS-B and L-BFGS-B, and is implemented as described in the original publication: https://epubs.siam.org/doi/10.1137/0916069. As with many constrained methods, there are lots of edge cases to think of and get right here, so I added a bunch of tests to cycle through the ones I could think of.
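For intuition, the breakpoint search described in that paper can be sketched as follows. This is a minimal NumPy illustration assuming an identity Hessian model for brevity (the real routine works with the solver's quadratic model), and `cauchy_point` is a hypothetical standalone function, not the code in this PR:

```python
# Sketch of the generalised Cauchy point search for the model
# m(x) = g.(x - x0) + 0.5 ||x - x0||^2, i.e. identity Hessian.
import numpy as np

def cauchy_point(x0, g, lower, upper):
    """First local minimiser of the quadratic model along the projected
    steepest-descent path t -> clip(x0 - t * g, lower, upper)."""
    # Breakpoint t_i at which coordinate i hits its bound.
    with np.errstate(divide="ignore", invalid="ignore"):
        t_break = np.where(
            g > 0, (x0 - lower) / g,
            np.where(g < 0, (x0 - upper) / g, np.inf),
        )
    x = np.asarray(x0, dtype=float).copy()
    d = -np.asarray(g, dtype=float)        # direction on the first segment
    d = np.where(t_break == 0.0, 0.0, d)   # freeze coords already at a bound
    t_old = 0.0
    for t in np.unique(t_break[np.isfinite(t_break)]):
        if t <= t_old:
            continue
        # On this segment: m(dt) = f1 * dt + 0.5 * f2 * dt^2.
        f1 = g @ d + (x - x0) @ d
        f2 = d @ d
        if f2 > 0:
            dt = -f1 / f2
            if 0 <= dt < t - t_old:        # minimiser interior to segment
                return x + dt * d
        if f1 >= 0:                        # model already increasing here
            return x
        # Advance to the next breakpoint; fix the coordinates that hit it.
        x = x + (t - t_old) * d
        d = np.where(t_break == t, 0.0, d)
        t_old = t
    # Minimise on the final (possibly unbounded) segment.
    f1 = g @ d + (x - x0) @ d
    f2 = d @ d
    if f2 > 0 and f1 < 0:
        return x + (-f1 / f2) * d
    return x
```

With no active bounds this reduces to the unconstrained Cauchy step `x0 - g`; when a coordinate hits its bound partway, that coordinate is frozen and the search continues along the remaining free directions.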

I think that the cauchy_point module can be merged into main: I would not add it to the public API, and the change is purely additive.

johannahaffner avatar Nov 01 '25 23:11 johannahaffner

Tagging @cjchristopher - this is the routine to identify the generalised Cauchy point, required for BFGS-B and L-BFGS-B. I'm pretty confident that it does the right thing, but another set of eyes would be great!

You're right about the strong Wolfe condition in the line search proposed in the Byrd et al. paper. I haven't checked how they modify the line search to ensure that it stays in the feasible region, though. Right now just trying to divide and conquer the pile of code I already wrote to make constrained optimisation happen in Optimistix :)

johannahaffner avatar Nov 02 '25 00:11 johannahaffner

I haven't had a chance to look at this yet - but I've had eyes on the jaxopt L-BFGS-B and Zoom again and I think I've identified the problem(s) (with a verifying assist from GPT-5 Codex):

  1. Here, in the linesearch branch of the L-BFGS-B update routine: https://github.com/google/jaxopt/blob/cf28b4563f5ad9354b76433622dbb9ee32af5f09/jaxopt/_src/lbfgsb.py#L469-L493 , there is no clipping of the parameters back into bounds like there is in the else branch. This would be fine if the linesearch respected the bounds, but the Zoom linesearch is not implemented to respect the bounds provided to L-BFGS-B, and I think this is where the problems occur, because:
  2. I am not sure why the jaxopt team opted to use that implementation of Zoom for a bounded optimisation routine. Looking again at the linesearch suggested in the original L-BFGS-B paper (Moré & Thuente), that search does indeed take the bounds as parameters (https://github.com/antoinecollet5/lbfgsb/blob/master/lbfgsb/linesearch.py, note the ub and lb parameters) and uses them to ensure that feasibility is respected.

So as you suggest here (https://github.com/patrick-kidger/optimistix/pull/143#issuecomment-3453515489), the care that should be taken, at least for L-BFGS-B in particular, is to have a linesearch (even a zoom-y one!) that does respect the bounds. Edit: Relevantly, I can't see anything the Zoom linesearch we have (and consequently the ones in optax and jaxopt) does that the Moré & Thuente algorithm doesn't do, except that Moré & Thuente additionally respects bounds when provided.

cjchristopher avatar Nov 07 '25 08:11 cjchristopher

I haven't had a chance to look at this yet -

No pressure! Whenever you have the time.

but I've had eyes on the jaxopt L-BFGS-B and Zoom again and I think I've identified the problem(s)

Thank you! This is very helpful.

A missing clip in one branch is exactly the kind of bug that easily creeps into an implementation like this. Since you write that our Zoom line search does everything the Moré & Thuente search does, except ensure that the bounds are respected, it sounds like we have fairly easy modifications to make on our end.

I think a principle I would implement across our code base is that constrained descents only ever return a feasible step to begin with, and that searches that may return step sizes greater than 1.0 optionally support truncation to feasible step sizes when bounds are present. The truncated step size can be obtained with tree_min and feasible_step_length, both of which are already part of our misc module.
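The truncation rule can be sketched in plain NumPy. Note that `max_feasible_step` and `truncate_step` below are hypothetical standalone names, standing in for the `feasible_step_length`/`tree_min` helpers mentioned above, whose actual (pytree) signatures live in the misc module:

```python
import numpy as np

def max_feasible_step(y, d, lower, upper):
    """Largest alpha >= 0 such that y + alpha * d stays in [lower, upper]."""
    with np.errstate(divide="ignore", invalid="ignore"):
        # Per-coordinate distance to the bound that d points towards.
        alpha = np.where(
            d > 0, (upper - y) / d,
            np.where(d < 0, (lower - y) / d, np.inf),
        )
    return float(np.min(alpha))

def truncate_step(step_size, y, d, lower, upper):
    # Keep the direction d unchanged; only shorten the step length.
    return min(step_size, max_feasible_step(y, d, lower, upper))
```

A search that proposes `step_size > max_feasible_step(...)` would then simply have its step truncated to the boundary, rather than the resulting point being clipped coordinate-wise.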


The other type of easily introduced bug concerns the Cauchy point routine itself, which has a number of edge cases that need to be handled correctly, depending on where we are when invoking the routine (at the boundary or in the interior) and on where the gradient points, or whether it is zero. For that reason, I made this a separate PR and added a bunch of test cases.

Bugs of this nature might also be present in their implementation, but there is no way of knowing without testing their routine separately.

johannahaffner avatar Nov 07 '25 13:11 johannahaffner

Ah, to clarify: that missing clip is in the L-BFGS-B implementation, after zoom is called and has returned. I assume an eventual L-BFGS-B implementation for optimistix will correctly enforce bounds both in L-BFGS-B itself and in whatever linesearch it happens to use :)

At some point I'll have a closer look at Moré & Thuente and see if the Zoom here can simply accept bounds optionally and enforce them without drastically changing the rest of the routine.

cjchristopher avatar Nov 07 '25 15:11 cjchristopher

Ah, to clarify: that missing clip is in the L-BFGS-B implementation, after zoom is called and has returned. I assume an eventual L-BFGS-B implementation for optimistix will correctly enforce bounds both in L-BFGS-B itself, and whatever linesearch it happens to use :)

Ah, I see! In that case I would prefer to truncate the step length: clipping can alter the direction quite substantially, depending on where one is with respect to the bounds. I don't see that playing well with solvers that iteratively build a Hessian approximation.
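To illustrate the difference with made-up numbers (nothing here is optimistix API, just a NumPy sketch):

```python
import numpy as np

x = np.array([0.9, 0.0])
d = np.array([1.0, 1.0])           # descent direction from the solver
lower = np.array([-10.0, -10.0])
upper = np.array([1.0, 10.0])
alpha = 1.0                        # step proposed by an unbounded linesearch

# Clipping: the realised update is no longer parallel to d.
clipped = np.clip(x + alpha * d, lower, upper)
# clipped - x is approximately [0.1, 1.0]: the direction is badly skewed.

# Truncating: shrink alpha so the step stays feasible along d.
alpha_max = float(np.min(np.where(d > 0, (upper - x) / d, np.inf)))
truncated = x + min(alpha, alpha_max) * d
# truncated - x is approximately [0.1, 0.1]: still parallel to d.
```

A quasi-Newton update built from the clipped difference would see a step that was never actually proposed by the model, which is exactly the interaction with the Hessian approximation described above.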

At some point I'll have a closer look at Moré & Thuente and see if the Zoom here can simply accept bounds optionally and enforce them without drastically changing the rest of the routine.

This may mean that for some steps the Wolfe conditions do not hold, especially if we are close to the boundary. But I guess in that case that is simply a price to be paid.

With respect to the concrete implementation: in my current development branch for constrained solves, I write the bounds into the FunctionInfo object. This is one way to do it.

johannahaffner avatar Nov 07 '25 17:11 johannahaffner

I've added explanations to the test cases and added an extra one. This does the right thing, so I will merge it.

johannahaffner avatar Nov 29 '25 16:11 johannahaffner