2 — Adding Trust Regions

The problem

Newton's method gives a great step when the quadratic model is accurate, but says nothing about how far from $x_k$ that accuracy holds.

The idea

Introduce a trust-region radius $\Delta_k$. At each iteration, solve

\[\min_{p}\; m_k(p) = f_k + \nabla f_k^\top p + \tfrac12 p^\top H_k\, p \quad\text{subject to}\quad \|p\| \le \Delta_k\]

The constraint $\|p\| \le \Delta_k$ prevents the solver from jumping into regions where the model is garbage.

Step acceptance

After computing the step $p_k$, we check how well the model predicted the actual change:

\[\rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)} = \frac{\text{actual reduction}}{\text{predicted reduction}}\]

$\rho_k$	Meaning	Action
$< \mu$ (e.g. 0.25)	Model is bad	Reject step, shrink $\Delta$
$\in [\mu, \eta)$	Model is OK	Accept step, keep $\Delta$
$\ge \eta$ (e.g. 0.75) and step hit boundary	Model is great	Accept step, expand $\Delta$

This produces an adaptive step size without a line search.

        Δ shrinks              Δ grows
       ◁━━━━━━━━┃━━━━━━━━━━━━━┃━━━━━━━━━▷
   ρ = 0       μ = 0.25      η = 0.75      ρ = 1

\[\Delta_{k+1} = \begin{cases} \gamma_1 \,\Delta_k & \text{if } \rho_k < \mu \\[4pt] \Delta_k & \text{if } \mu \le \rho_k < \eta \\[4pt] \min(\gamma_2\,\Delta_k,\; \Delta_{\max}) & \text{if } \rho_k \ge \eta \text{ and step hit boundary} \end{cases}\]

with default values $\gamma_1 = 0.25$ (shrink) and $\gamma_2 = 2$ (expand).

How Retro implements this

The main loop in optimize does exactly the cycle above:

Build quadratic model using the current Hessian (or approximation).
Solve the trust-region subproblem in a subspace (see Chapter 4).
Evaluate $\rho_k$.
Accept or reject; update $\Delta_k$.

All parameters ($\mu$, $\eta$, $\gamma_1$, $\gamma_2$, initial $\Delta$, max $\Delta$) are configurable through RetroOptions:

opts = RetroOptions(
    mu    = 0.25,   # reject threshold
    eta   = 0.75,   # expand threshold
    gamma1 = 0.25,  # shrink factor
    gamma2 = 2.0,   # expand factor
    initial_tr_radius = 1.0,
    max_tr_radius     = 1000.0,
)
result = optimize(prob; options = opts)

Key insight

Trust regions decouple direction (from the model) and step length (from the radius). This makes the method globally convergent under mild assumptions.

Previous ← The Simplest Optimizer · Next → Handling Bounds

2 — Adding Trust Regions

The problem

The idea

Step acceptance

Trust region update

How Retro implements this