Internals — Cache & Allocation Design
Retro's inner loop is designed to be allocation-free after the initial setup. This page documents how.
The RetroCache struct
All temporary vectors and matrices are pre-allocated in a single RetroCache{T} object, created once at the start of optimize:
    cache = RetroCache{Float64}(n)
Fields
| Field | Size | Purpose |
|---|---|---|
| x_trial | $n$ | Candidate next iterate |
| g | $n$ | Current gradient |
| p | $n$ | Current step |
| g_prev | $n$ | Previous gradient (quasi-Newton) |
| x_prev | $n$ | Previous iterate (quasi-Newton) |
| r | $n$ | CG residual |
| d | $n$ | CG search direction |
| Hd | $n$ | Hessian–vector product workspace |
| s | $n$ | Step difference $x_k - x_{k-1}$ |
| y | $n$ | Gradient difference $g_k - g_{k-1}$ |
| tmp | $n$ | General scratch vector |
| v1 | $n$ | First subspace basis vector |
| v2 | $n$ | Second subspace basis vector |
| scaled_g | $n$ | Gradient after Coleman–Li scaling |
| scaling | $n$ | Diagonal scaling vector |
| B | $n \times n$ | BFGS approximation matrix |
| Bs | $n$ | Workspace for $B \cdot s$ |
| f_calls | scalar | Counter |
| g_calls | scalar | Counter |
| h_calls | scalar | Counter |
Total memory: $16n + n^2$ floats (the 16 length-$n$ vectors in the table plus the $n \times n$ matrix $B$). The $B$ matrix, used for BFGS, dominates once $n > 16$.
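The pre-allocation pattern can be sketched with a reduced stand-in. `MiniCache` and `nfloats` are hypothetical names invented for this example, and the struct carries only four of the fields listed above; the actual `RetroCache` definition is not reproduced here.

```julia
# Hypothetical, reduced stand-in for RetroCache: three vectors and one matrix.
struct MiniCache{T<:AbstractFloat}
    g::Vector{T}    # current gradient
    p::Vector{T}    # current step
    tmp::Vector{T}  # general scratch vector
    B::Matrix{T}    # BFGS approximation matrix
end

# Allocate every buffer once, up front, for a problem of dimension n.
function MiniCache{T}(n::Integer) where {T<:AbstractFloat}
    return MiniCache{T}(zeros(T, n), zeros(T, n), zeros(T, n), zeros(T, n, n))
end

# Number of floats held by the cache: 3n for the vectors plus n^2 for B.
nfloats(c::MiniCache) = length(c.g) + length(c.p) + length(c.tmp) + length(c.B)

cache = MiniCache{Float64}(100)
println(nfloats(cache))                    # 3*100 + 100^2 = 10300
println(nfloats(cache) * sizeof(Float64))  # 82400 bytes of payload
```

Even in this small sketch the matrix accounts for 10000 of the 10300 floats, which is why the $n^2$ term is the one to watch as $n$ grows.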
Why this matters
In a typical trust-region iteration, the hot path is:
    update_hessian! → build_subspace! → solve_subspace_tr! → apply_reflective_bounds!
All of these operate on cache fields via in-place operations (mul!, @., copy!, dot). No Vector or Matrix is allocated on the heap during this path.
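As an illustration of the in-place style described above, the following sketch chains `mul!`, `dot`, and `@.` on pre-allocated buffers. The function `inplace_step!` and its arithmetic are made up for this example and do not correspond to any of the Retro functions named above.

```julia
using LinearAlgebra

# One in-place update on pre-allocated buffers (all names are illustrative).
function inplace_step!(x_trial, Hd, tmp, B, x, d)
    mul!(Hd, B, d)            # Hd = B * d, written into Hd
    curvature = dot(d, Hd)    # scalar d' * B * d, no array created
    @. tmp = d / curvature    # fused broadcast written into tmp
    @. x_trial = x - tmp      # candidate iterate written into x_trial
    return curvature
end

n = 4
B = Matrix{Float64}(2I, n, n)   # stand-in Hessian approximation: 2 * identity
x = ones(n)
d = fill(0.5, n)
x_trial = zeros(n); Hd = zeros(n); tmp = zeros(n)

curvature = inplace_step!(x_trial, Hd, tmp, B, x, d)
# curvature == 2.0 and x_trial == [0.75, 0.75, 0.75, 0.75]
```

Every result lands in a buffer that already exists, so repeated calls reuse the same memory instead of creating new arrays.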
The one remaining allocation
compute_trust_region_step! currently does
    original_g = copy(cache.g)
to save and restore the gradient around the scaled-gradient temporary. This allocates one $n$-vector per iteration. A future fix is to add a dedicated g_saved field to RetroCache.
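A minimal sketch of how the proposed fix might look, assuming the gradient is temporarily overwritten by its scaled version and then restored. `GradCache` and `with_scaled_gradient!` are hypothetical names, and the body is a stand-in for whatever `compute_trust_region_step!` actually does between the save and the restore.

```julia
# Hypothetical cache with a dedicated buffer for the saved gradient.
struct GradCache{T}
    g::Vector{T}        # current gradient
    g_saved::Vector{T}  # proposed field: holds a copy of g during scaling
    scaling::Vector{T}  # diagonal scaling vector
end

function with_scaled_gradient!(cache::GradCache)
    copyto!(cache.g_saved, cache.g)        # save in place (replaces copy(cache.g))
    @. cache.g = cache.scaling * cache.g   # temporarily overwrite g with the scaled gradient
    scaled_norm2 = sum(abs2, cache.g)      # stand-in for the work done on the scaled gradient
    copyto!(cache.g, cache.g_saved)        # restore the original gradient
    return scaled_norm2
end

cache = GradCache([1.0, 2.0], zeros(2), [0.5, 0.5])
result = with_scaled_gradient!(cache)
# result == 1.25 (0.5^2 + 1.0^2) and cache.g is [1.0, 2.0] again
```

The only change from the allocating version is that `copy` becomes `copyto!` into a buffer owned by the cache.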
StaticArrays in the 2-D subspace
The TwoDimSubspace state stores the projected gradient and Hessian as
    g2d::SVector{2,T}
    H2d::SMatrix{2,2,T,4}
    p2d::SVector{2,T}
These are immutable, fixed-size values that the compiler can keep on the stack or in registers, so the inner eigenvalue solve performs no heap allocations.
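To show what working with these types looks like, here is a small sketch that computes an unconstrained Newton step in two dimensions. `newton_step_2d` is a hypothetical helper, not Retro's subspace solver, and it ignores the trust-region radius; it assumes the StaticArrays package is installed.

```julia
using StaticArrays

# Unconstrained Newton step in the 2-D subspace: p = -(H \ g).
# Illustrative only; a trust-region solve would also enforce the radius constraint.
newton_step_2d(H2d::SMatrix{2,2,T,4}, g2d::SVector{2,T}) where {T} = -(H2d \ g2d)

H2d = SMatrix{2,2,Float64,4}(2.0, 0.0, 0.0, 4.0)  # column-major entries: diag(2, 4)
g2d = SVector{2,Float64}(1.0, -2.0)
p2d = newton_step_2d(H2d, g2d)                    # SVector(-0.5, 0.5)

# isbits types have a fixed, pointer-free layout, which is what lets the
# compiler avoid heap allocation for them.
println(isbitstype(typeof(H2d)))                  # true
```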
DI preparation objects
ADObjectiveFunction stores two DifferentiationInterface prep objects:
- prep_g: from prepare_gradient(f, backend, x0)
- prep_h: from prepare_hessian(f, backend, x0)
These are computed once at construction and reused on every call to gradient!, hessian!, value_and_gradient!, etc. DI uses the prep to cache tapes, chunk sizes, and buffer layouts so that subsequent evaluations avoid internal allocations.
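The prepare-once, reuse-many pattern looks roughly like this when DifferentiationInterface is used directly. The sketch assumes a recent DI release (0.6 or later, where the prep object is passed before the backend) and uses ForwardDiff as the backend purely for illustration; the objective `f` is a toy function, and none of this is taken from the `ADObjectiveFunction` source.

```julia
using DifferentiationInterface
import ForwardDiff                      # backend package; an assumption of this sketch

f(x) = sum(abs2, x) / 2                 # toy objective: gradient is x, Hessian is I
backend = AutoForwardDiff()
x0 = [1.0, 2.0, 3.0]

# Prepare once, using a representative input of the right type and size.
prep_g = prepare_gradient(f, backend, x0)
prep_h = prepare_hessian(f, backend, x0)

# Reuse the preparation objects and the output buffers on every evaluation.
g = similar(x0)
H = similar(x0, 3, 3)
gradient!(f, g, prep_g, backend, x0)    # fills g with [1.0, 2.0, 3.0]
hessian!(f, H, prep_h, backend, x0)     # fills H with the 3x3 identity
```

A preparation object is only valid for inputs of the same type and size as the `x0` it was built from, which is why it is created at construction time for a fixed problem dimension.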
Counters
The f_calls, g_calls, and h_calls fields track the number of evaluations. Combined calls like value_and_gradient! increment both f_calls and g_calls. These counters are reported in the final RetroResult.
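The counting rule for combined calls can be sketched as follows. `CallCounters` and `counted_value_and_gradient!` are hypothetical names used only here; in Retro the counters are fields of the cache and the evaluation goes through the objective wrapper.

```julia
# Hypothetical counter holder used only for this illustration.
mutable struct CallCounters
    f_calls::Int
    g_calls::Int
    h_calls::Int
end
CallCounters() = CallCounters(0, 0, 0)

# A combined evaluation counts as one objective call and one gradient call.
function counted_value_and_gradient!(c::CallCounters, f, grad!, g, x)
    c.f_calls += 1
    c.g_calls += 1
    grad!(g, x)      # write the gradient into the pre-allocated buffer g
    return f(x)
end

c = CallCounters()
g = zeros(2)
fx = counted_value_and_gradient!(c, x -> sum(abs2, x), (g, x) -> (g .= 2 .* x), g, [1.0, 2.0])
# fx == 5.0, g == [2.0, 4.0], c.f_calls == 1, c.g_calls == 1, c.h_calls == 0
```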
Guidelines for contributors
- Never allocate in the hot loop. Use cache fields or local SVector/SMatrix.
- Add new scratch vectors to RetroCache rather than allocating temporaries.
- Use @. broadcasting for element-wise operations; it fuses and avoids temporaries.
- Prefer mul!(C, A, B) over C = A * B for matrix–vector products.
- Test with @allocated in unit tests to catch regressions.
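A regression test along the lines of the last guideline might look like the sketch below. `axpy_inplace!` and `allocated_bytes` are hypothetical helpers standing in for a real hot-path function. Measuring inside a function, after a warm-up call, keeps compilation and global-scope effects out of the count.

```julia
using Test

# In-place update y <- a*x + y using a fused broadcast.
function axpy_inplace!(y, a, x)
    @. y = a * x + y
    return y
end

# Measure inside a function so global-scope effects do not distort the count.
function allocated_bytes(y, a, x)
    axpy_inplace!(y, a, x)                    # warm-up call triggers compilation
    return @allocated axpy_inplace!(y, a, x)  # bytes allocated by the second call
end

@testset "hot path does not allocate" begin
    x = ones(8)
    y = zeros(8)
    @test allocated_bytes(y, 2.0, x) == 0
    @test y == fill(4.0, 8)                   # two updates of 2.0 each
end
```

Writing `y = a .* x .+ y` instead would bind `y` to a freshly allocated array on every call, which is exactly the kind of regression this test is meant to catch.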