Optimization¶
Overview¶
The polykin.math.optimization module provides both univariate and multivariate solvers for smooth objective functions.
| Derivative-free | Derivative-based | |
|---|---|---|
| Univariate (scalar) | fmin_brent |
fmin_secant |
| Multivariate | fmin_nelder_mead |
fmin_qnewton |
The scalar solvers return an OptimumResult, while the multivariate solvers return a VectorOptimumResult with the estimated minimizer, objective value, convergence message, and iteration counters.
Univariate Function¶
import matplotlib.pyplot as plt
import numpy as np
from polykin.math.optimization import fmin_brent, fmin_secant
Lennard-Jones Potential¶
The Lennard-Jones potential is a standard one-dimensional test problem with a single attractive well. For $\sigma = \epsilon = 1$, the exact minimum is at $r^\star = 2^{1/6}$, which gives a useful reference when checking the numerical result.
def lennard_jones(r):
sigma = 1.0
epsilon = 1.0
return 4 * epsilon * ((sigma/r)**12 - (sigma/r)**6)
fig, ax = plt.subplots()
fig.suptitle("Lennard-Jones potential")
r = np.linspace(0.95, 3.0, 400)
r_star = 2 ** (1 / 6)
ax.plot(r, lennard_jones(r), label="Lennard-Jones")
ax.axvline(r_star, color="tab:red", linestyle="--", label=r"$r^\star = 2^{1/6}$")
ax.set_xlabel("r")
ax.set_ylabel("f(r)")
ax.legend();
When a bracket containing the minimum is available, Brent's method is the safest choice. It does not require derivatives, preserves the bracket, and usually converges rapidly on smooth functions.
fmin_brent(lennard_jones, 1.0, 3.0)
method: Brent
success: True
message: |Δx| ≤ tolx
nfeval: 13
niter: 13
x: 1.12246221e+00
f: -1.00000000e+00
The secant method targets the stationary point through numerical derivatives, so it can be very efficient when the starting guesses are already close to the minimizer. Here we use complex-step differentiation, which is accurate for functions that accept complex inputs.
fmin_secant(lennard_jones, 1.0, 1.01, diff_scheme="complex")
method: Secant
success: True
message: |f'(x)| ≤ tolg
nfeval: 10
niter: 8
x: 1.12246205e+00
f: -1.00000000e+00
df: -3.87659504e-09
The trade-off is robustness. If the initial guesses are too far from the well, the derivative-based iteration can leave the attractive basin and fail to locate the minimum.
fmin_secant(lennard_jones, 1.5, 1.6, diff_scheme="complex")
method: Secant
success: False
message: Function value increased from -0.2242077243863605 at x[k-1]=1.6 to -0.10963081822561647 at x[k]=1.8125241675252626.
nfeval: 3
niter: 1
x: 1.81252417e+00
f: -1.09630818e-01
df: 3.52378708e-01
The near-minimum secant run converges to the same optimum as Brent's method, but the second run shows why fmin_secant should be treated as a local solver.
Recommendation
- use
fmin_brentwhen you can bracket the minimum and want robust convergence; - use
fmin_secantwhen the function is smooth and you already have two good initial guesses.
Multivariate Function¶
from polykin.math.optimization import fmin_nelder_mead, fmin_qnewton
Himmelblau's function¶
Himmelblau's function is a standard two-dimensional test problem with four equivalent local minima. That makes it useful for visualizing how a solver moves through the objective landscape and how strongly the result depends on the starting point.
We start both methods from the same initial guess, $x_0 = (0, 0)$, and record their iterates through a callback so the optimization paths can be plotted on top of the contour map.
def himmelblau(z):
"""Himmelblau's function for 2D optimization."""
x, y = z
return (x**2 + y - 11)**2 + (x + y**2 - 7)**2
def himmelblau_grad(z):
"""Analytical gradient of Himmelblau's function (for BFGS)."""
x, y = z
dx = 4*x*(x**2 + y - 11) + 2*(x + y**2 - 7)
dy = 2*(x**2 + y - 11) + 4*y*(x + y**2 - 7)
return np.array([dx, dy])
fig, ax = plt.subplots(figsize=(8, 6))
fig.suptitle("Himmelblau's function")
C_exp = np.linspace(-5, 5, 400)
y = C_exp
X, Y = np.meshgrid(C_exp, y)
Z = himmelblau([X, Y])
contours = ax.contour(X, Y, Z, levels=50, cmap="viridis")
fig.colorbar(contours, ax=ax, label="f(x, y)")
ax.set_xlabel("x")
ax.set_ylabel("y");
fmin_nelder_mead does not use derivatives. This makes the method robust on rough problems, but typically slower than gradient-based alternatives on smooth functions.
The method also accepts a callback function, which we can use to record the solution trajectory.
path_nelder_mead = []
def callback_nelder_mead(k, xk, fk):
path_nelder_mead.append(xk.copy())
return (False, False)
x0 = [0.0, 0.0]
fmin_nelder_mead(himmelblau, x0, callback=callback_nelder_mead)
method: Nelder-Mead
success: True
message: Function value spread is less than `tolf`.
nfeval: 107
niter: 56
x: [2.99999018 2.00002511]
f: 9.3584127648721e-09
for i, simplex in enumerate(path_nelder_mead):
simplex_closed = np.vstack([simplex, simplex[0]])
ax.plot(
simplex_closed[:, 0],
simplex_closed[:, 1],
color="tab:blue",
alpha=0.5,
linewidth=0.9,
label ="Nelder-Mead iterates" if i == 0 else None,
)
ax.legend(loc="upper left")
fig
fmin_qnewton is a local gradient-based method. If no gradient is supplied, the solver estimates it numerically by forward finite differences, which is convenient but usually more expensive and less accurate than providing an analytical expression.
path_qnewton = []
def callback_qnewton(k, xk, fk, gk):
path_qnewton.append(xk.copy())
return (False, False)
fmin_qnewton(himmelblau, x0, global_method="dogleg", callback=callback_qnewton)
method: Quasi-Newton (Global: Dogleg, BFGS: Factored)
success: True
message: ||∇f(x)*sclf/sclx||∞ ≤ tolg
nfeval: 44
ngeval: 0
nheval: 0
niter: 13
x: [2.99999997 1.9999999 ]
f: 2.6354236176940717e-13
g: [-2.41161170e-06 -3.53914313e-06]
L: [[8.57642809 0. ]
[2.2562301 5.28706529]]
When an analytical gradient is available, passing it explicitly usually reduces the total work. The optimizer follows the same local strategy, but avoids repeated finite-difference evaluations.
fmin_qnewton(himmelblau, x0, grad=himmelblau_grad, global_method="dogleg")
method: Quasi-Newton (Global: Dogleg, BFGS: Factored)
success: True
message: ||∇f(x)*sclf/sclx||∞ ≤ tolg
nfeval: 16
ngeval: 14
nheval: 0
niter: 13
x: [2.99999999 1.9999999 ]
f: 1.8569681700439245e-13
g: [-2.41465638e-06 -3.54515898e-06]
L: [[8.57643868 0. ]
[2.25622574 5.28702547]]
ax.plot(
[point[0] for point in path_qnewton],
[point[1] for point in path_qnewton],
color="tab:red",
marker="o",
markersize=3,
linewidth=1.2,
label="quasi-Newton iterates",
)
ax.legend(loc="upper left")
fig
Starting from the same initial point, the quasi-Newton method follows a much more direct path than the Nelder–Mead simplex algorithm. As a result, it typically requires far fewer iterations and function evaluations.
Recommendation
- use
fmin_qnewtonfor smooth objective functions when you can provide a reasonable initial guess; - provide
gradwhenever possible, as gradient information usually improves both efficiency and robustness; - use
fmin_nelder_meadwhen the objective function is noisy and derivatives cannot be estimated reliably.
Reaction Network¶
Most optimization problems encountered in chemical engineering are more demanding than the toy examples above because the variables can be strongly coupled, poorly scaled, and only available through a numerical model. As a more representative example, we fit a simple reaction network to concentration data measured at different temperatures.
We consider the series network:
$$ A \xrightarrow{k_1} B \xrightarrow{k_2} C $$
and assume an Arrhenius-like temperature dependence written relative to a reference temperature, $ k = k_0 \exp [-E (1/T - 1/T_0)]$.
from scipy.integrate import solve_ivp
The concentrations are obtained by integrating the transient mass balances for the three species with solve_ivp.
def model(teval, C0, T, k0, Ea, T0=300):
def dCdt(t, C, T, k0, Ea, T0):
k = k0 * np.exp(-Ea * (1 / T - 1 / T0))
r1 = k[0] * C[0]
r2 = k[1] * C[1]
return np.array([-r1, r1 - r2, r2])
sol = solve_ivp(
dCdt,
(0.0, teval[-1]),
C0,
t_eval=teval,
args=(T, k0, Ea, T0),
method="LSODA",
atol=1e-7,
rtol=1e-7,
)
return sol.t, sol.y
The dataset below contains three experiments carried out at different temperatures from the same initial composition.
data = {
"exp1": {
"T": 280,
"C0": [1.0, 0.0, 0.0],
"t": [900.0, 1800.0, 2700.0, 3600.0],
"C": [
[0.832, 0.640, 0.494, 0.375],
[0.206, 0.430, 0.517, 0.713],
[0.00104, 0.00429, 0.00821, 0.0143],
],
},
"exp2": {
"T": 300,
"C0": [1.0, 0.0, 0.0],
"t": [900.0, 1800.0, 2700.0, 3600.0],
"C": [
[0.424, 0.163, 0.0706, 0.0257],
[0.665, 0.744, 0.881, 0.597],
[0.0277, 0.101, 0.177, 0.260],
],
},
"exp3": {
"T": 320,
"C0": [1.0, 0.0, 0.0],
"t": [900.0, 1800.0, 2700.0, 3600.0],
"C": [
[0.0886, 0.00629, 0.000526, 3.12e-05],
[0.490, 0.301, 0.176, 0.0669],
[0.329, 0.639, 0.846, 1.000],
],
},
}
We estimate the two pre-exponential factors and the two activation parameters by minimizing the sum of squared concentration residuals over all experiments.
def sse(x):
k0 = x[:2]
Ea = x[2:]
error = 0.0
for experiment in data.values():
teval = experiment["t"]
C0 = experiment["C0"]
T = experiment["T"]
_, y_model = model(teval, C0, T=T, k0=k0, Ea=Ea)
y_exp = np.array(experiment["C"])
error += np.sum((y_model - y_exp)**2)
return error
The initial guess x0 is used to infer characteristic variable magnitudes. It is therefore important to provide the correct order of magnitude for each parameter.
x0 = np.array([5e-3, 5e-3, 1e3, 1e3])
We start with fmin_nelder_mead, which is robust against numerical noise although not particularly efficient.
sol = fmin_nelder_mead(sse, x0)
sol
method: Nelder-Mead
success: True
message: Function value spread is less than `tolf`.
nfeval: 570
niter: 329
x: [1.02365100e-03 1.13579227e-04 5.32813110e+03 9.47936666e+03]
f: 0.07341888421258223
fmin_qnewton can be more efficient for this smooth problem, but only if the finite-difference gradient is estimated with adequate accuracy.
Because the objective is evaluated through an ODE solver, epsf, tolx, and tolg should be chosen consistently with the numerical accuracy of the model output. If the model is accurate to about $n$ digits, the finite-difference gradient is usually reliable to only about $n/2$ digits. As a practical starting point, set $\mathrm{epsf} \approx 10^{-n}$, $\mathrm{tolx} \approx 10^{-2n/3}$, and $\mathrm{tolg} \gtrsim 10^{-n/3}$. Some trial-and-error adjustment is often required.
sol = fmin_qnewton(sse, x0, sclx=[1e3, 1e3, 1e-3, 1e-3], epsf=1e-6, tolx=1e-4, tolg=1e-2)
sol
method: Quasi-Newton (Global: Dogleg, BFGS: Factored)
success: True
message: ||∇f(x)*sclf/sclx||∞ ≤ tolg
nfeval: 279
ngeval: 0
nheval: 0
niter: 46
x: [1.02324505e-03 1.12680592e-04 5.34496071e+03 9.49832036e+03]
f: 0.07345033490325709
g: [-6.11813220e+00 3.07777014e+00 1.68829170e-06 -6.00114125e-07]
L: [[-1.48004230e+03 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[-3.43225362e+02 1.09210658e+04 0.00000000e+00 0.00000000e+00]
[ 2.34053992e-04 7.18201904e-05 -2.05196239e-04 0.00000000e+00]
[-3.35781736e-05 2.02918950e-04 -1.49243078e-05 1.18333997e-04]]
Finally, we compare the fitted model against the experimental data.
def plot_model_fit(data, k0, Ea):
fig, ax = plt.subplots(len(data), 1, sharex=True, figsize=(7, 2 * len(data)))
species = ["A", "B", "C"]
for i, experiment in enumerate(data.values()):
t_exp = experiment["t"]
C_exp = experiment["C"]
T = experiment["T"]
C0 = experiment["C0"]
ax[i].set_title(f"Experiment {i + 1} at T={T} K")
teval = np.linspace(0.0, t_exp[-1], 100)
t_model, C_model = model(teval, C0, T=T, k0=k0, Ea=Ea)
for j in range(len(C0)):
ax[i].scatter(t_exp, C_exp[j], label=species[j])
ax[i].plot(t_model, C_model[j])
ax[i].set_ylabel("Concentration")
ax[0].legend(loc="best")
ax[-1].set_xlabel("Time (s)")
fig.tight_layout()
The fitted profiles capture the main trends across the three temperatures and reproduce the expected consumption of A, transient accumulation of B, and formation of C.
plot_model_fit(data, k0=sol.x[:2], Ea=sol.x[2:])