Optimization¶

Overview¶

The polykin.math.optimization module provides both univariate and multivariate solvers for smooth objective functions.

	Derivative-free	Derivative-based
Univariate (scalar)	`fmin_brent`	`fmin_secant`
Multivariate	`fmin_nelder_mead`	`fmin_qnewton`

The scalar solvers return an OptimumResult, while the multivariate solvers return a VectorOptimumResult with the estimated minimizer, objective value, convergence message, and iteration counters.

Univariate Function¶

In [1]:

Copied!

import matplotlib.pyplot as plt
import numpy as np

from polykin.math.optimization import fmin_brent, fmin_secant
import matplotlib.pyplot as plt
import numpy as np

from polykin.math.optimization import fmin_brent, fmin_secant

Lennard-Jones Potential¶

The Lennard-Jones potential is a standard one-dimensional test problem with a single attractive well. For $\sigma = \epsilon = 1$, the exact minimum is at $r^\star = 2^{1/6}$, which gives a useful reference when checking the numerical result.

In [2]:

Copied!





def lennard_jones(r):
    sigma = 1.0
    epsilon = 1.0
    return 4 * epsilon * ((sigma/r)**12 - (sigma/r)**6)
def lennard_jones(r):
    sigma = 1.0
    epsilon = 1.0
    return 4 * epsilon * ((sigma/r)**12 - (sigma/r)**6)

In [3]:

Copied!





fig, ax = plt.subplots()
fig.suptitle("Lennard-Jones potential")

r = np.linspace(0.95, 3.0, 400)
r_star = 2 ** (1 / 6)
ax.plot(r, lennard_jones(r), label="Lennard-Jones")
ax.axvline(r_star, color="tab:red", linestyle="--", label=r"$r^\star = 2^{1/6}$")
ax.set_xlabel("r")
ax.set_ylabel("f(r)")
ax.legend();
fig, ax = plt.subplots()
fig.suptitle("Lennard-Jones potential")

r = np.linspace(0.95, 3.0, 400)
r_star = 2 ** (1 / 6)
ax.plot(r, lennard_jones(r), label="Lennard-Jones")
ax.axvline(r_star, color="tab:red", linestyle="--", label=r"$r^\star = 2^{1/6}$")
ax.set_xlabel("r")
ax.set_ylabel("f(r)")
ax.legend();

No description has been provided for this image

When a bracket containing the minimum is available, Brent's method is the safest choice. It does not require derivatives, preserves the bracket, and usually converges rapidly on smooth functions.

In [4]:

Copied!

fmin_brent(lennard_jones, 1.0, 3.0)
fmin_brent(lennard_jones, 1.0, 3.0)

Out[4]:

 method: Brent
success: True
message: |Δx| ≤ tolx
 nfeval: 13
  niter: 13
      x: 1.12246221e+00
      f: -1.00000000e+00

The secant method targets the stationary point through numerical derivatives, so it can be very efficient when the starting guesses are already close to the minimizer. Here we use complex-step differentiation, which is accurate for functions that accept complex inputs.

In [5]:

Copied!

fmin_secant(lennard_jones, 1.0, 1.01, diff_scheme="complex")
fmin_secant(lennard_jones, 1.0, 1.01, diff_scheme="complex")

Out[5]:

 method: Secant
success: True
message: |f'(x)| ≤ tolg
 nfeval: 10
  niter: 8
      x: 1.12246205e+00
      f: -1.00000000e+00
     df: -3.87659504e-09

The trade-off is robustness. If the initial guesses are too far from the well, the derivative-based iteration can leave the attractive basin and fail to locate the minimum.

In [6]:

Copied!

fmin_secant(lennard_jones, 1.5, 1.6, diff_scheme="complex")
fmin_secant(lennard_jones, 1.5, 1.6, diff_scheme="complex")

Out[6]:

 method: Secant
success: False
message: Function value increased from -0.2242077243863605 at x[k-1]=1.6 to -0.10963081822561647 at x[k]=1.8125241675252626.
 nfeval: 3
  niter: 1
      x: 1.81252417e+00
      f: -1.09630818e-01
     df: 3.52378708e-01

The near-minimum secant run converges to the same optimum as Brent's method, but the second run shows why fmin_secant should be treated as a local solver.

Recommendation

use fmin_brent when you can bracket the minimum and want robust convergence;
use fmin_secant when the function is smooth and you already have two good initial guesses.

Multivariate Function¶

In [7]:

Copied!

from polykin.math.optimization import fmin_nelder_mead, fmin_qnewton
from polykin.math.optimization import fmin_nelder_mead, fmin_qnewton

Himmelblau's function¶

Himmelblau's function is a standard two-dimensional test problem with four equivalent local minima. That makes it useful for visualizing how a solver moves through the objective landscape and how strongly the result depends on the starting point.

We start both methods from the same initial guess, $x_0 = (0, 0)$, and record their iterates through a callback so the optimization paths can be plotted on top of the contour map.

In [8]:

Copied!





def himmelblau(z):
    """Himmelblau's function for 2D optimization."""
    x, y = z
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def himmelblau_grad(z):
    """Analytical gradient of Himmelblau's function (for BFGS)."""
    x, y = z
    dx = 4*x*(x**2 + y - 11) + 2*(x + y**2 - 7)
    dy = 2*(x**2 + y - 11) + 4*y*(x + y**2 - 7)
    return np.array([dx, dy])
def himmelblau(z):
    """Himmelblau's function for 2D optimization."""
    x, y = z
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def himmelblau_grad(z):
    """Analytical gradient of Himmelblau's function (for BFGS)."""
    x, y = z
    dx = 4*x*(x**2 + y - 11) + 2*(x + y**2 - 7)
    dy = 2*(x**2 + y - 11) + 4*y*(x + y**2 - 7)
    return np.array([dx, dy])

In [9]:

Copied!





fig, ax = plt.subplots(figsize=(8, 6))
fig.suptitle("Himmelblau's function")

C_exp = np.linspace(-5, 5, 400)
y = C_exp
X, Y = np.meshgrid(C_exp, y)
Z = himmelblau([X, Y])

contours = ax.contour(X, Y, Z, levels=50, cmap="viridis")
fig.colorbar(contours, ax=ax, label="f(x, y)")
ax.set_xlabel("x")
ax.set_ylabel("y");
fig, ax = plt.subplots(figsize=(8, 6))
fig.suptitle("Himmelblau's function")

C_exp = np.linspace(-5, 5, 400)
y = C_exp
X, Y = np.meshgrid(C_exp, y)
Z = himmelblau([X, Y])

contours = ax.contour(X, Y, Z, levels=50, cmap="viridis")
fig.colorbar(contours, ax=ax, label="f(x, y)")
ax.set_xlabel("x")
ax.set_ylabel("y");

fmin_nelder_mead does not use derivatives. This makes the method robust on rough problems, but typically slower than gradient-based alternatives on smooth functions.

The method also accepts a callback function, which we can use to record the solution trajectory.

In [10]:

Copied!





path_nelder_mead = []
def callback_nelder_mead(k, xk, fk):
    path_nelder_mead.append(xk.copy())
    return (False, False)
path_nelder_mead = []
def callback_nelder_mead(k, xk, fk):
    path_nelder_mead.append(xk.copy())
    return (False, False)

In [11]:

Copied!

x0 = [0.0, 0.0]
fmin_nelder_mead(himmelblau, x0, callback=callback_nelder_mead)
x0 = [0.0, 0.0]
fmin_nelder_mead(himmelblau, x0, callback=callback_nelder_mead)

Out[11]:

 method: Nelder-Mead
success: True
message: Function value spread is less than `tolf`.
 nfeval: 107
  niter: 56
      x: [2.99999018 2.00002511]
      f: 9.3584127648721e-09

In [12]:

Copied!





for i, simplex in enumerate(path_nelder_mead):
    simplex_closed = np.vstack([simplex, simplex[0]])
    ax.plot(
        simplex_closed[:, 0],
        simplex_closed[:, 1],
        color="tab:blue",
        alpha=0.5,
        linewidth=0.9,
        label ="Nelder-Mead iterates" if i == 0 else None,
    )
ax.legend(loc="upper left")
fig
for i, simplex in enumerate(path_nelder_mead):
    simplex_closed = np.vstack([simplex, simplex[0]])
    ax.plot(
        simplex_closed[:, 0],
        simplex_closed[:, 1],
        color="tab:blue",
        alpha=0.5,
        linewidth=0.9,
        label ="Nelder-Mead iterates" if i == 0 else None,
    )
ax.legend(loc="upper left")
fig

Out[12]:

fmin_qnewton is a local gradient-based method. If no gradient is supplied, the solver estimates it numerically by forward finite differences, which is convenient but usually more expensive and less accurate than providing an analytical expression.

In [13]:

Copied!





path_qnewton = []
def callback_qnewton(k, xk, fk, gk):
    path_qnewton.append(xk.copy())
    return (False, False)
path_qnewton = []
def callback_qnewton(k, xk, fk, gk):
    path_qnewton.append(xk.copy())
    return (False, False)

In [14]:

Copied!

fmin_qnewton(himmelblau, x0, global_method="dogleg", callback=callback_qnewton)
fmin_qnewton(himmelblau, x0, global_method="dogleg", callback=callback_qnewton)

Out[14]:

 method: Quasi-Newton (Global: Dogleg, BFGS: Factored)
success: True
message: ||∇f(x)*sclf/sclx||∞ ≤ tolg
 nfeval: 44
 ngeval: 0
 nheval: 0
  niter: 13
      x: [2.99999997 1.9999999 ]
      f: 2.6354236176940717e-13
      g: [-2.41161170e-06 -3.53914313e-06]
      L: [[8.57642809 0.        ]
 [2.2562301  5.28706529]]

When an analytical gradient is available, passing it explicitly usually reduces the total work. The optimizer follows the same local strategy, but avoids repeated finite-difference evaluations.

In [15]:

Copied!

fmin_qnewton(himmelblau, x0, grad=himmelblau_grad, global_method="dogleg")
fmin_qnewton(himmelblau, x0, grad=himmelblau_grad, global_method="dogleg")

Out[15]:

 method: Quasi-Newton (Global: Dogleg, BFGS: Factored)
success: True
message: ||∇f(x)*sclf/sclx||∞ ≤ tolg
 nfeval: 16
 ngeval: 14
 nheval: 0
  niter: 13
      x: [2.99999999 1.9999999 ]
      f: 1.8569681700439245e-13
      g: [-2.41465638e-06 -3.54515898e-06]
      L: [[8.57643868 0.        ]
 [2.25622574 5.28702547]]

In [16]:

Copied!





ax.plot(
    [point[0] for point in path_qnewton],
    [point[1] for point in path_qnewton],
    color="tab:red",
    marker="o",
    markersize=3,
    linewidth=1.2,
    label="quasi-Newton iterates",
)
ax.legend(loc="upper left")
fig
ax.plot(
    [point[0] for point in path_qnewton],
    [point[1] for point in path_qnewton],
    color="tab:red",
    marker="o",
    markersize=3,
    linewidth=1.2,
    label="quasi-Newton iterates",
)
ax.legend(loc="upper left")
fig

Out[16]:

Starting from the same initial point, the quasi-Newton method follows a much more direct path than the Nelder–Mead simplex algorithm. As a result, it typically requires far fewer iterations and function evaluations.

Recommendation

use fmin_qnewton for smooth objective functions when you can provide a reasonable initial guess;
provide grad whenever possible, as gradient information usually improves both efficiency and robustness;
use fmin_nelder_mead when the objective function is noisy and derivatives cannot be estimated reliably.

Reaction Network¶

Most optimization problems encountered in chemical engineering are more demanding than the toy examples above because the variables can be strongly coupled, poorly scaled, and only available through a numerical model. As a more representative example, we fit a simple reaction network to concentration data measured at different temperatures.

We consider the series network:

$$ A \xrightarrow{k_1} B \xrightarrow{k_2} C $$

and assume an Arrhenius-like temperature dependence written relative to a reference temperature, $ k = k_0 \exp [-E (1/T - 1/T_0)]$.

In [17]:

Copied!

from scipy.integrate import solve_ivp
from scipy.integrate import solve_ivp

The concentrations are obtained by integrating the transient mass balances for the three species with solve_ivp.

In [18]:

Copied!





def model(teval, C0, T, k0, Ea, T0=300):

    def dCdt(t, C, T, k0, Ea, T0):
        k = k0 * np.exp(-Ea * (1 / T - 1 / T0))
        r1 = k[0] * C[0]
        r2 = k[1] * C[1]
        return np.array([-r1, r1 - r2, r2])

    sol = solve_ivp(
        dCdt,
        (0.0, teval[-1]),
        C0,
        t_eval=teval,
        args=(T, k0, Ea, T0),
        method="LSODA",
        atol=1e-7,
        rtol=1e-7,
    )
    return sol.t, sol.y
def model(teval, C0, T, k0, Ea, T0=300):

    def dCdt(t, C, T, k0, Ea, T0):
        k = k0 * np.exp(-Ea * (1 / T - 1 / T0))
        r1 = k[0] * C[0]
        r2 = k[1] * C[1]
        return np.array([-r1, r1 - r2, r2])

    sol = solve_ivp(
        dCdt,
        (0.0, teval[-1]),
        C0,
        t_eval=teval,
        args=(T, k0, Ea, T0),
        method="LSODA",
        atol=1e-7,
        rtol=1e-7,
    )
    return sol.t, sol.y

The dataset below contains three experiments carried out at different temperatures from the same initial composition.

In [19]:

Copied!





data = {
    "exp1": {
        "T": 280,
        "C0": [1.0, 0.0, 0.0],
        "t": [900.0, 1800.0, 2700.0, 3600.0],
        "C": [
            [0.832, 0.640, 0.494, 0.375],
            [0.206, 0.430, 0.517, 0.713],
            [0.00104, 0.00429, 0.00821, 0.0143],
        ],
    },
    "exp2": {
        "T": 300,
        "C0": [1.0, 0.0, 0.0],
        "t": [900.0, 1800.0, 2700.0, 3600.0],
        "C": [
            [0.424, 0.163, 0.0706, 0.0257],
            [0.665, 0.744, 0.881, 0.597],
            [0.0277, 0.101, 0.177, 0.260],
        ],
    },
    "exp3": {
        "T": 320,
        "C0": [1.0, 0.0, 0.0],
        "t": [900.0, 1800.0, 2700.0, 3600.0],
        "C": [
            [0.0886, 0.00629, 0.000526, 3.12e-05],
            [0.490, 0.301, 0.176, 0.0669],
            [0.329, 0.639, 0.846, 1.000],
        ],
    },
}
data = {
    "exp1": {
        "T": 280,
        "C0": [1.0, 0.0, 0.0],
        "t": [900.0, 1800.0, 2700.0, 3600.0],
        "C": [
            [0.832, 0.640, 0.494, 0.375],
            [0.206, 0.430, 0.517, 0.713],
            [0.00104, 0.00429, 0.00821, 0.0143],
        ],
    },
    "exp2": {
        "T": 300,
        "C0": [1.0, 0.0, 0.0],
        "t": [900.0, 1800.0, 2700.0, 3600.0],
        "C": [
            [0.424, 0.163, 0.0706, 0.0257],
            [0.665, 0.744, 0.881, 0.597],
            [0.0277, 0.101, 0.177, 0.260],
        ],
    },
    "exp3": {
        "T": 320,
        "C0": [1.0, 0.0, 0.0],
        "t": [900.0, 1800.0, 2700.0, 3600.0],
        "C": [
            [0.0886, 0.00629, 0.000526, 3.12e-05],
            [0.490, 0.301, 0.176, 0.0669],
            [0.329, 0.639, 0.846, 1.000],
        ],
    },
}

We estimate the two pre-exponential factors and the two activation parameters by minimizing the sum of squared concentration residuals over all experiments.

In [20]:

Copied!





def sse(x):
    k0 = x[:2]
    Ea = x[2:]
    error = 0.0
    for experiment in data.values():
        teval = experiment["t"]
        C0 = experiment["C0"]
        T = experiment["T"]
        _, y_model = model(teval, C0, T=T, k0=k0, Ea=Ea)
        y_exp = np.array(experiment["C"])
        error += np.sum((y_model - y_exp)**2)
    return error
def sse(x):
    k0 = x[:2]
    Ea = x[2:]
    error = 0.0
    for experiment in data.values():
        teval = experiment["t"]
        C0 = experiment["C0"]
        T = experiment["T"]
        _, y_model = model(teval, C0, T=T, k0=k0, Ea=Ea)
        y_exp = np.array(experiment["C"])
        error += np.sum((y_model - y_exp)**2)
    return error

The initial guess x0 is used to infer characteristic variable magnitudes. It is therefore important to provide the correct order of magnitude for each parameter.

In [21]:

Copied!

x0 = np.array([5e-3, 5e-3, 1e3, 1e3])
x0 = np.array([5e-3, 5e-3, 1e3, 1e3])

We start with fmin_nelder_mead, which is robust against numerical noise although not particularly efficient.

In [22]:

Copied!

sol = fmin_nelder_mead(sse, x0)
sol
sol = fmin_nelder_mead(sse, x0)
sol

Out[22]:

 method: Nelder-Mead
success: True
message: Function value spread is less than `tolf`.
 nfeval: 570
  niter: 329
      x: [1.02365100e-03 1.13579227e-04 5.32813110e+03 9.47936666e+03]
      f: 0.07341888421258223

fmin_qnewton can be more efficient for this smooth problem, but only if the finite-difference gradient is estimated with adequate accuracy.

Because the objective is evaluated through an ODE solver, epsf, tolx, and tolg should be chosen consistently with the numerical accuracy of the model output. If the model is accurate to about $n$ digits, the finite-difference gradient is usually reliable to only about $n/2$ digits. As a practical starting point, set $\mathrm{epsf} \approx 10^{-n}$, $\mathrm{tolx} \approx 10^{-2n/3}$, and $\mathrm{tolg} \gtrsim 10^{-n/3}$. Some trial-and-error adjustment is often required.

In [23]:

Copied!

sol = fmin_qnewton(sse, x0, sclx=[1e3, 1e3, 1e-3, 1e-3], epsf=1e-6, tolx=1e-4, tolg=1e-2)
sol
sol = fmin_qnewton(sse, x0, sclx=[1e3, 1e3, 1e-3, 1e-3], epsf=1e-6, tolx=1e-4, tolg=1e-2)
sol

Out[23]:

 method: Quasi-Newton (Global: Dogleg, BFGS: Factored)
success: True
message: ||∇f(x)*sclf/sclx||∞ ≤ tolg
 nfeval: 279
 ngeval: 0
 nheval: 0
  niter: 46
      x: [1.02324505e-03 1.12680592e-04 5.34496071e+03 9.49832036e+03]
      f: 0.07345033490325709
      g: [-6.11813220e+00  3.07777014e+00  1.68829170e-06 -6.00114125e-07]
      L: [[-1.48004230e+03  0.00000000e+00  0.00000000e+00  0.00000000e+00]
 [-3.43225362e+02  1.09210658e+04  0.00000000e+00  0.00000000e+00]
 [ 2.34053992e-04  7.18201904e-05 -2.05196239e-04  0.00000000e+00]
 [-3.35781736e-05  2.02918950e-04 -1.49243078e-05  1.18333997e-04]]

Finally, we compare the fitted model against the experimental data.

In [24]:

Copied!





def plot_model_fit(data, k0, Ea):

    fig, ax = plt.subplots(len(data), 1, sharex=True, figsize=(7, 2 * len(data)))
    species = ["A", "B", "C"]

    for i, experiment in enumerate(data.values()):
        t_exp = experiment["t"]
        C_exp = experiment["C"]
        T = experiment["T"]
        C0 = experiment["C0"]

        ax[i].set_title(f"Experiment {i + 1} at T={T} K")
        teval = np.linspace(0.0, t_exp[-1], 100)
        t_model, C_model = model(teval, C0, T=T, k0=k0, Ea=Ea)
        for j in range(len(C0)):
            ax[i].scatter(t_exp, C_exp[j], label=species[j])
            ax[i].plot(t_model, C_model[j])

        ax[i].set_ylabel("Concentration")

    ax[0].legend(loc="best")
    ax[-1].set_xlabel("Time (s)")
    fig.tight_layout()
def plot_model_fit(data, k0, Ea):

    fig, ax = plt.subplots(len(data), 1, sharex=True, figsize=(7, 2 * len(data)))
    species = ["A", "B", "C"]

    for i, experiment in enumerate(data.values()):
        t_exp = experiment["t"]
        C_exp = experiment["C"]
        T = experiment["T"]
        C0 = experiment["C0"]

        ax[i].set_title(f"Experiment {i + 1} at T={T} K")
        teval = np.linspace(0.0, t_exp[-1], 100)
        t_model, C_model = model(teval, C0, T=T, k0=k0, Ea=Ea)
        for j in range(len(C0)):
            ax[i].scatter(t_exp, C_exp[j], label=species[j])
            ax[i].plot(t_model, C_model[j])

        ax[i].set_ylabel("Concentration")

    ax[0].legend(loc="best")
    ax[-1].set_xlabel("Time (s)")
    fig.tight_layout()

The fitted profiles capture the main trends across the three temperatures and reproduce the expected consumption of A, transient accumulation of B, and formation of C.

In [25]:

Copied!

plot_model_fit(data, k0=sol.x[:2], Ea=sol.x[2:])
plot_model_fit(data, k0=sol.x[:2], Ea=sol.x[2:])