Optimization
nonlinear programming
unconstrained minimization
first-order methods
conjugate gradient
- mainly used for solving large, sparse linear systems, viewed from the iterative-methods point of view
second-order methods
Newton's method
linear system
the conjugate gradient method is a Krylov subspace method; by the minimization property, we have \[ ||x^k - x^\star||_A = \min_{p \in \mathcal{P}_k,\, p(0)=1} ||p(A)(x^0 - x^\star)||_A \]
use \[ ||x^k - x^\star||_A \le ||x^0 - x^\star||_A \min_{p \in \mathcal{P}_k,\, p(0)=1} \max_{z \in \sigma(A)} |p(z)| \]
this bound is small after only a few iterations if the spectrum of \( A \) is clustered, since a low-degree polynomial with \( p(0) = 1 \) can be made small on a handful of clusters
- using only the extreme eigenvalues of \( A \), i.e. the condition number \( \kappa = \kappa(A) \), the standard bound is
\[ ||x^k - x^\star||_A \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k ||x^0 - x^\star||_A \]
one can get this result by using Chebyshev polynomials; for details see here
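A minimal numerical sketch of this bound: plain CG on a random symmetric positive definite system, comparing the A-norm error with \( 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k ||x^0 - x^\star||_A \). The test matrix, its spectrum, and the iteration count are illustrative choices, not from these notes.

```python
import numpy as np

def conjugate_gradient(A, b, x0, n_iter):
    """Plain CG for SPD A; returns the iterates x_0, x_1, ..., x_{n_iter}."""
    x = x0.copy()
    r = b - A @ x            # residual r_k = b - A x_k
    p = r.copy()             # initial search direction
    iterates = [x.copy()]
    for _ in range(n_iter):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)        # exact line search along p
        x = x + alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)  # standard CG update for the next direction
        p = r_new + beta * p
        r = r_new
        iterates.append(x.copy())
    return iterates

rng = np.random.default_rng(0)
n = 200
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]   # random orthogonal matrix
eigs = np.linspace(1.0, 100.0, n)                  # spectrum in [1, 100], kappa = 100
A = Q @ np.diag(eigs) @ Q.T
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)

a_norm = lambda v: np.sqrt(v @ A @ v)
kappa = eigs[-1] / eigs[0]
rho = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)

xs = conjugate_gradient(A, b, np.zeros(n), 25)
e0 = a_norm(xs[0] - x_star)
for k, xk in enumerate(xs):
    print(f"k={k:2d}  ||x_k - x*||_A = {a_norm(xk - x_star):.3e}  bound = {2 * rho**k * e0:.3e}")
```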
line search
- constant step size rule (no search at all)
- minimization rule, full relaxation (used in theory only) \[ \alpha_k = \mbox{argmin}_{\alpha > 0} f(x^k +\alpha p^k) \]
- Armijo rule (sufficient decrease), with \( c_1 \in (0,1) \); see the sketch after this list \[ f(x^k + \alpha p^k) \le f(x^k) + c_1 \alpha \nabla f(x^k)^T p^k\]
- curvature condition (rules out unacceptably small steps)
\[ \nabla f(x^k + \alpha p^k)^T p^k \ge c_2 \nabla f(x^k)^T p^k\] where \( c_2 \in (c_1, 1) \)
- Wolfe conditions (sufficient decrease + curvature condition)
- Goldstein rule
\[ -c\, \alpha \nabla f(x^k)^T p^k \le f(x^k) - f(x^k + \alpha p^k) \le -(1-c)\, \alpha \nabla f(x^k)^T p^k\] where \( c \in (0, \frac{1}{2})\) (note \( \nabla f(x^k)^T p^k < 0 \) for a descent direction)
see Numerical Optimization for a thorough discussion
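A minimal backtracking sketch enforcing the Armijo (sufficient decrease) condition above; the constants \( c_1 = 10^{-4} \), the initial step, and the halving factor are common illustrative defaults, not prescribed here.

```python
import numpy as np

def armijo_backtracking(f, grad_f, x, p, alpha0=1.0, c1=1e-4, shrink=0.5, max_iter=50):
    """Shrink alpha until the Armijo condition holds along the descent direction p."""
    fx = f(x)
    slope = grad_f(x) @ p          # directional derivative; must be < 0
    alpha = alpha0
    for _ in range(max_iter):
        if f(x + alpha * p) <= fx + c1 * alpha * slope:
            return alpha
        alpha *= shrink
    return alpha                   # fall back to the last (tiny) trial step

# usage on a simple quadratic f(x) = 1/2 x^T A x - b^T x
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad_f = lambda x: A @ x - b

x = np.array([5.0, 5.0])
p = -grad_f(x)                     # steepest descent direction
alpha = armijo_backtracking(f, grad_f, x, p)
print("accepted step size:", alpha, " new f:", f(x + alpha * p))
```

Enforcing the curvature condition as well (to get the Wolfe conditions) additionally requires evaluating the gradient at each trial point inside the loop.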
gradient method
various search directions
- steepest descent, i.e. negative gradient \( - \nabla f\)
- general descent direction, i.e. \(p^k = -D^k \nabla f(x^k)\) where \( D^k\) is positive definite
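A small illustration of how a positive definite scaling \( D^k \) changes the step \( p^k = -D^k \nabla f(x^k) \): on a badly scaled diagonal quadratic, a diagonal \( D \) (a made-up preconditioner chosen purely for illustration) removes the ill-conditioning that slows plain steepest descent.

```python
import numpy as np

A = np.diag([100.0, 1.0])          # badly scaled quadratic f(x) = 1/2 x^T A x
grad_f = lambda x: A @ x

D = np.diag(1.0 / np.diag(A))      # positive definite diagonal scaling; here D = A^{-1},
                                   # so the scaled step is exact for this quadratic
x_sd = np.array([1.0, 1.0])        # steepest descent iterate
x_scaled = x_sd.copy()             # scaled-direction iterate
alpha = 1.0 / 100.0                # safe constant step (1/L) for the unscaled problem

for k in range(50):
    x_sd = x_sd - alpha * grad_f(x_sd)
    x_scaled = x_scaled - D @ grad_f(x_scaled)   # unit step along -D grad f

print("steepest descent  ||x||:", np.linalg.norm(x_sd))
print("scaled direction  ||x||:", np.linalg.norm(x_scaled))
```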
constrained minimization
- penalty function methods (see the sketch after this list)
- barrier function methods
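A minimal quadratic-penalty sketch for an equality-constrained problem: minimize \( f(x) + \frac{\mu}{2} h(x)^2 \) for an increasing penalty parameter \( \mu \). The objective, constraint, inner solver, and all constants below are illustrative assumptions.

```python
import numpy as np

f = lambda x: x[0] ** 2 + 2 * x[1] ** 2            # objective
grad_f = lambda x: np.array([2 * x[0], 4 * x[1]])
h = lambda x: x[0] + x[1] - 1.0                    # equality constraint h(x) = 0
grad_h = lambda x: np.array([1.0, 1.0])

x = np.zeros(2)
for mu in [1.0, 10.0, 100.0, 1000.0]:
    step = 1.0 / (2.0 * mu + 4.0)   # crude 1/L bound for the penalized objective
    # inner loop: gradient descent on f(x) + (mu/2) h(x)^2, warm-started
    for _ in range(5000):
        g = grad_f(x) + mu * h(x) * grad_h(x)
        x = x - step * g
    print(f"mu = {mu:7.1f}  x = {x}  constraint violation = {h(x):+.5f}")
```

As \( \mu \) grows, the minimizers of the penalized problem approach the constrained solution and the constraint violation shrinks like \( O(1/\mu) \).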
nonlinear system
global convergence
- the standard assumption for global convergence analysis of gradient methods is \[ f \in C^{1,1}_L(\mathbb{R}^n),\] i.e. the gradient of \( f \) is Lipschitz continuous with constant \( L \), which gives the descent lemma
\[ |f(y) -f(x)- \nabla f(x)^T(y-x)| \le \frac{L}{2}||y-x||^2\]
- note that in all cases, we can only get \[ \lim_{k\to\infty}||\nabla f(x^k)||=0 \] or an even weaker result
steepest descent direction
- for the constant step size rule, the minimization rule, and the Goldstein rule, we have (for some constant \( \omega > 0 \) depending on the rule)
\[ \frac{\omega}{L} \displaystyle \sum_{k=0}^{\infty} ||\nabla f(x^k)||^2 \le f(x^0) - f^\star \]
- a rate-of-convergence result can be obtained only for \[ \min_{1\le i\le k}||\nabla f(x^i)||\]
see equation (1.2.14) of Introductory Lectures on Convex Optimization
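A numerical check of this bound for the constant step size \( \alpha = 1/L \) (for which \( \omega = \frac{1}{2} \)): gradient descent on a simple quadratic, where the accumulated \( \frac{1}{2L}\sum_k ||\nabla f(x^k)||^2 \) stays below \( f(x^0) - f^\star \). The test function is an illustrative assumption.

```python
import numpy as np

A = np.diag([10.0, 2.0, 0.5])
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x
L = 10.0                      # Lipschitz constant of grad f (largest eigenvalue)
f_star = 0.0

x = np.array([1.0, 1.0, 1.0])
f0 = f(x)
sum_sq = 0.0
for k in range(200):
    g = grad_f(x)
    sum_sq += g @ g
    x = x - (1.0 / L) * g     # constant step size 1/L

print("(1/2L) * sum ||grad f||^2 =", sum_sq / (2 * L))
print("f(x^0) - f*               =", f0 - f_star)
print("final gradient norm       =", np.linalg.norm(grad_f(x)))
```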
local convergence
standard assumption for local convergence analysis is
- \( f \in C^{2,2}_M(\mathbb{R}^n)\)
- \( H(x^\star)\) is positive definite
- \(x^0\) is sufficiently close to \(x^\star\)
steepest descent direction
for the constant step size rule, we have a linear convergence rate
\[ ||x^k - x^\star|| = O(\beta^k)\] for some \( \beta \in (0,1) \)
see theorem 1.2.4 of Introductory Lectures on Convex Optimization
general descent direction
for step sizes satisfying the Wolfe conditions (a similar result holds for the Goldstein rule), we have the following Zoutendijk condition
\[ \displaystyle \sum_{k=0}^{\infty} \cos^2 \theta_k\, ||\nabla f(x^k)||^2 < \infty \] where \( \cos \theta_k = \frac{-\nabla f(x^k)^T p^k}{||\nabla f(x^k)||\,||p^k||} \)
see theorem 3.2 of Numerical Optimization
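A small numerical illustration of the Zoutendijk condition using scipy's strong-Wolfe line search and randomly scaled descent directions on the Rosenbrock function; the test problem and the random scaling matrices are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import rosen, rosen_der, line_search

rng = np.random.default_rng(1)
x = np.array([-1.2, 1.0])
partial_sum = 0.0
for k in range(100):
    g = rosen_der(x)
    # a generic descent direction p = -D g with D symmetric positive definite
    M = rng.standard_normal((2, 2))
    D = M @ M.T + np.eye(2)
    p = -D @ g
    cos_theta = -(g @ p) / (np.linalg.norm(g) * np.linalg.norm(p))
    partial_sum += cos_theta**2 * (g @ g)          # term of the Zoutendijk series
    alpha = line_search(rosen, rosen_der, x, p)[0]  # Wolfe line search
    if alpha is None:                               # line search failed; stop the demo
        break
    x = x + alpha * p

print("partial sum of cos^2(theta_k) ||grad f||^2:", partial_sum)
print("current iterate:", x)
```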
local convergence
standard assumption for local convergence analysis is
- \( f \in C^{2,2}_M(\mathbb{R}^n)\)
- \( H(x^\star)\) is positive definite
- \(x^0\) is sufficiently close to \(x^\star\)
standard Newton's method (step size = 1)
we have a quadratic convergence rate
\[ ||x^{k+1} - x^\star|| = O(||x^k - x^\star||^2)\]
see theorem 1.2.5 of Introductory Lectures on Convex Optimization
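A minimal sketch of the pure Newton iteration (unit step) exhibiting quadratic error decay near a minimizer with positive definite Hessian; the test function \( f(x, y) = \frac{x^4}{4} + \frac{y^2}{2} - xy \), whose minimizer \( (1, 1) \) is known exactly, and the starting point are illustrative choices.

```python
import numpy as np

# f(x, y) = x^4/4 + y^2/2 - x*y; its minimizer (1, 1) is known exactly and the
# Hessian there is positive definite, so the local theory applies.
grad = lambda v: np.array([v[0] ** 3 - v[1], v[1] - v[0]])
hess = lambda v: np.array([[3 * v[0] ** 2, -1.0], [-1.0, 1.0]])

x_star = np.array([1.0, 1.0])
x = np.array([1.3, 0.6])                          # "sufficiently close" starting point
for k in range(6):
    print(f"k={k}  ||x^k - x*|| = {np.linalg.norm(x - x_star):.3e}")
    x = x - np.linalg.solve(hess(x), grad(x))     # full Newton step (step size 1)
```

The printed errors roughly square at each iteration until machine precision is reached, matching the \( O(||x^k - x^\star||^2) \) estimate.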
quasi-Newton
convex optimization
- with a rich theory and nice convergence guarantees
convex function
- several useful inequalities are collected in Introductory Lectures on Convex Optimization
- any first-order (black-box) method applied to a smooth convex function has the lower complexity bound \[ f(x^k) - f(x^\star) \ge \Omega\left(\frac{1}{k^2}\right)\] during the initial iterations (while \( k \) is small relative to the dimension \( n \)); see theorem 2.1.7 of Introductory Lectures on Convex Optimization
complexity bound
for the constant step size gradient method applied to \( f \in \mathcal{F}^{1,1}_L\), we have \[ f(x^k) - f(x^\star) \le O\left(\frac{1}{k}\right);\] see theorem 2.1.14 of Introductory Lectures on Convex Optimization
strongly convex function
convergence rate
for the constant step size gradient method applied to \( f \in S^{1,1}_{\mu, L}\), we have a linear convergence rate; note that this is a global convergence result. See theorem 2.1.15 of Introductory Lectures on Convex Optimization
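A numerical illustration contrasting the two complexity results for the constant step size \( 1/L \): the \( O(1/k) \) guarantee (of the form \( \frac{2L||x^0 - x^\star||^2}{k+4} \)) on a smooth convex function with a degenerate minimizer, versus the linear (geometric) rate once the function is strongly convex. Both test functions are illustrative assumptions.

```python
import numpy as np

# --- smooth convex with a degenerate minimizer: f(x) = x^4 / 4 --------------
# grad f(x) = x^3 is Lipschitz with L = 3 on [-1, 1], which contains every
# iterate started from x^0 = 1; here x* = 0.
L1, x0, x_star = 3.0, 1.0, 0.0
f1 = lambda x: 0.25 * x ** 4
x = x0
for k in range(1, 101):
    x -= (1.0 / L1) * x ** 3
    if k in (10, 50, 100):
        bound = 2 * L1 * (x0 - x_star) ** 2 / (k + 4)   # O(1/k) upper bound
        print(f"convex:          k={k:3d}  f - f* = {f1(x):.2e}  2*L*R^2/(k+4) = {bound:.2e}")

# --- strongly convex quadratic: linear (geometric) rate ---------------------
mu, L2 = 1.0, 10.0
A = np.diag([mu, L2])
y = np.array([1.0, 1.0])
for k in range(1, 101):
    y = y - (1.0 / L2) * (A @ y)
    if k in (10, 50, 100):
        print(f"strongly convex: k={k:3d}  ||x^k - x*|| = {np.linalg.norm(y):.2e}"
              f"  (1 - mu/L)^k = {(1 - mu / L2) ** k:.2e}")
```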