Optimization

nonlinear programming

unconstrained minimization

first-order methods


conjugate gradient

  • mainly used, as an iterative method, for solving large sparse linear systems

second-order methods

Newton's method

linear system

  • the conjugate gradient method is a Krylov subspace method; by the minimization property, we have \[ ||x^k - x^\star||_A = \min_{p \in P_k,\, p(0)=1} ||p(A)(x^0 - x^\star)||_A \] where \( P_k \) denotes the polynomials of degree at most \( k \)
  • use \[ \frac{||x^k - x^\star||_A}{||x^0 - x^\star||_A} \le \min_{p \in P_k,\, p(0)=1} \max_{z \in \sigma(A)} |p(z)| \]
    which gives a good bound when the spectrum of \( A \) is clustered (see the sketch after this list)
  • the tightest bound for general matrices is
    \[ ||x^k - x^\star||_A \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k ||x^0 - x^\star||_A \]
    where \( \kappa \) is the condition number of \( A \); one can get this result by using Chebyshev polynomials, for details see here
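
A minimal sketch of the conjugate gradient iteration in plain numpy (illustrative only; the clustered test spectrum and tolerance are my own choices, not taken from the references above). With six distinct eigenvalues, exact-arithmetic CG terminates in at most six iterations, which is the point of the clustered-spectrum remark:

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Plain CG for a symmetric positive definite A (minimal sketch)."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x                 # residual r^k = b - A x^k
    p = r.copy()                  # first search direction
    rs_old = r @ r
    max_iter = n if max_iter is None else max_iter
    for k in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)            # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p        # keeps the directions A-conjugate
        rs_old = rs_new
    return x, k + 1

# matrix with a clustered spectrum: 195 eigenvalues at 1.0 plus 5 outliers,
# so CG needs only a handful of iterations (6 distinct eigenvalues)
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((200, 200)))
eigs = np.concatenate([np.full(195, 1.0), [2.0, 2.5, 3.0, 3.5, 4.0]])
A = Q @ np.diag(eigs) @ Q.T
b = rng.standard_normal(200)
x, iters = conjugate_gradient(A, b)
print(iters, np.linalg.norm(A @ x - b))
```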

line search

  • constant step size rule (no search at all)
  • minimization rule, full relaxation (used in theory only) \[ \alpha_k = \mbox{argmin}_{\alpha > 0} f(x^k +\alpha p^k) \]
  • Armijo rule (sufficient decrease) \[ f(x^k + \alpha p^k) \le f(x^k) + c_1 \alpha \nabla f(x^k)^T p^k\] (see the backtracking sketch after this list)
  • curvature condition (rule out unacceptably small steps)
    \[ \nabla f(x^k + \alpha p^k)^T p^k \ge c_2 \nabla f(x^k)^T p^k\]
  • Wolfe condition (sufficient decrease + curvature condition)
  • Goldstein rule
    \[ -c\, \alpha \nabla f(x^k)^T p^k \le f(x^k) - f(x^k + \alpha p^k) \le -(1-c)\, \alpha \nabla f(x^k)^T p^k\] where \( c \in (0, \frac{1}{2})\)
    see Numerical Optimization for a thorough discussion of these rules
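
A minimal backtracking sketch that enforces the Armijo condition from the list above; the constants \( c_1 = 10^{-4} \), the shrink factor \( 0.5 \), and the quadratic test problem are illustrative assumptions. A full Wolfe line search would additionally check the curvature condition to rule out overly small steps:

```python
import numpy as np

def backtracking_armijo(f, grad, x, p, alpha0=1.0, c1=1e-4, shrink=0.5, max_iter=50):
    """Backtrack from alpha0 until the Armijo (sufficient decrease) condition holds."""
    fx = f(x)
    slope = grad(x) @ p          # directional derivative; negative for a descent direction p
    alpha = alpha0
    for _ in range(max_iter):
        if f(x + alpha * p) <= fx + c1 * alpha * slope:
            return alpha
        alpha *= shrink          # shrink the step and try again
    return alpha

# usage on a simple quadratic f(x) = 0.5 x^T A x - b^T x with a steepest descent direction
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x = np.zeros(2)
p = -grad(x)
print(backtracking_armijo(f, grad, x, p))
```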

gradient method


various search directions

  • steepest descent, i.e. negative gradient \( - \nabla f\)
  • general descent direction, i.e. \(p^k = -D^k \nabla f(x^k)\) where \( D^k\) is positive definite
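
A small sketch of one descent step with a scaled direction \( p^k = -D^k \nabla f(x^k) \); the quadratic test function and the two choices of \( D^k \) (identity for steepest descent, inverse Hessian for the Newton direction) are illustrative:

```python
import numpy as np

def descent_step(grad, x, D, alpha):
    """One step with search direction p = -D grad(x); for symmetric positive definite D
    this is a descent direction, since grad(x)^T p = -grad(x)^T D grad(x) < 0."""
    p = -D @ grad(x)
    return x + alpha * p

# quadratic example f(x) = 0.5 x^T A x: D = I gives steepest descent,
# D = A^{-1} (the inverse Hessian) gives the Newton direction
A = np.array([[10.0, 0.0], [0.0, 1.0]])
grad = lambda x: A @ x
x = np.array([1.0, 1.0])
print(descent_step(grad, x, np.eye(2), alpha=0.1))          # steepest descent step
print(descent_step(grad, x, np.linalg.inv(A), alpha=1.0))   # Newton step, lands at the minimizer 0
```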

constrained minimization

  • penalty function methods
  • barrier function methods
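
A minimal sketch of the quadratic penalty approach (the barrier approach is analogous, with a barrier term replacing the penalty); the test problem, penalty schedule, and plain gradient descent inner solver are illustrative choices:

```python
import numpy as np

def quadratic_penalty(f_grad, c, c_jac, x0, mus=(1.0, 10.0, 100.0, 1000.0), inner_steps=500):
    """Minimize f(x) + (mu/2) ||c(x)||^2 for an increasing penalty sequence mu,
    warm-starting each subproblem; the inner solver is plain gradient descent
    with a step size shrunk as mu grows (the penalized Hessian scales with mu)."""
    x = np.asarray(x0, dtype=float)
    for mu in mus:
        lr = 0.1 / mu
        for _ in range(inner_steps):
            g = f_grad(x) + mu * c_jac(x).T @ c(x)   # gradient of the penalized objective
            x = x - lr * g
    return x

# example: minimize x1 + x2 subject to x1^2 + x2^2 = 1; solution (-1/sqrt(2), -1/sqrt(2))
f_grad = lambda x: np.array([1.0, 1.0])
c = lambda x: np.array([x @ x - 1.0])        # equality constraint as a 1-vector
c_jac = lambda x: 2.0 * x.reshape(1, 2)      # its 1x2 Jacobian
print(quadratic_penalty(f_grad, c, c_jac, x0=[-1.0, -1.0]))
```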

nonlinear system

global convergence

  • the standard assumption for global convergence analysis of gradient methods is \[ f \in C^{1,1}_L(\mathbb{R}^n)\] i.e. the gradient of \( f \) is Lipschitz continuous with constant \( L \), from which we get the descent lemma (checked numerically in the sketch below)
    \[ |f(y) -f(x)- \nabla f(x)^T(y-x)| \le \frac{L}{2}||y-x||^2\]
  • note that in all cases, we can only show \[ \lim_{k\to\infty}||\nabla f(x^k)||=0 \] or an even weaker result; convergence of the iterates \( x^k \) themselves is not guaranteed
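
A small numerical check of the descent lemma on a quadratic, where the Lipschitz constant \( L \) is the largest eigenvalue of the Hessian; the random test matrix is an arbitrary choice:

```python
import numpy as np

# check |f(y) - f(x) - grad f(x)^T (y - x)| <= (L/2) ||y - x||^2
# for a quadratic f(x) = 0.5 x^T A x, whose gradient is Lipschitz with L = lambda_max(A)
rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = B @ B.T + np.eye(5)                  # symmetric positive definite test Hessian
L = np.linalg.eigvalsh(A).max()

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

for _ in range(5):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    lhs = abs(f(y) - f(x) - grad(x) @ (y - x))
    rhs = 0.5 * L * np.linalg.norm(y - x) ** 2
    print(lhs <= rhs + 1e-12, round(lhs, 4), round(rhs, 4))
```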

steepest descent direction

  • for the constant step size rule, the minimization rule, and the Goldstein rule, we have
    \[ \frac{\omega}{L} \displaystyle \sum_{k=0}^{\infty} ||\nabla f(x^k)||^2 \le f(x^0) - f^\star \]
    where \( \omega > 0 \) is a constant depending on the step size rule (a numerical check for \( \alpha = 1/L \) follows after this list)
  • a rate of convergence result can be obtained only for \[ \min_{0\le i\le k}||\nabla f(x^i)||\]
    see equation (1.2.14) of Introductory Lectures on Convex Optimization
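
A quick check of the summability bound above for the constant step size \( \alpha = 1/L \), for which the descent lemma gives \( \omega = \frac{1}{2} \); the quadratic test problem with known \( f^\star = 0 \) is an illustrative assumption:

```python
import numpy as np

# verify (omega/L) * sum_k ||grad f(x^k)||^2 <= f(x^0) - f*  with omega = 1/2
# for the constant step size alpha = 1/L (this value of omega follows from the descent lemma)
A = np.diag([1.0, 4.0, 25.0])            # f(x) = 0.5 x^T A x, so f* = 0 and L = 25
L = 25.0
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x = np.array([1.0, 1.0, 1.0])
f0 = f(x)
grad_sq_sum = 0.0
for _ in range(200):
    g = grad(x)
    grad_sq_sum += g @ g
    x = x - (1.0 / L) * g

print(grad_sq_sum / (2 * L), "<=", f0)   # the partial sums stay below f(x^0) - f*
```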

local convergence
the standard assumptions for local convergence analysis are

  • \( f \in C^{2,2}_M(\mathbb{R}^n)\)
  • \( H(x^\star)\) is positive definite
  • \(x^0\) is sufficiently close to \(x^\star\)

steepest descent direction
for the constant step size rule, we have a linear convergence rate
\[ ||x^k - x^\star|| = O(\beta^k)\]
for some \( \beta \in (0, 1) \)
see theorem 1.2.4 of Introductory Lectures on Convex Optimization
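
A short illustration of the geometric decay of \( ||x^k - x^\star|| \) for the constant step size \( 1/L \) on a positive definite quadratic; here the observed contraction factor approaches \( 1 - \mu/L \), which is specific to this example and step size rather than the constant in the theorem:

```python
import numpy as np

# track ||x^k - x*|| for gradient descent with constant step 1/L on a PD quadratic;
# the error decays geometrically, consistent with ||x^k - x*|| = O(beta^k)
A = np.diag([1.0, 10.0])                 # Hessian; x* = 0, mu = 1, L = 10
L, mu = 10.0, 1.0
x = np.array([1.0, 1.0])
errors = []
for _ in range(30):
    errors.append(np.linalg.norm(x))
    x = x - (1.0 / L) * (A @ x)

ratios = [e1 / e0 for e0, e1 in zip(errors, errors[1:])]
print(ratios[-1], 1 - mu / L)            # observed per-step contraction vs. 1 - mu/L = 0.9
```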

general descent direction
for the Goldstein rule (and likewise the Wolfe conditions), we have the following Zoutendijk condition
\[ \displaystyle \sum_{k=0}^{\infty} \cos^2 \theta_k\, ||\nabla f(x^k)||^2 < \infty \]
where \( \theta_k \) is the angle between \( p^k \) and \( -\nabla f(x^k) \); note that this is really a global convergence result
see theorem 3.2 of Numerical Optimization

local convergence
the standard assumptions for local convergence analysis are

  • \( f \in C^{2,2}_M(\mathbb{R}^n)\)
  • \( H(x^\star)\) is positive definite
  • \(x^0\) is sufficiently close to \(x^\star\)

standard Newton's method (step size = 1)
we have a quadratic convergence rate
\[ ||x^{k+1} - x^\star|| = O(||x^k - x^\star||^2)\]
see theorem 1.2.5 of Introductory Lectures on Convex Optimization
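
A tiny one-dimensional illustration of the quadratic rate; the function \( f(x) = e^x - 2x \) (minimizer \( x^\star = \ln 2 \)) and the starting point are illustrative choices:

```python
import numpy as np

# Newton's method with unit step on f(x) = exp(x) - 2x, whose minimizer is x* = ln 2;
# the error is roughly squared at every iteration (quadratic convergence)
f_grad = lambda x: np.exp(x) - 2.0
f_hess = lambda x: np.exp(x)

x, x_star = 2.0, np.log(2.0)
for k in range(6):
    print(k, abs(x - x_star))
    x = x - f_grad(x) / f_hess(x)        # Newton step with step size 1
```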

quasi-Newton

convex optimization

  • with a rich theory and nice convergence guarantees

convex function

complexity bound
for the constant step size steepest descent method applied to \( f \in \mathcal{F}^{1,1}_L\), we have \[ f(x^k) - f(x^\star) = O\left(\frac{1}{k}\right)\] see theorem 2.1.14 of Introductory Lectures on Convex Optimization

strongly convex function

convergence rate
for the constant step size steepest descent method applied to \( f \in S^{1,1}_{\mu, L}\), we have a linear convergence rate; note that this is a global convergence result. see theorem 2.1.15 of Introductory Lectures on Convex Optimization