1. Matrix Derivatives

Objective Function

  • 1st-Order Gradient

    一般规定为一个 Row Vector

    It is a linear operator mapping into

    敲代码的时候,这样的规定能让 dimension 对得上,省略很多麻烦

    不过在写公式的时候用另一种 Notation 会更方便

  • 2nd-Order Gradient

    Hessian

    Hessian 是 Symmetric Matrix,所以 Notation 无所谓,dimension 总是对的

    可以用它对 objective function 进行 2nd-Order Taylor Expansion

Objective Function

  • 1st-Order Gradient

    一般规定为 的 Vector

    其 transpose 也叫 Jacobian

Notes:

对以上两种 Objective Function 一阶导之规定,即 ,是为了让 Chain Rule 能 work

 

 

 

2. Root Finding

Given , find such that ,其实就是找到 Equilibrium of Dynamics

  • In Continuous Time

    使导数为 0 即可

  • In Discrete Time

    Discrete-Time Dynamics 的一般形式是 ,找到其 fixed point 即可

 

 

2.1 Newton's Method

用于 root finding 的最基础的一种算法

 

2.1.1 Algorithm

假设 objective function 是 smooth 的,root 位于 ,那么

  1. Fit a linear approximation to

  2. Solve for

    • "Descent" (minimization)
    • "Ascent" (maximization)
  3. Update

  4. Repeat Step 1-3 until convergence

Notes:

基于现有的条件,Newton's Method only converges to the closest Fixed Point to the Initial Guess

鬼知道找到的 Root 是 {global / local} {maximum / minimum} 还是 saddle point,反正你自己看着办吧

 

2.1.2 Regularization

2.1.1 的 step 2 中提及了保证 descent 的条件

If everywhere (which means is strongly convex), then we can always find a minimum with Newton's Method

但很遗憾的是,现实中 it is usually not the case for nonlinear problems

解决方案即 regularization,做法如下

  1. Pick a scalar hyperparameter

  2. Check if Hessian is Positive Definite

  3. if not, then

  4. Repeat step 2-3 until is Positive Definite

  • It guarantees descent
  • Also called "Damped Newton"

 

 

2.2 Line Search (Back Tracking)

Newton's Method 的 step size 并不是所谓 adaptive 的,到最后总会有那么一步会 overshoots the minimum

这个方法算是给 Newton's Method 打的补丁,to make sure step agrees with linearization within some tolerance

Specifically, we update and check , until we get a good reduction value ( "Step Size")

 

2.2.0 Direction of Descent

Let be the Direction of Descent,在做 back tracking 的时候用它替代原本的

为了确保 的方向一定是下降,需要检查 是否被满足,这个 condition 的意思大概是,你在 方向上走一小步 ,函数值就会下降

的取值也是个好问题

  • 严格意义上的选择

    Assign

    Then update

  • 更常见的选择

    Assign

    Then update

 

2.2.1 Armijo Condition

or Sufficient Decrease Condition

  1. Set

    • step size / percentage of reduction
    • tolerance
    • 's reduction rate
  2. Check if

  3. Update

    and repeat Step 2-3 until Step 2 is fulfilled

一般各项参数取值为

下图中 acceptable 的部分表示这个 step length 没有 overshoot the minimum

不 acceptable 的部分显然是 overshoot 了,即斜虚线比函数本身还要小

armijo condition

 

2.2.2 Curvature Condition

2.2.1 的图中不难看出只要 够小就能满足 Armijo Condition

但这会导致 descent 的量不够大,而 curvature condition 能防止 过小

where

这个 condition 的要求可以理解为 "The slope at the next iterate is 'less negative' than the current slope"

所以如果这个条件被满足,已知 ,slope 有两种情况

  1. 比 current slope 更接近 0,函数值更接近 minimum
  2. 直接是 positive slope,函数值越过 minimum

curvature condition

还有一种更加严格的 curvature condition 版本

用绝对值限制之后,就只有把 slope 往 0 上靠这一个选择了

 

2.2.3 Wolfe Conditions

Wolfe Conditions 就是把 Armijo Condition 和 Curvature Condition 组合在一起

wolfe condition

  • 弱 Wolfe Condition

  • 强 Wolfe Condition

 

 

 

3. Unconstrained Minimization

Find the minimum of the Objective Function

 

 

3.1 Hessian 正定之必要性

若以 2.1 中的 Newton's Method 为基础算法进行 minimization,需要重点关注 Step 2,两侧同乘

如果对 进行 2nd-Order Taylor Expansion

若不能使 ,即为 Positive Definite,那么显然在 的过程中无法保证 的数值会下降,即达到 Minimum 而不是 Maximum 或者 Saddle Point 这种奇奇怪怪的地方

 

 

3.2 Local Minimum 判断标准

Assume is a Local Minimizer and is continuously differentiable, then

  • 1st-Order Necessary Condition

  • 2nd-Order Necessary Condition

  • 2nd-Order Sufficient Condition

    If this condition is fulfilled, then is a Strict Local Minimizer of

 

 

 

4. Constrained Minimization

Find the minimum of the Objective Function subjected to the following constraints

我们最终要把一个 Constrained Problem 转化为一个 Unconstrained Problem 才能解它

 

 

4.1 Equality Constraints

where

The 1st-Order Necessary Condition to find the minimum is

  1. 只要在 FREE directions 上满足即可

  2. Equality Constraint 规定了哪里是 的 Free Direction

 

4.1.1 Lagrangian

Lagrangian 能把上述的 Constrained Problem 转化为 Unconstrained Problem,以下是推导过程

  • 关于 Equality Constraint

    "What is Free Direction?" 简而言之,当 作了一个很小的位移 后仍能满足 ,那这个位移就在 "Free Direction" 上,即

    进行 1st-Order Taylor Expansion 得到

    Notes: 是个 Jacobian

    由于 ,所以需要满足的其实是

  • 关于 Objective Function

    同样,假设 在 Objective Function 上做了一个很小的位移 ,进行 1st-Order Taylor Expansion 得到

    如果 尚不是 minimum ,那么只要 ,它就还能继续下降。同样,如果 往相反的方向位移,即 ,那么

    只要 ,它也能继续下降

    Tangent Space 是对称的,我懒得解释了你不懂就自己搜搜看吧...

    反正就是,如果你不想让 下降,那么这个 就必须满足

    也就是 要满足

  • Summary

    从上述两个部分推出的结论是

    也就是 都需要 Normal 于移动方向 ,换而言之 must be parallel to

    或者写成

    [ Example ]

    你会看到 minimum 处的 objective function 和 equality constraint 的 gradient 是平行的

    lagrangian example

  • Lagrangian

    Based on this gradient condition, we define Lagrangian as

    such that the 1st-Order Necessary Condition becomes

    or equivalently

 

4.1.2 KKT System

KKT = Karush-Kuhn-Tucker,更常见的相关概念一般是 KKT Conditions,见 4.2.2

目的是用 Newtonian 去解能使 。首先进行 1st-Order Taylor Expansion

since and , then

重新组织一下

写成 Matrix Form,就是 KKT System

 

4.1.3 Gauss-Newton Method

该方法能简化 4.1.2 中提及的 的运算,一般是用 1st-Order Taylor Expansion 来算

The 2nd term is expensive to compute because it is a bloody Tensor!

So we just drop it! 我们不要第二项辣!蛤!蛤!蛤!

Gauss-Newton 会导致 slightly slower convergence than Full-Newton

会要跑更多的 Iterations,但是单个 Iteration 变 cheaper 了

 

 

4.2 Inequality Constraints

 

4.2.1 LICQ

让 KKT Condition 成立的条件

  • The Active Set of any feasible point consists of the equality constraints and the inequality constraints for which
  • Given a point and the Active Set of constraints , we say that the Linear Independence Constraint Qualification (LICQ) holds if the gradients of all the active constraints are linearly independent"

 

4.2.2 KKT Conditions

具有不等约束的优化问题的 1st-Order-Necessary Conditions = Karush-Kuhn-Tucker (KKT) Conditions

  • 完全体 Lagrangian

    两个 Lagrangian Multiplier

    "Suppose that is a local solution and that the LICQ holds at , and that , and are continuously differentiable, then there are Lagrange multipliers and such that the KKT Conditions are satisfied"

  • KKT Conditions

前两个 condition 真没什么好说的,后两个比较 tricky

Complementary Slackness 在满足 Dual Feasibility 的情况下,其意义是 "Either or "

在 evaluate 一个 inequality constraint 的时候,假设 是一个点,会有三种情况:

  1. - feasible point(非边界),因此 constraint inactive ()
  2. - feasible point(边界上),因此constraint active ()
  3. - infeasible point

complementary slackness