# Optimal Control

# Introduction

[Diagram: feedback loop. Reference -> (+) -> measured error -> [Controller] -> system input -> [System] -> system output; a [Sensor] measures the output and feeds it back to the summing junction.]

# Calculus of Variations

$$\min_{x(t)} J(x(t)) = \int_{t_0}^{t_f} L(t, x(t), \dot{x}(t))\, dt$$

# * Deterministic Optimal Control

  • Continuous time:

    $$\min_{x(t), u(t)} J(x(t), u(t)) = \underbrace{\int_{t_0}^{t_f} l(x(t), u(t))\, dt}_{\text{"stage cost"}} + \underbrace{l_f(x(t_f))}_{\text{"terminal cost"}}$$

    s.t. $\dot{x}(t) = f(x(t), u(t))$ ← "dynamics constraints"
    (possibly other constraints)

    • "This is an infinite-dimensional problem" in the following sense:
      [Graph: a control curve $u(t)$ sampled at points $u_1, u_2, \dots, u_N$]
      $u_{1:N} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_N \end{bmatrix}, \qquad u(t) = \lim_{N \to \infty} u_{1:N}$
  • Solutions are open-loop trajectories.

  • There are a handful of problems with analytic solutions, but not many.

  • We will focus on the discrete-time setting, which leads to tractable algorithms.

# * Discrete time

$$\min_{x_{1:N}, u_{1:N-1}} J(x_{1:N}, u_{1:N-1}) = \sum_{k=1}^{N-1} l(x_k, u_k) + l_f(x_N)$$

s.t. $x_{k+1} = f(x_k, u_k)$
$u_{min} \le u_k \le u_{max}$ ← "torque limits"
$C(x_k) \le 0$ ← "obstacle / safety constraints"

  • This is now finite dimensional.
  • The samples $x_k, u_k$ are often called "knot points".
  • Continuous -> discrete time using numerical integration (e.g., Runge-Kutta methods).
  • Discrete -> continuous using interpolation.
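The continuous-to-discrete conversion mentioned above can be sketched with one explicit Runge-Kutta (RK4) step. This is a minimal illustration, not the notes' code; `rk4_step` and the pendulum dynamics are assumed names chosen for the example.

```python
import numpy as np

def rk4_step(f, x, u, h):
    """One explicit 4th-order Runge-Kutta step of x_dot = f(x, u),
    holding the input u constant over the step (zero-order hold)."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * h * k1, u)
    k3 = f(x + 0.5 * h * k2, u)
    k4 = f(x + h * k3, u)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Illustrative dynamics: a pendulum with state x = [theta, theta_dot]
def pendulum(x, u, g=9.81, l=1.0):
    return np.array([x[1], -(g / l) * np.sin(x[0]) + u])

x_next = rk4_step(pendulum, np.array([0.1, 0.0]), 0.0, 0.01)
```

Applying `rk4_step` repeatedly with a fixed step `h` is exactly the discretization $x_{k+1} = f(x_k, u_k)$ used in the discrete-time problem above.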

# * Pontryagin's Minimum Principle

  • Also called the "Maximum Principle" if you maximize a reward.
  • First-order necessary conditions for the deterministic optimal control problem.
  • In discrete time, it is just a special case of the KKT conditions.

# KKT Condition

$$\min_x f(x)$$
s.t. $c(x) = 0 : \lambda$
$g(x) \le 0 : \mu$

$L(x, \lambda, \mu) = f(x) + \lambda^T c(x) + \mu^T g(x)$ (Lagrangian)

  • Stationarity: $\nabla_x L = \nabla_x f(x) + (\frac{\partial c}{\partial x})^T \lambda + (\frac{\partial g}{\partial x})^T \mu = 0$
  • Primal feasibility: $c(x) = 0$, $g(x) \le 0$
  • Dual feasibility: $\mu \ge 0$
  • Complementary slackness: $\mu_i \, g_i(x) = 0 \quad \forall i$
    • Equivalently, $\mu \odot g(x) = 0$
    • or, since $\mu \ge 0$ and $g(x) \le 0$, $\mu^T g(x) = 0$
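As a concrete instance of these conditions, an equality-constrained QP can be solved by stacking stationarity and primal feasibility into one linear system. A minimal sketch; `solve_eq_qp` is an illustrative helper name, not from the notes.

```python
import numpy as np

def solve_eq_qp(Q, q, A, b):
    """Solve min 0.5 x'Qx + q'x  s.t.  Ax = b by stacking stationarity
    (Qx + q + A' lam = 0) and primal feasibility (Ax = b) into one
    symmetric "KKT" linear system."""
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-q, b])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]  # primal x, equality multipliers lambda

# min x1^2 + 0.5*x2^2  s.t.  x1 + x2 = 1
Q = np.diag([2.0, 1.0])
q = np.zeros(2)
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x, lam = solve_eq_qp(Q, q, A, b)
```

With no inequality constraints there is no $\mu$, so dual feasibility and complementary slackness are vacuous and the KKT conditions reduce to this single linear solve.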

  • Given:

    $$\min_{x_{1:N}, u_{1:N-1}} \sum_{k=1}^{N-1} l(x_k, u_k) + l_f(x_N)$$

    s.t. $x_{k+1} = f(x_k, u_k)$

  • We can form the Lagrangian:

    $$L = \sum_{k=1}^{N-1} \left[ l(x_k, u_k) + \lambda_{k+1}^T (f(x_k, u_k) - x_{k+1}) \right] + l_f(x_N)$$

  • This result is usually stated in terms of the "Hamiltonian":

    $$H(x, u, \lambda) = l(x, u) + \lambda^T f(x, u)$$

  • Plug it into L:

    $$L = H(x_1, u_1, \lambda_2) + \left[ \sum_{k=2}^{N-1} H(x_k, u_k, \lambda_{k+1}) - \lambda_k^T x_k \right] + l_f(x_N) - \lambda_N^T x_N$$

    (Note change to indexing)

  • Take derivatives w.r.t. $x$ and $\lambda$:
    $\frac{\partial L}{\partial \lambda_k} = \frac{\partial H}{\partial \lambda_k} - x_{k+1} = f(x_k, u_k) - x_{k+1} = 0$
    $\frac{\partial L}{\partial x_k} = \frac{\partial H}{\partial x_k} - \lambda_k^T = \frac{\partial l}{\partial x_k} + \lambda_{k+1}^T \frac{\partial f}{\partial x_k} - \lambda_k^T = 0$
    $\frac{\partial L}{\partial x_N} = \frac{\partial l_f}{\partial x_N} - \lambda_N^T = 0$

  • For $u$, we write the min explicitly to handle torque limits:
    $u_k = \arg\min_{\tilde{u}} H(x_k, \tilde{u}, \lambda_{k+1})$
    s.t. $\tilde{u} \in \mathcal{U}$ (shorthand for "in the feasible set", e.g., $u_{min} \le \tilde{u} \le u_{max}$)

# In summary:

  • $x_{k+1} = \nabla_\lambda H(x_k, u_k, \lambda_{k+1})$ → just the dynamics

  • $\lambda_k = \nabla_x H(x_k, u_k, \lambda_{k+1})$ → runs backwards in time

  • $u_k = \arg\min_{\tilde{u}} H(x_k, \tilde{u}, \lambda_{k+1})$ s.t. $\tilde{u} \in \mathcal{U}$

  • $\lambda_N = \frac{\partial l_f}{\partial x_N}$
    (Shooting method)

  • These can be stated almost identically in continuous time:

    • $\dot{x} = \nabla_\lambda H(x, u, \lambda)$
    • $-\dot{\lambda} = \nabla_x H(x, u, \lambda)$
    • $u = \arg\min_{\tilde{u}} H(x, \tilde{u}, \lambda)$ s.t. $\tilde{u} \in \mathcal{U}$
    • $\lambda(t_f) = \frac{\partial l_f}{\partial x}$

# * Some Notes:

  • Historically, many algorithms were based on integrating the continuous ODEs forward/backward in time to do gradient descent on $u(t)$.
  • These are called "indirect" and/or "shooting" methods.
  • In continuous time, $\lambda(t)$ is called the "co-state" trajectory.
  • These methods have largely fallen out of favor as computers have improved.

# Linear Quadratic Regulator (LQR) Notes

# Table of Contents

  • LQR Intro
  • LQR via Shooting (Pontryagin's)
  • LQR as a QP
  • Riccati Recursion

# LQR Problem

Locally, any nonlinear control problem can be approximated by an LQR problem:

$$\min_{x_{1:N}, u_{1:N-1}} J = \sum_{k=1}^{N-1} \left( \frac{1}{2} x_k^T Q_k x_k + \frac{1}{2} u_k^T R_k u_k \right) + \frac{1}{2} x_N^T Q_N x_N$$

s.t. $x_{k+1} = A_k x_k + B_k u_k$

Linear Quadratic Regulator

  • $Q \ge 0$ (positive semi-definite)
  • $R > 0$ (strictly positive definite)

# Characteristics:

  • Can (locally) approximate many nonlinear problems.
  • Computationally tractable.
  • Many extensions e.g., finite vs infinite horizon, stochastic, etc.
  • "Time invariant" if $A_k = A, B_k = B, Q_k = Q, R_k = R \quad \forall k$.
  • "Time varying" otherwise.

# LQR with Indirect Shooting

  • $x_{k+1} = \nabla_{\lambda} H(x_k, u_k, \lambda_{k+1}) = A x_k + B u_k$
  • $\lambda_k = \nabla_x H(x_k, u_k, \lambda_{k+1}) = Q x_k + A^T \lambda_{k+1}$
  • $\lambda_N = Q_N x_N$
  • $u_k = \arg\min_u H(x_k, u, \lambda_{k+1}) = -R^{-1} B^T \lambda_{k+1}$

# Procedure:

  1. Start with an initial guess trajectory $u_{1:N-1}$.
  2. Simulate ("rollout") to get $x_{1:N}$.
  3. Do a backward pass to get $\lambda$ and $\Delta u$.
  4. Rollout with a line search on $\Delta u$.
  5. Go to (3) until convergence.
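The procedure above can be sketched for the LQR case. This is a simple illustrative implementation, with assumed names (`lqr_shooting`) and a plain gradient step with backtracking line search as one possible choice for step 4:

```python
import numpy as np

def lqr_shooting(A, B, Q, R, QN, x1, N, iters=100, tol=1e-8):
    """Indirect shooting for discrete-time LQR: rollout, backward pass
    for the co-states, then a line-searched gradient step on u."""
    n, m = B.shape
    u = [np.zeros(m) for _ in range(N - 1)]

    def rollout(u):
        x = [x1]
        for k in range(N - 1):
            x.append(A @ x[k] + B @ u[k])
        return x

    def cost(x, u):
        J = 0.5 * x[-1] @ QN @ x[-1]
        for k in range(N - 1):
            J += 0.5 * x[k] @ Q @ x[k] + 0.5 * u[k] @ R @ u[k]
        return J

    x = rollout(u)
    for _ in range(iters):
        # Backward pass: lambda_N = QN xN, lambda_k = Q xk + A' lambda_{k+1}
        lam = [None] * N
        lam[-1] = QN @ x[-1]
        for k in range(N - 2, 0, -1):
            lam[k] = Q @ x[k] + A.T @ lam[k + 1]
        # Gradient of the Hamiltonian w.r.t. u_k: R u_k + B' lambda_{k+1}
        grad = [R @ u[k] + B.T @ lam[k + 1] for k in range(N - 1)]
        if max(np.linalg.norm(g) for g in grad) < tol:
            break
        # Backtracking line search on the gradient step
        alpha, J0 = 1.0, cost(x, u)
        while True:
            u_new = [u[k] - alpha * grad[k] for k in range(N - 1)]
            x_new = rollout(u_new)
            if cost(x_new, u_new) < J0 or alpha < 1e-10:
                break
            alpha *= 0.5
        u, x = u_new, x_new
    return x, u
```

Each outer iteration is exactly steps (2)-(4): rollout, backward pass, and a line-searched update of the control trajectory.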

# Example: "Double Integrator"

$$\dot{x} = \begin{bmatrix} \dot{q} \\ \ddot{q} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} q \\ \dot{q} \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u$$

  • $q$: position
  • $\dot{q}$: velocity
  • $\ddot{q}$: acceleration
  • $u$: force

Think of this as a brick sliding on ice (no friction).

Discrete time:

$$x_{k+1} = \underbrace{\begin{bmatrix} 1 & h \\ 0 & 1 \end{bmatrix}}_{A} \begin{bmatrix} q_k \\ \dot{q}_k \end{bmatrix} + \underbrace{\begin{bmatrix} \frac{1}{2}h^2 \\ h \end{bmatrix}}_{B} u_k$$

(where $h$ is the time step)
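A quick sanity check of this discretization, with an assumed time step $h = 0.1$ and a constant unit force: since the update is exact for piecewise-constant inputs, ten steps should reproduce $q = \frac{1}{2}t^2$, $\dot{q} = t$ at $t = 1$.

```python
import numpy as np

h = 0.1  # assumed time step
A = np.array([[1.0, h],
              [0.0, 1.0]])
B = np.array([[0.5 * h**2],
              [h]])

# Slide the brick with a constant unit force for 10 steps (t = 1.0)
x = np.array([0.0, 0.0])  # [position, velocity]
for _ in range(10):
    x = A @ x + B @ np.array([1.0])

# x -> [0.5, 1.0], matching q = t^2/2 and qdot = t at t = 1
```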


# LQR as a QP

  • Assume $x_1$ (the initial state) is given (not a decision variable).
  • Define the decision variable $z = \begin{bmatrix} u_1 \\ x_2 \\ u_2 \\ x_3 \\ \vdots \\ x_N \end{bmatrix}$.
  • Define $H = \text{diag}(R_1, Q_2, R_2, Q_3, \dots, Q_N)$ such that $J = \frac{1}{2} z^T H z$.
  • Define $C$ and $d$:

$$\underbrace{\begin{bmatrix} B_1 & -I & & & \\ & A_2 & B_2 & -I & \\ & & \ddots & \ddots & \\ & & A_{N-1} & B_{N-1} & -I \end{bmatrix}}_{C} \underbrace{\begin{bmatrix} u_1 \\ x_2 \\ u_2 \\ \vdots \\ x_N \end{bmatrix}}_{z} = \underbrace{\begin{bmatrix} -A_1 x_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}}_{d}$$

Now we can write the LQR problem as a standard QP:

$$\min_z \frac{1}{2} z^T H z \quad \text{s.t.} \quad Cz = d$$

# The Lagrangian of this QP is:

$$L(z, \lambda) = \frac{1}{2} z^T H z + \lambda^T (Cz - d)$$

# KKT Condition:

  • $\nabla_z L = Hz + C^T \lambda = 0$
  • $\nabla_{\lambda} L = Cz - d = 0$

$$\Rightarrow \begin{bmatrix} H & C^T \\ C & 0 \end{bmatrix} \begin{bmatrix} z \\ \lambda \end{bmatrix} = \begin{bmatrix} 0 \\ d \end{bmatrix}$$

We get the exact solution by solving one linear system!

  • Example: Double integrator.
  • Note: Much better than shooting!
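A minimal sketch of building $H$, $C$, and $d$ and solving the whole problem in one linear solve, assuming time-invariant $A, B, Q, R$; `lqr_kkt` is an illustrative name, not from the notes.

```python
import numpy as np

def lqr_kkt(A, B, Q, R, QN, x1, N):
    """Assemble the dense LQR QP with z = [u1, x2, u2, ..., xN] and
    solve the KKT system [[H, C'], [C, 0]] [z; lam] = [0; d]."""
    n, m = B.shape
    nz = (N - 1) * (n + m)
    # Quadratic cost: H = diag(R, Q, R, Q, ..., QN)
    H = np.zeros((nz, nz))
    for k in range(N - 1):
        i = k * (n + m)
        H[i:i+m, i:i+m] = R
        H[i+m:i+m+n, i+m:i+m+n] = QN if k == N - 2 else Q
    # Dynamics constraints C z = d, one block row per step
    C = np.zeros(((N - 1) * n, nz))
    d = np.zeros((N - 1) * n)
    d[:n] = -(A @ x1)
    for k in range(N - 1):
        r, i = k * n, k * (n + m)
        C[r:r+n, i:i+m] = B           # B_k u_k
        C[r:r+n, i+m:i+m+n] = -np.eye(n)  # -x_{k+1}
        if k > 0:
            C[r:r+n, i-n:i] = A       # A_k x_k
    K = np.block([[H, C.T], [C, np.zeros((C.shape[0], C.shape[0]))]])
    rhs = np.concatenate([np.zeros(nz), d])
    sol = np.linalg.solve(K, rhs)
    return sol[:nz], sol[nz:]  # z = [u1, x2, ...], multipliers lambda
```

This is the "one linear system" claim made concrete: the entire optimal trajectory comes out of a single `np.linalg.solve`, though the dense solve ignores the sparsity discussed next.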

# A closer look at the LQR QP:

  • The KKT system for LQR is very sparse (lots of zeros) and has a lot of structure.

# Row-by-Row Analysis:

  • Terminal condition (Row 6):
    $Q_N x_N - \lambda_N = 0$ → the terminal condition from Pontryagin
    $\Rightarrow \lambda_N = Q_N x_N$

  • Control step (Row 5):
    $R u_3 + B^T \lambda_4 = R u_3 + B^T Q_N x_4 = 0$ (plug in the dynamics for $x_4$)
    $\Rightarrow R u_3 + B^T Q_N (A x_3 + B u_3) = 0$
    $\Rightarrow u_3 = -\underbrace{(R + B^T Q_N B)^{-1} B^T Q_N A}_{K_3} x_3$

  • Recursion step (Row 4):
    $Q x_3 - \lambda_3 + A^T \lambda_4 = 0$
    Plug in $\lambda_4$: $Q x_3 - \lambda_3 + A^T Q_N x_4 = 0$
    Plug in the dynamics: $Q x_3 - \lambda_3 + A^T Q_N (A x_3 + B u_3) = 0$
    Plug in $u_3 = -K_3 x_3$: $\lambda_3 = \underbrace{(Q + A^T Q_N (A - B K_3))}_{P_3} x_3$

# Riccati Equation / Recursion:

Now we have a recursion for $K$ and $P$:

  1. $P_N = Q_N$
  2. $K_k = (R + B^T P_{k+1} B)^{-1} B^T P_{k+1} A$
  3. $P_k = Q + A^T P_{k+1} (A - B K_k)$

# Summary:

  • We can solve the QP by doing a backward Riccati recursion followed by a forward rollout to compute $x_{1:N}$ and $u_{1:N-1}$ from the initial condition.
  • A general (dense) QP solve has complexity $O(N^3 (n+m)^3)$.
  • The Riccati solution has complexity $O(N (n+m)^3)$, linear rather than cubic in the horizon $N$.
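The summary above, a backward Riccati recursion followed by a forward rollout, can be sketched as follows for the time-invariant case (`lqr_riccati` is an illustrative name):

```python
import numpy as np

def lqr_riccati(A, B, Q, R, QN, x1, N):
    """Backward Riccati recursion for the gains K_k, then a forward
    rollout under the feedback law u_k = -K_k x_k."""
    P = QN                      # P_N = Q_N
    K = [None] * (N - 1)
    for k in range(N - 2, -1, -1):
        K[k] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K[k])
    xs, us = [x1], []
    for k in range(N - 1):      # forward rollout
        us.append(-K[k] @ xs[k])
        xs.append(A @ xs[k] + B @ us[k])
    return xs, us
```

Note that the recursion also yields a feedback policy $u_k = -K_k x_k$ valid for any initial state, whereas the dense QP solve returns only one open-loop trajectory.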