# Optimal Control
# Introduction
[Diagram: feedback control loop. Reference → (+) → measured error → [Controller] → system input → [System] → system output; a [Sensor] feeds the measured output back to the summing junction.]
# Calculus of Variations
$$\min_{x(t)} J(x(t)) = \int_{t_0}^{t_f} L(t, x(t), \dot{x}(t))\, dt$$
# * Deterministic Optimal Control
Continuous time:
$$\min_{x(t), u(t)} J(x(t), u(t)) = \underbrace{\int_{t_0}^{t_f} l(x(t), u(t))\, dt}_{\text{"stage cost"}} + \underbrace{l_f(x(t_f))}_{\text{"terminal cost"}}$$
$$\text{s.t. } \dot{x}(t) = f(x(t), u(t)) \quad \leftarrow \text{"dynamics constraints"}$$
(possibly other constraints)
"This is an infinite dimensional" problem in the following sense:
[Graph: a continuous control trajectory $u(t)$ sampled at points $u_1, u_2, \dots, u_N$]
$$u_{1:N} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_N \end{bmatrix}, \qquad u(t) = \lim_{N \to \infty} u_{1:N}$$
Solutions are open-loop trajectories.
There are a handful of problems with analytic solutions, but not many.
We will focus on the discrete-time setting, which leads to tractable algorithms.
# * Discrete time
$$\min_{x_{1:N}, u_{1:N-1}} J(x_{1:N}, u_{1:N-1}) = \sum_{k=1}^{N-1} l(x_k, u_k) + l_f(x_N)$$
$$\text{s.t. } x_{k+1} = f(x_k, u_k)$$
$$u_{min} \le u_k \le u_{max} \quad \leftarrow \text{"torque limits"}$$
$$C(x_k) \le 0 \quad \leftarrow \text{"obstacle / safety constraints"}$$
This is now finite-dimensional.
Samples $x_k, u_k$ are often called "knot points".
Continuous → discrete time using integration (e.g., Runge-Kutta); see the sketch below.
Discrete → continuous using interpolation.
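As a concrete illustration of the integration step, here is a minimal sketch (the `rk4_step` helper, the zero-order hold on $u$, and the example values are illustrative choices, not from the notes) of turning continuous dynamics $\dot{x} = f(x, u)$ into a discrete update via a fourth-order Runge-Kutta step:

```python
import numpy as np

def rk4_step(f, x, u, h):
    """One explicit RK4 step of x_dot = f(x, u), holding u constant over the step."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * h * k1, u)
    k3 = f(x + 0.5 * h * k2, u)
    k4 = f(x + h * k3, u)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# Example: continuous double-integrator dynamics x = [q, q_dot], x_dot = [q_dot, u].
f = lambda x, u: np.array([x[1], u])
x_next = rk4_step(f, np.array([0.0, 0.0]), 1.0, 0.1)  # one knot point to the next
```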
# * Pontryagin's Minimum Principle
Also called "Maximum Principle" if you maximize a reward.
It gives first-order necessary conditions for the deterministic optimal control problem.
In discrete time, it is just a special case of the KKT conditions.
# KKT Condition
$$\min_x f(x)$$
$$\text{s.t. } c(x) = 0 \;:\; \lambda$$
$$\phantom{\text{s.t. }} g(x) \le 0 \;:\; \mu$$
$$L(x, \lambda, \mu) = f(x) + \lambda^T c(x) + \mu^T g(x) \quad \text{(Lagrangian)}$$
Stationarity: $\nabla_x L = \nabla_x f(x) + \left(\tfrac{\partial c}{\partial x}\right)^T \lambda + \left(\tfrac{\partial g}{\partial x}\right)^T \mu = 0$
Primal feasibility: $c(x) = 0, \;\; g(x) \le 0$
Dual feasibility: $\mu \ge 0$
Complementary slackness: $\mu_i \, g_i(x) = 0 \;\; \forall i$ (equivalently $\mu \odot g(x) = 0$, or $\mu^T g(x) = 0$)
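As a small worked example (illustrative numbers, equality constraints only, so $\mu$ and complementary slackness drop out), stationarity and primal feasibility of an equality-constrained QP stack into one linear system:

```python
import numpy as np

# Minimal sketch: min_x 1/2 x^T H x + q^T x  s.t.  A_eq x = b_eq  (i.e., c(x) = A_eq x - b_eq = 0).
# Stationarity: H x + q + A_eq^T lambda = 0;  primal feasibility: A_eq x = b_eq.
H = np.array([[2.0, 0.0], [0.0, 2.0]])
q = np.array([-2.0, -5.0])
A_eq = np.array([[1.0, 1.0]])
b_eq = np.array([1.0])

K = np.block([[H, A_eq.T], [A_eq, np.zeros((1, 1))]])
rhs = np.concatenate([-q, b_eq])
sol = np.linalg.solve(K, rhs)
x, lam = sol[:2], sol[2:]   # primal minimizer and the multiplier lambda
```

With inequality constraints, $\mu \ge 0$ and complementary slackness make the system nonlinear; active-set or interior-point methods handle that case.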
Given:
$$\min_{x_{1:N}, u_{1:N-1}} \sum_{k=1}^{N-1} l(x_k, u_k) + l_f(x_N)$$
$$\text{s.t. } x_{k+1} = f(x_k, u_k)$$
We can form the Lagrangian:
$$L = \sum_{k=1}^{N-1} \Big[ l(x_k, u_k) + \lambda_{k+1}^T \big(f(x_k, u_k) - x_{k+1}\big) \Big] + l_f(x_N)$$
This result is usually stated in terms of the "Hamiltonian":
$$H(x, u, \lambda) = l(x, u) + \lambda^T f(x, u)$$
Plugging it into $L$:
$$L = H(x_1, u_1, \lambda_2) + \left[ \sum_{k=2}^{N-1} H(x_k, u_k, \lambda_{k+1}) - \lambda_k^T x_k \right] + l_f(x_N) - \lambda_N^T x_N$$
(Note the change in indexing: the $-\lambda_{k+1}^T x_{k+1}$ terms from each stage have been re-indexed as $-\lambda_k^T x_k$, now running over $k = 2, \dots, N$, with the $k = N$ term written out separately.)
Take derivatives with respect to $x$ and $\lambda$:
$$\frac{\partial L}{\partial \lambda_k} = \frac{\partial H}{\partial \lambda_k} - x_{k+1} = f(x_k, u_k) - x_{k+1} = 0$$
$$\frac{\partial L}{\partial x_k} = \frac{\partial H}{\partial x_k} - \lambda_k^T = \frac{\partial l}{\partial x_k} + \lambda_{k+1}^T \frac{\partial f}{\partial x_k} - \lambda_k^T = 0$$
$$\frac{\partial L}{\partial x_N} = \frac{\partial l_f}{\partial x_N} - \lambda_N^T = 0$$
For $u$, we write the minimization explicitly to handle torque limits:
$$u_k = \text{argmin}_{\tilde{u}} \; H(x_k, \tilde{u}, \lambda_{k+1}) \quad \text{s.t. } \tilde{u} \in U$$
($\tilde{u} \in U$ is shorthand for "in the feasible set", e.g., $u_{min} \le \tilde{u} \le u_{max}$)
# In summary:
$$x_{k+1} = \nabla_\lambda H(x_k, u_k, \lambda_{k+1}) \quad \to \quad \text{just the dynamics}$$
$$\lambda_k = \nabla_x H(x_k, u_k, \lambda_{k+1}) \quad \to \quad \text{runs backwards in time}$$
$$u_k = \text{argmin}_{\tilde{u}} \; H(x_k, \tilde{u}, \lambda_{k+1}) \quad \text{s.t. } \tilde{u} \in U$$
$$\lambda_N = \frac{\partial l_f}{\partial x_N}$$
(Shooting method)
These conditions can be stated almost identically in continuous time:
$$\dot{x} = \nabla_\lambda H(x, u, \lambda)$$
$$-\dot{\lambda} = \nabla_x H(x, u, \lambda)$$
$$u = \text{argmin}_{\tilde{u}} \; H(x, \tilde{u}, \lambda) \quad \text{s.t. } \tilde{u} \in U$$
$$\lambda(t_f) = \frac{\partial l_f}{\partial x}$$
# * Some Notes:
Historically, many algorithms were based on integrating the continuous ODEs forward/backward in time to do gradient descent on $u(t)$.
These are called "indirect" and/or "shooting" methods.
In continuous time, $\lambda(t)$ is called the "co-state" trajectory.
These methods have largely fallen out of favor as computers have improved.
# Linear Quadratic Regulator (LQR) Notes
# Table of Contents
LQR Intro
LQR via Shooting (Pontryagin's)
LQR as a QP
Riccati Recursion
# LQR Problem
Locally, any nonlinear control problem → an LQR problem.
$$\min_{x_{1:N}, u_{1:N-1}} J = \sum_{k=1}^{N-1} \left( \frac{1}{2} x_k^T Q_k x_k + \frac{1}{2} u_k^T R_k u_k \right) + \frac{1}{2} x_N^T Q_N x_N$$
$$\text{s.t. } x_{k+1} = A_k x_k + B_k u_k$$
Linear Quadratic Regulator
$Q \ge 0$
$R > 0$ (strictly positive definite)
# Characteristics:
Can (locally) approximate many nonlinear problems.
Computationally tractable.
Many extensions e.g., finite vs infinite horizon, stochastic, etc.
"Time invariant" if A k = A , B k = B , Q k = Q , R k = R ∀ t A_k = A, B_k = B, Q_k = Q, R_k = R \quad \forall t A k = A , B k = B , Q k = Q , R k = R ∀ t .
"Time varying" otherwise.
# LQR with Indirect Shooting
$$x_{k+1} = \nabla_\lambda H(x_k, u_k, \lambda_{k+1}) = A x_k + B u_k$$
$$\lambda_k = \nabla_x H(x_k, u_k, \lambda_{k+1}) = Q x_k + A^T \lambda_{k+1}$$
$$\lambda_N = Q_N x_N$$
$$u_k = \text{argmin}_u \; H(x_k, u, \lambda_{k+1}) = -R^{-1} B^T \lambda_{k+1}$$
# Procedure:
1. Start with an initial guess trajectory $u_{1:N-1}$.
2. Simulate ("rollout") to get $x_{1:N}$.
3. Do a backward pass to get $\lambda_{1:N}$ and $\Delta u$.
4. Rollout with a line search on $\Delta u$.
5. Go to (3) until convergence.
(A minimal code sketch follows the double-integrator example below.)
# Example: "Double Integrator"
$$\dot{x} = \begin{bmatrix} \dot{q} \\ \ddot{q} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} q \\ \dot{q} \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u$$
$\dot{q}$: velocity
$\ddot{q}$: acceleration
$q$: position
$u$: force
Think of this as a brick sliding on ice (no friction).
Discrete time:
$$x_{k+1} = \underbrace{\begin{bmatrix} 1 & h \\ 0 & 1 \end{bmatrix}}_{A} \begin{bmatrix} q_k \\ \dot{q}_k \end{bmatrix} + \underbrace{\begin{bmatrix} \tfrac{1}{2}h^2 \\ h \end{bmatrix}}_{B} u_k$$
(where $h$ is the time step)
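Putting the shooting procedure above together with the double integrator, here is a minimal sketch (the weights $Q$, $R$, $Q_N$, horizon $N$, time step $h$, and initial state are illustrative values, not from the notes) of LQR indirect shooting with a backtracking line search:

```python
import numpy as np

h, N = 0.1, 50
A = np.array([[1.0, h], [0.0, 1.0]])
B = np.array([[0.5 * h**2], [h]])
Q, R, QN = np.eye(2), 0.1 * np.eye(1), 10.0 * np.eye(2)
x1 = np.array([1.0, 0.0])

def rollout(us):
    xs = [x1]
    for u in us:
        xs.append(A @ xs[-1] + B @ u)
    return xs

def cost(xs, us):
    J = sum(0.5 * x @ Q @ x + 0.5 * u @ R @ u for x, u in zip(xs[:-1], us))
    return J + 0.5 * xs[-1] @ QN @ xs[-1]

us = [np.zeros(1) for _ in range(N - 1)]               # 1. initial guess u_{1:N-1}
xs = rollout(us)                                       # 2. rollout to get x_{1:N}
for _ in range(100):
    lam = QN @ xs[-1]                                  # 3. backward pass: lambda_N = Q_N x_N
    dus = [np.zeros(1)] * (N - 1)
    for k in reversed(range(N - 1)):
        dus[k] = -np.linalg.solve(R, B.T @ lam) - us[k]   # u_k* = -R^{-1} B^T lambda_{k+1}
        lam = Q @ xs[k] + A.T @ lam                    # lambda_k = Q x_k + A^T lambda_{k+1}
    alpha, J0 = 1.0, cost(xs, us)                      # 4. rollout with line search on delta-u
    while True:
        new_us = [u + alpha * du for u, du in zip(us, dus)]
        new_xs = rollout(new_us)
        if cost(new_xs, new_us) < J0 or alpha < 1e-8:
            break
        alpha *= 0.5
    us, xs = new_us, new_xs
    if max(np.linalg.norm(du) for du in dus) < 1e-6:   # 5. repeat until converged
        break
```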
# LQR as a QP
Assume $x_1$ (the initial state) is given (not a decision variable).
Define the decision variable $z = \begin{bmatrix} u_1 \\ x_2 \\ u_2 \\ x_3 \\ \vdots \\ x_N \end{bmatrix}$.
Define $H = \text{diag}(R_1, Q_2, R_2, Q_3, \dots, Q_N)$ such that $J = \frac{1}{2} z^T H z$.
Define $C$ and $d$:
$$\underbrace{\begin{bmatrix} B_1 & -I & 0 & 0 & \cdots & 0 \\ 0 & A_2 & B_2 & -I & \cdots & 0 \\ & & & \ddots & & \\ 0 & \cdots & 0 & A_{N-1} & B_{N-1} & -I \end{bmatrix}}_{C} \underbrace{\begin{bmatrix} u_1 \\ x_2 \\ u_2 \\ \vdots \\ x_N \end{bmatrix}}_{z} = \underbrace{\begin{bmatrix} -A_1 x_1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}}_{d}$$
Now we can write the LQR problem as a standard QP:
$$\min_z \; \frac{1}{2} z^T H z$$
$$\text{s.t. } Cz = d$$
# The Lagrangian of this QP is:
$$L(z, \lambda) = \frac{1}{2} z^T H z + \lambda^T (Cz - d)$$
# KKT Condition:
$$\nabla_z L = Hz + C^T \lambda = 0$$
$$\nabla_\lambda L = Cz - d = 0$$
$$\Rightarrow \begin{bmatrix} H & C^T \\ C & 0 \end{bmatrix} \begin{bmatrix} z \\ \lambda \end{bmatrix} = \begin{bmatrix} 0 \\ d \end{bmatrix}$$
We get the exact solution by solving one linear system!
Example: Double integrator.
Note: Much better than shooting!
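To make the one-linear-solve claim concrete, here is a minimal sketch on the double integrator (same illustrative weights, horizon, and initial state as the shooting sketch above): build $H$, $C$, and $d$ exactly as defined, then solve the KKT system once.

```python
import numpy as np
from scipy.linalg import block_diag

h, N = 0.1, 50
A = np.array([[1.0, h], [0.0, 1.0]])
B = np.array([[0.5 * h**2], [h]])
n, m = 2, 1
Q, R, QN = np.eye(n), 0.1 * np.eye(m), 10.0 * np.eye(n)
x1 = np.array([1.0, 0.0])

# z = [u_1, x_2, u_2, x_3, ..., x_N]  and  J = 1/2 z^T H z
H = block_diag(*([R] + [block_diag(Q, R)] * (N - 2) + [QN]))

# C z = d encodes the dynamics x_{k+1} = A x_k + B u_k (x_1 is given, so it lands in d)
nz = (N - 1) * (n + m)
C = np.zeros(((N - 1) * n, nz))
d = np.zeros((N - 1) * n)
d[:n] = -A @ x1                              # first block row: B u_1 - x_2 = -A x_1
for k in range(N - 1):
    r, c = k * n, k * (n + m)
    C[r:r + n, c:c + m] = B                  # B u_{k+1}
    C[r:r + n, c + m:c + m + n] = -np.eye(n) # -x_{k+2}
    if k > 0:
        C[r:r + n, c - n:c] = A              # A x_{k+1}

# One KKT linear solve gives the entire optimal trajectory.
K = np.block([[H, C.T], [C, np.zeros((C.shape[0], C.shape[0]))]])
sol = np.linalg.solve(K, np.concatenate([np.zeros(nz), d]))
z = sol[:nz]                                 # optimal [u_1, x_2, u_2, ..., x_N]
```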
# A closer look at the LQR QP:
The KKT system for LQR is very sparse (lots of zeros) and has a lot of structure.
# Row-by-Row Analysis:
Take a short horizon, $N = 4$ (so $z = [u_1,\, x_2,\, u_2,\, x_3,\, u_3,\, x_4]^T$), and read the stationarity block rows of $Hz + C^T\lambda = 0$ from the bottom up.
Terminal condition (Row 6):
$$Q_N x_N - \lambda_N = 0 \quad \to \quad \text{terminal condition from Pontryagin}$$
$$\Rightarrow \lambda_N = Q_N x_N$$
Control Step (Row 5):
$$R u_3 + B^T \lambda_4 = R u_3 + B^T Q_N x_4 = 0$$
Plug in the dynamics for $x_4$:
$$\Rightarrow R u_3 + B^T Q_N (A x_3 + B u_3) = 0$$
$$\Rightarrow u_3 = -\underbrace{(R + B^T Q_N B)^{-1} B^T Q_N A}_{K_3} \, x_3$$
Recursion step (Row 4):
$$Q x_3 - \lambda_3 + A^T \lambda_4 = 0$$
Plug in $\lambda_4$: $\;Q x_3 - \lambda_3 + A^T Q_N x_4 = 0$
Plug in the dynamics: $\;Q x_3 - \lambda_3 + A^T Q_N (A x_3 + B u_3) = 0$
Plug in $u_3 = -K_3 x_3$: $\;\lambda_3 = \underbrace{\big(Q + A^T Q_N (A - B K_3)\big)}_{P_3} \, x_3$
# Riccati Equation / Recursion:
Now we have a recursion for $K$ and $P$:
$$P_N = Q_N$$
$$K_k = (R + B^T P_{k+1} B)^{-1} B^T P_{k+1} A$$
$$P_k = Q + A^T P_{k+1} (A - B K_k)$$
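Here is a minimal sketch of this recursion (double-integrator matrices and weights are illustrative, matching the earlier sketches), followed by the forward rollout described in the summary below:

```python
import numpy as np

h, N = 0.1, 50
A = np.array([[1.0, h], [0.0, 1.0]])
B = np.array([[0.5 * h**2], [h]])
Q, R, QN = np.eye(2), 0.1 * np.eye(1), 10.0 * np.eye(2)

# Backward pass: P_N = Q_N, then K_k and P_k for k = N-1, ..., 1.
P = QN
K = [np.zeros((1, 2))] * (N - 1)
for k in reversed(range(N - 1)):
    K[k] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K[k])

# Forward rollout from the initial condition using the feedback law u_k = -K_k x_k.
x = np.array([1.0, 0.0])
xs, us = [x], []
for k in range(N - 1):
    u = -K[k] @ x
    x = A @ x + B @ u
    us.append(u)
    xs.append(x)
```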
# Summary:
We can solve the QP by doing a backward Riccati recursion followed by a forward rollout to compute $x_{1:N}$ and $u_{1:N-1}$ from the initial condition.
A general (dense) QP solve has complexity $O(N^3 (n+m)^3)$.
The Riccati solution has complexity $O((n+m)^3 N)$.