This is a cheat sheet for the RL-MPC tutorial.

Keep it handy, and feel free to chime in during the tutorial if anything needs clarification :-)

Reinforcement learning

| Symbol | Meaning | Example / definition |
|---|---|---|
| \(s, s'\) | State | \(s_t, s_{t+1}\) |
| \(a\) | Action | \(a_t\) |
| \(p\) | State transition probability | \(s' \sim p(s' \mid s, a)\) |
| \(r\) | Reward | \(r_t = r(s_t, a_t)\) |
| \(\pi\) | Policy | \(a \sim \pi(a \mid s)\) (stochastic), \(a = \pi(s)\) (deterministic) |
| \(\gamma\) | Discount factor | \(\gamma \in [0, 1]\) |
| \(G_t\) | Discounted return | \(G_t = \sum_{k=0}^\infty \gamma^k r_{t+k}\) |
| \(Q^\pi\) | State-action value function | \(Q^\pi(s, a) = \mathbb{E}_\pi\left[ G_0 \mid s_0 = s, a_0 = a \right]\) |
| \(Q^\star\) | Optimal state-action value function | \(Q^\star(s, a) = \max_\pi Q^\pi(s, a)\) |
| \(V^\pi\) | (State) value function | \(V^\pi(s) = \mathbb{E}_\pi\left[ G_0 \mid s_0 = s \right]\) |
| \(V^\star\) | Optimal state value function | \(V^\star(s) = \max_a Q^\star(s, a)\) |
| \(\pi^\star\) | Optimal policy | \(\pi^\star(s) = \arg\max_a Q^\star(s, a)\) |
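To make the last few rows of the table above concrete, here is a minimal tabular sketch (all names and numbers are illustrative, not from the tutorial): it computes a discounted return \(G_0\) from a short reward sequence and reads \(V^\star\) and \(\pi^\star\) off a toy Q-table.

```python
import numpy as np

gamma = 0.9                                   # discount factor, gamma in [0, 1]
rewards = np.array([1.0, 0.0, 0.0, 2.0])      # r_t, r_{t+1}, ... along one trajectory

# Discounted return G_t = sum_k gamma^k * r_{t+k}, here for t = 0
G0 = sum(gamma**k * r for k, r in enumerate(rewards))

# Toy Q-table over 3 states and 2 actions, standing in for Q*(s, a)
Q = np.array([[0.5, 1.2],
              [0.3, 0.1],
              [2.0, 1.9]])

V = Q.max(axis=1)        # V*(s) = max_a Q*(s, a)
pi = Q.argmax(axis=1)    # pi*(s) = argmax_a Q*(s, a)
print(G0, V, pi)
```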

Control

| Symbol | Meaning | Example / definition |
|---|---|---|
| \(x\) | State | \(x_t\) |
| \(u\) | (Control) input | \(u_t\) |
| \(f\) | State transition function | \(x_{t+1} = f(x_t, u_t)\) |
| \(\ell\) | (Stage) cost | \(\ell(x, u) = x^T M x + u^T R u\) |
| \(K\) | Gain matrix | \(u = -K x\) |
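For the linear-quadratic special case of the table above (linear \(f\), quadratic \(\ell\)), the gain matrix \(K\) comes from the discrete-time algebraic Riccati equation. A minimal sketch with SciPy, where the matrices \(A\), \(B\), \(M\), \(R\) are made-up example values:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Example double-integrator dynamics x_{t+1} = A x_t + B u_t (matrices are illustrative)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])

# Quadratic stage cost l(x, u) = x^T M x + u^T R u
M = np.eye(2)
R = np.array([[0.01]])

# Solve the Riccati equation, then K = (R + B^T P B)^{-1} B^T P A, so that u = -K x
P = solve_discrete_are(A, B, M, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

x = np.array([1.0, 0.0])   # current state
u = -K @ x                 # state-feedback control input
print(K, u)
```

In the RL notation of the first table, the same problem reads as a reward \(r = -\ell\) and a deterministic policy \(\pi(x) = -Kx\).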

Acronyms

| Acronym | Meaning |
|---|---|
| RL | Reinforcement learning |
| MPC | Model predictive control |
| LQR | Linear quadratic regulator |
| PID | Proportional-integral-derivative |