This is a cheat sheet for the RL-MPC tutorial.
Keep this handy, and feel free to ask for clarification during the tutorial :-)
Reinforcement learning
\(s, s'\)
State
\(s_t, s_{t+1}\)
\(a\)
Action
\(a_t\)
\(p\)
State transition probability
\(s' \sim p\left( s' \middle| s, a \right)\)
\(r\)
Reward
\(r_t = r(s_t, a_t)\)
\(\pi\)
Policy
\(a \sim \pi(a|s)\), \(a=\pi(s)\)
\(\gamma\)
Discount factor
\(\gamma \in [0,1]\)
\(G_t\)
Discounted return
\(G_t = \sum_{k=0}^\infty \gamma^k r_{t+k}\)
\(Q^\pi\)
State-action value function
\(Q^\pi (s,a) = \mathbb{E}_\pi \left[ G_0 \middle| \substack{s_0 = s\\ a_0 = a} \right]\)
\(Q^\star\)
Optimal state-action value function
\(Q^\star (s,a) = \max_\pi Q^\pi (s,a)\)
\(V^\pi\)
State value function
\(V^\pi (s) = \mathbb{E}_\pi \left[ G_0 \middle| s_0 = s \right]\)
\(V^\star\)
Optimal value function
\(V^\star (s) = \max_a Q^\star(s,a)\)
\(\pi^\star\)
Optimal policy
\(\pi^\star (s) = \arg\max_a Q^\star (s,a)\)
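To make the notation above concrete, here is a minimal Python sketch of two of the definitions: the discounted return \(G_t\) (truncated to a finite reward sequence) and reading \(V^\star\) and \(\pi^\star\) off a tabular \(Q^\star\). The function names, the reward sequence, and the Q-table values are all made up for illustration; the table is simply assumed to already hold \(Q^\star\).

```python
def discounted_return(rewards, gamma):
    """Truncated discounted return: G_0 = sum_k gamma^k * r_k over a finite list."""
    g = 0.0
    # Accumulate backwards using the recursion G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def greedy_from_q(q):
    """Given a table q[s][a], return (V, pi) with
    V(s) = max_a q[s][a] and pi(s) = argmax_a q[s][a]."""
    v = {s: max(qa.values()) for s, qa in q.items()}
    pi = {s: max(qa, key=qa.get) for s, qa in q.items()}
    return v, pi


# Hypothetical reward sequence r_0, r_1, r_2 and discount factor.
rewards = [1.0, 0.0, 2.0]
g0 = discounted_return(rewards, gamma=0.5)  # 1 + 0.5*0 + 0.25*2 = 1.5

# Hypothetical Q-table, assumed (not computed) to be the optimal Q.
q_table = {
    "s0": {"left": 0.2, "right": 1.0},
    "s1": {"left": 0.7, "right": 0.1},
}
v_star, pi_star = greedy_from_q(q_table)
```

Note that `greedy_from_q` only yields \(V^\star\) and \(\pi^\star\) if the table really is \(Q^\star\); applied to an arbitrary \(Q^\pi\) it gives the greedy policy with respect to that \(Q^\pi\) instead.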
Control
\(x\)
State
\(x_t\)
\(u\)
(Control) input
\(u_t\)
\(f\)
State transition function
\(x_{t+1} = f(x_t, u_t)\)
\(\ell\)
(Stage) cost
\(\ell(x,u) = x^T M x + u^T R u\)
\(K\)
Gain matrix
\(u = -K x\)
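As an illustration of how the control symbols above fit together, the following Python sketch treats the scalar case: linear dynamics \(x_{t+1} = a x_t + b u_t\), stage cost \(\ell(x,u) = m x^2 + r u^2\), and a feedback law \(u = -k x\) whose gain comes from iterating the discrete-time Riccati recursion (the LQR solution). The scalar restriction, the function names, and the numerical values are assumptions made for this example; the matrix case follows the same pattern with \(M\), \(R\), and \(K\).

```python
def scalar_lqr_gain(a, b, m, r, iters=500):
    """Iterate the scalar discrete-time Riccati recursion and return the LQR gain k."""
    p = m  # initial guess for the cost-to-go weight
    for _ in range(iters):
        p = m + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def rollout(x0, k, a, b, m, r, steps):
    """Simulate x_{t+1} = a*x_t + b*u_t under u_t = -k*x_t and sum the stage costs."""
    x, total = x0, 0.0
    xs = [x0]
    for _ in range(steps):
        u = -k * x                      # feedback law u = -K x
        total += m * x * x + r * u * u  # stage cost l(x, u)
        x = a * x + b * u               # state transition f(x, u)
        xs.append(x)
    return xs, total


# Hypothetical system: open-loop unstable because |a| > 1.
a, b, m, r = 1.2, 1.0, 1.0, 1.0
k = scalar_lqr_gain(a, b, m, r)
xs, cost = rollout(1.0, k, a, b, m, r, steps=50)
```

With these values the closed-loop factor `a - b * k` has magnitude below one, so the state in `xs` decays toward zero even though the open-loop system is unstable.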
Acronyms
RL
Reinforcement learning
MPC
Model predictive control
LQR
Linear quadratic regulator
PID
Proportional-integral-derivative