Make reward like Dopamine
Everything begins here. The paper proposes the idea of distributional RL and proves that the distributional Bellman operator converges in expectation and variance. Based on this theory, they propose a naive algorithm, C51, which uses a categorical distribution to approximate the true return distribution.
Let's recall the Q function:

$$Q(x, a) = \mathbb{E}\left[R(x, a)\right] + \gamma\, \mathbb{E}\left[Q(x', a')\right]$$

where $x$ denotes the state and $a$ the action; the primes denote the next state and action.
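The scalar backup above can be sketched as a tabular Q-learning update (a minimal illustration; the tiny state/action sizes and the sample transition are made up):

```python
import numpy as np

# Minimal tabular sketch of the scalar Bellman backup (illustrative sizes)
n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))
gamma, alpha = 0.9, 0.5

def bellman_update(x, a, r, x_next):
    """Q(x,a) <- Q(x,a) + alpha * (r + gamma * max_a' Q(x',a') - Q(x,a))."""
    target = r + gamma * Q[x_next].max()
    Q[x, a] += alpha * (target - Q[x, a])

# One sample transition: state 0, action 1, reward 1.0, next state 2
bellman_update(0, 1, 1.0, 2)
```

Note that the target is a single scalar; distributional RL replaces exactly this quantity with a distribution.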
Distributional RL defines the Q function as follows:

$$Z(x, a) \stackrel{D}{=} R(x, a) + \gamma Z(x', a')$$

where $Z$ denotes the distribution of the return conditioned on the state and action. That means the Q value is NOT a scalar but a distribution, and the equation states that both sides are drawn from the same distribution.
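The equality in distribution can be illustrated with a sample-based sketch: represent $Z(x', a')$ as a bag of return samples, apply the backup $r + \gamma z'$ sample-wise, and observe that taking the mean on both sides recovers the scalar Bellman equation (all numbers here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9
r = 1.0

# Z(x', a') represented by samples of the return distribution (made-up values)
z_next = rng.normal(loc=2.0, scale=1.0, size=10_000)

# Distributional backup: every sample is shifted and scaled
z = r + gamma * z_next

# Taking expectations on both sides recovers the scalar Bellman target
q_from_dist = z.mean()
q_scalar = r + gamma * z_next.mean()
assert abs(q_from_dist - q_scalar) < 1e-6  # equal by linearity of the mean
```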
Well, why replace the scalar with a more complex distribution? The main reason is that a distribution keeps as much information as possible, so the model can be updated more efficiently (sample efficiency). The second reason is that we can choose the optimal policy according to the full distribution rather than just its expectation. The third reason is that it resembles the mechanism of dopamine (an advantage of sorts).
However, distributional RL has a small pitfall: the Bellman operator does NOT converge on the distribution itself, only on its expectation and variance.
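C51, mentioned above, makes the distributional backup tractable by keeping a categorical distribution over a fixed support and projecting the shifted target back onto it. A minimal NumPy sketch of that projection step (the function name and the tiny three-atom support are illustrative, not the paper's code):

```python
import numpy as np

def project_categorical(probs, rewards, dones, atoms, gamma):
    """C51-style projection: shift each support atom by the Bellman backup
    r + gamma*z and redistribute its probability mass onto the two
    nearest atoms of the fixed support."""
    v_min, v_max = atoms[0], atoms[-1]
    delta = atoms[1] - atoms[0]
    proj = np.zeros_like(probs)
    for i in range(len(rewards)):
        for j in range(len(atoms)):
            # Bellman-shifted atom, clipped to the support range
            tz = np.clip(rewards[i] + gamma * (1.0 - dones[i]) * atoms[j],
                         v_min, v_max)
            b = (tz - v_min) / delta          # fractional index on the support
            l, u = int(np.floor(b)), int(np.ceil(b))
            if l == u:                        # lands exactly on an atom
                proj[i, l] += probs[i, j]
            else:                             # split mass between neighbours
                proj[i, l] += probs[i, j] * (u - b)
                proj[i, u] += probs[i, j] * (b - l)
    return proj

atoms = np.linspace(-1.0, 1.0, 3)             # support {-1, 0, 1}
probs = np.array([[0.0, 1.0, 0.0]])           # all mass on the middle atom
proj = project_categorical(probs, rewards=np.array([0.5]),
                           dones=np.array([0.0]), atoms=atoms, gamma=0.9)
```

With reward 0.5 the shifted atom lands halfway between the atoms 0 and 1, so the mass is split evenly between them while the projected distribution still sums to 1.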
Reveals the relation between distributional RL and dopamine.
It proposes QR-DQN.
It proposes IQN.
Distributional RL improves the exploration strategy.
The paper proposes DSAC, which uses a distribution to estimate the Q function within the SAC framework.
Risk-sensitive RL.
-
What are the important theoretical results in reinforcement learning (RL)? Answered by Microsoft Research Asia
Gives a series of papers and an introduction to recent research results in Distributional RL, and the relation between Distributional RL and safe RL.
-
Distributional RL - Simple Machine Learning
A great tutorial for Distributional RL; gives a great explanation.
-
Distributional Reinforcement Learning — Part 1 (C51 and QR-DQN)
-
Towards Structural Risk Minimization for RL - Emma Brunskil
A talk about the intersection of Distributional RL and safety.
-
Distributional Reinforcement Learning: A Talk by Deepmind
A talk about Distributional RL.
-
IFT 6085 - Lecture 19 Basic results on reinforcement learning
Gives a detailed proof of the convergence property of distributional RL.
-
How Does Value-Based Reinforcement Learning Find the Optimal Policy?
Gives a very detailed proof of the convergence properties of RL and Distributional RL.