
# Distributional RL

Modeling reward the way dopamine does.


## A Distributional Perspective on Reinforcement Learning

Everything begins here. The paper proposes the idea of distributional RL and proves that the distributional Bellman operator converges in the expectation and variance of the distribution. Based on this theory, they propose a simple algorithm, C51, which uses a categorical distribution over 51 fixed atoms to approximate the true distribution of returns.

Let's recall the Bellman equation for the Q function:

$$Q(x, a) = \mathbb{E}[R(x, a)] + \gamma \, \mathbb{E}[Q(x', a')]$$

where $x$ denotes the state and $a$ the action; the primes denote the next state and the next action.

Distributional RL defines the Q function as follows:

$$Z(x, a) \stackrel{D}{=} R(x, a) + \gamma \, Z(x', a')$$

where $Z$ denotes the distribution of the return conditioned on the state and action, so that $Q(x, a) = \mathbb{E}[Z(x, a)]$. That means the Q value is NOT a scalar but a distribution, and the equation says that both sides are drawn from the same distribution (equality in distribution).

Well, why do we need to replace a scalar with a more complex distribution? The main reason is that the distribution keeps as much information as possible, so the model can be updated more efficiently (sample efficiency). The second reason is that we can choose the policy according to the whole distribution rather than just its expectation. The third reason is that it resembles the mechanism of dopamine (a somewhat speculative advantage).

However, distributional RL has a small pitfall: the distributional Bellman equation does NOT converge on the distribution itself, only on its expectation and variance.
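To make the C51 update concrete, here is a minimal NumPy sketch of the categorical projection step, assuming the next-state distribution for the greedy action has already been selected. The names (`categorical_projection`, `next_probs`, `v_min`, `v_max`) are my own; the fixed 51-atom support and the mass-splitting projection follow the paper's Algorithm 1.

```python
import numpy as np

def categorical_projection(next_probs, rewards, dones, gamma,
                           v_min=-10.0, v_max=10.0, n_atoms=51):
    """Project the shifted/scaled target distribution back onto the
    fixed support {z_0, ..., z_50}.

    next_probs: (batch, n_atoms) next-state categorical probabilities
    rewards, dones: (batch,) float arrays
    """
    batch = next_probs.shape[0]
    delta_z = (v_max - v_min) / (n_atoms - 1)
    support = np.linspace(v_min, v_max, n_atoms)  # atoms z_i

    # Distributional Bellman target: T z_i = r + gamma * z_i (0 at terminal)
    tz = rewards[:, None] + gamma * (1.0 - dones[:, None]) * support[None, :]
    tz = np.clip(tz, v_min, v_max)

    # Fractional index of each target atom on the fixed support
    b = (tz - v_min) / delta_z
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)

    target = np.zeros((batch, n_atoms))
    for i in range(batch):
        for j in range(n_atoms):
            if lower[i, j] == upper[i, j]:
                # Target atom falls exactly on a support atom
                target[i, lower[i, j]] += next_probs[i, j]
            else:
                # Split probability mass between the two neighbouring atoms
                target[i, lower[i, j]] += next_probs[i, j] * (upper[i, j] - b[i, j])
                target[i, upper[i, j]] += next_probs[i, j] * (b[i, j] - lower[i, j])
    return target  # train by minimizing cross-entropy to this target
```

The network is then trained with the cross-entropy between its predicted categorical distribution and this projected target.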

## A Distributional Code for Value in Dopamine-Based Reinforcement Learning

Blog

Reveals the relation between distributional RL and dopamine.

## Distributional Reinforcement Learning with Quantile Regression

It proposes QR-DQN, which represents the return distribution with a fixed set of quantiles learned via quantile regression.
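A minimal NumPy sketch of the quantile Huber loss at the heart of QR-DQN; `theta` holds the predicted quantile values and `target` the target quantiles, both 1-D here for clarity (real implementations add a batch dimension), with κ = 1 as in the paper.

```python
import numpy as np

def quantile_huber_loss(theta, target, kappa=1.0):
    """theta, target: (n_quantiles,) predicted and target quantile values."""
    n = theta.shape[0]
    # Quantile midpoints: tau_hat_i = (2i + 1) / (2N)
    tau_hat = (np.arange(n) + 0.5) / n

    # Pairwise TD errors u_ij = target_j - theta_i
    u = target[None, :] - theta[:, None]

    # Huber loss, elementwise
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))

    # Asymmetric quantile weighting |tau_hat - 1{u < 0}|
    weight = np.abs(tau_hat[:, None] - (u < 0).astype(float))
    return (weight * huber / kappa).mean()
```

The asymmetric weight pushes each `theta_i` toward the tau_hat_i-quantile of the target distribution, while the Huber term keeps gradients bounded near zero error.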

## Implicit Quantile Networks for Distributional Reinforcement Learning

It proposes IQN, which learns an implicit quantile function of the return distribution: quantile fractions are sampled anew at each forward pass, which also enables risk-sensitive policies.
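A minimal PyTorch sketch of the IQN quantile embedding: sampled fractions tau are lifted with a cosine basis and merged with the state features by an elementwise product, as in the paper. The layer sizes and class name are my own choices.

```python
import math
import torch
import torch.nn as nn

class QuantileEmbedding(nn.Module):
    """Embed sampled quantile fractions tau and merge with state features."""

    def __init__(self, feature_dim=64, n_cos=64):
        super().__init__()
        self.n_cos = n_cos
        self.fc = nn.Linear(n_cos, feature_dim)

    def forward(self, state_features, n_tau=8):
        batch = state_features.shape[0]
        tau = torch.rand(batch, n_tau, 1)                  # tau ~ U(0, 1)
        i = torch.arange(self.n_cos, dtype=torch.float32)  # i = 0..n-1
        # phi_j(tau) = ReLU(sum_i cos(pi * i * tau) * w_ij + b_j)
        cos = torch.cos(math.pi * i * tau)                 # (batch, n_tau, n_cos)
        phi = torch.relu(self.fc(cos))                     # (batch, n_tau, feature_dim)
        # Hadamard product merges quantile and state information
        return state_features.unsqueeze(1) * phi, tau.squeeze(-1)
```

A downstream head maps each merged feature to a return value for that tau, and the quantile Huber loss from QR-DQN is applied across the sampled fractions.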

## Statistics and Samples in Distributional Reinforcement Learning

## Distributional Reinforcement Learning for Efficient Exploration

Uses distributional RL to improve the exploration strategy.
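One generic way to turn learned quantiles into an exploration signal (an illustration of the idea, not necessarily this paper's exact rule) is optimism in the face of uncertainty: act greedily with respect to the mean plus a bonus proportional to the spread of the upper quantiles.

```python
import numpy as np

def optimistic_action(quantiles, bonus=1.0):
    """quantiles: (n_actions, n_quantiles) estimated return quantiles.

    NOTE: a generic optimism heuristic for illustration only; the
    bonus scale and the upper-tail spread are my own choices.
    """
    mean = quantiles.mean(axis=1)
    # Spread of the upper half of the quantiles as an uncertainty proxy
    upper = quantiles[:, quantiles.shape[1] // 2:]
    spread = upper.std(axis=1)
    return int(np.argmax(mean + bonus * spread))
```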

## DSAC: Distributional Soft Actor Critic for Risk-Sensitive Reinforcement Learning

The paper proposes DSAC, which combines SAC with a distributional estimate of the Q function for risk-sensitive control.

## Cautious Reinforcement Learning via Distributional Risk in the Dual Domain

Risk-sensitive RL.

## A Comparative Analysis of Expected and Distributional Reinforcement Learning
