The code is adapted from Toshiki Watanabe's implementation; please check the original repository for updates and credit.
The base algorithm is discrete SAC [1], used here for my own research; a continuous-action version may be added later.
The original paper [2] introduces the Munchausen trick only on top of DQN. Here I try to extend it to an actor-critic setting, which requires exploring a suitable policy loss.
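As a rough illustration of what the extension involves, the sketch below computes a Munchausen-augmented soft target for discrete SAC: the clipped, scaled log-policy of the taken action is added to the reward, while the bootstrap term keeps the usual soft state value. This is only a minimal numpy sketch of the idea, not the repository's implementation; the function name and the default hyperparameters (`alpha_m=0.9`, `tau=0.03`, `l0=-1`, roughly the M-DQN values) are my own choices here.

```python
import numpy as np

def munchausen_sac_target(rewards, dones, log_pi_cur, log_pi_next, q_next,
                          gamma=0.99, tau=0.03, alpha_m=0.9, l0=-1.0):
    """Sketch of a Munchausen-augmented soft target for discrete SAC.

    rewards, dones : (B,)   transition batch
    log_pi_cur     : (B,)   log pi(a_t | s_t) of the action actually taken
    log_pi_next    : (B, A) log pi(. | s_{t+1}) over all discrete actions
    q_next         : (B, A) target-network Q(s_{t+1}, .)
    """
    # Munchausen bonus: scaled log-policy of the taken action,
    # clipped below at l0 for numerical stability (as in M-DQN).
    munchausen = alpha_m * tau * np.clip(log_pi_cur, l0, 0.0)
    # Soft value of the next state: E_pi[Q - tau * log pi],
    # computed exactly over the discrete action set.
    pi_next = np.exp(log_pi_next)
    v_next = (pi_next * (q_next - tau * log_pi_next)).sum(axis=1)
    return rewards + munchausen + (1.0 - dones) * gamma * v_next
```

Setting `alpha_m=0` recovers the ordinary discrete-SAC target, which makes it easy to ablate the Munchausen term.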
[1] Christodoulou, Petros. "Soft Actor-Critic for Discrete Action Settings." arXiv preprint arXiv:1910.07207 (2019).
[2] Vieillard, Nino, Olivier Pietquin, and Matthieu Geist. "Munchausen Reinforcement Learning." NeurIPS (2020).