Add new algorithms #11

Open
1 of 2 tasks
rahulptel opened this issue Jul 16, 2019 · 7 comments

Comments

@rahulptel
Contributor

rahulptel commented Jul 16, 2019

It would be nice to add the following algorithms:

  • RAINBOW
  • A2C (multiprocessing)

I will submit a PR if I finish any of them.

@seungeunrho
Owner

seungeunrho commented Jul 16, 2019

Hi!
I think A2C (the synchronous-update version of A3C) is a good addition.
What about implementing RAINBOW instead of Double and Dueling DQN?
Separate code for Double and Dueling DQN would add little, because both are only small variations of DQN in terms of implementation.
In contrast, a simple implementation of RAINBOW might be helpful for many people.
(Double and Dueling DQN are in fact two of RAINBOW's six components.)
https://arxiv.org/abs/1710.02298
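
To illustrate why Double and Dueling DQN are small variations of DQN, here is a minimal sketch of the pieces that change. It is not code from this repository: the function names and the `gamma` default are assumptions made for the example, and plain Python lists stand in for network outputs to keep it dependency-free.

```python
def dqn_target(reward, done, q_target_next, gamma=0.99):
    """Standard DQN target: the target network both picks and scores the next action."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: the online network picks the action, the target network scores it."""
    if done:
        return reward
    # Index of the greedy action according to the online network.
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


def dueling_q_values(state_value, advantages):
    """Dueling head: combine V(s) and A(s, a) into Q(s, a), centering the advantages."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]
```

Compared with `dqn_target`, the Double DQN version only changes how the bootstrap action is chosen, and the dueling variant only changes how the network's final layer is assembled, which is consistent with the point that neither needs much separate code.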

@rahulptel
Contributor Author

Agreed. We can go with RAINBOW.

@seungeunrho
Owner

Awesome!

@BDEvan5

BDEvan5 commented Jun 11, 2020

MuZero would also be a cool algorithm. It is a bit more complicated because of the MCTS, but it works very well.

@BDEvan5

BDEvan5 commented Jun 11, 2020

Also, thanks so much for sharing.
These are great simple implementations for learning and have been very useful.

If you want to try something else, you could also implement them in TensorFlow.

@ADGEfficiency

How about SAC?

@Mahesha999

How about Phasic Policy Gradient (PPG), which is reported to give better results than PPO?
Also, it would help to have an example of using these algorithms in a non-gaming environment, for instance one whose observations are lists, dicts, etc. instead of image frames. I guess that would be easy, since we would just use a plain NN instead of a CNN. Still, a simple example might be useful.
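
For the list/dict observation case, one possible approach is to flatten the observation into a single vector before feeding it to a fully connected network. The sketch below is only an illustration under that assumption: `flatten_observation` is a hypothetical helper, not something this repository provides, and it assumes every leaf value is numeric.

```python
def flatten_observation(obs):
    """Recursively flatten nested dicts, lists, and tuples of numbers into a flat list of floats.

    Dict keys are visited in sorted order so the layout of the vector stays stable
    from one step to the next.
    """
    if isinstance(obs, dict):
        flat = []
        for key in sorted(obs):
            flat.extend(flatten_observation(obs[key]))
        return flat
    if isinstance(obs, (list, tuple)):
        flat = []
        for item in obs:
            flat.extend(flatten_observation(item))
        return flat
    # Leaf value: assumed to be a number.
    return [float(obs)]
```

The length of the resulting list would then be the input size of the network's first linear layer, replacing the convolutional layers used for image frames.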


5 participants