ACER - Examples on continuous action space #143
Hello,
I am working on an RL project where I want to use the ACER algorithm on continuous action space problems (PyBullet environments), but I am having difficulties implementing it with your framework. Would it be possible for you to add an example of how to use this algorithm on this class of problems?
Specifically, I would like to know how to use models that do not share parameters between the policy and the value function. |
You are right, we don't have an example script for ACER with a continuous action space for now. It would be nice to add one. Currently we only have a unit test for ACER with a continuous action space. Here is how the model is defined: pfrl/tests/agents_tests/test_acer.py Lines 388 to 400 in 44bf2e4
pfrl/tests/agents_tests/test_acer.py Lines 409 to 413 in 44bf2e4
So |
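For reference, the model in that test is built roughly along the following lines. This is a paraphrased sketch rather than the verbatim test code; it assumes pfrl's ACERContinuousActionHead, GaussianHeadWithDiagonalCovariance, and ConcatObsAndAction APIs, so check the linked lines for the exact definition. The key point is that the policy, state-value, and advantage networks are separate modules that share no parameters.

```python
import torch.nn as nn
from pfrl.agents import acer
from pfrl.nn import ConcatObsAndAction
from pfrl.policies import GaussianHeadWithDiagonalCovariance

obs_size, action_size, hidden_size = 4, 2, 64  # toy sizes for illustration

# Three separate networks: policy (pi), state value (v), and advantage (adv).
# None of them share parameters.
model = acer.ACERContinuousActionHead(
    pi=nn.Sequential(
        nn.Linear(obs_size, hidden_size),
        nn.LeakyReLU(),
        # Outputs mean and diagonal-covariance parameters of a Gaussian policy.
        nn.Linear(hidden_size, action_size * 2),
        GaussianHeadWithDiagonalCovariance(),
    ),
    v=nn.Sequential(
        nn.Linear(obs_size, hidden_size),
        nn.LeakyReLU(),
        nn.Linear(hidden_size, 1),
    ),
    adv=nn.Sequential(
        # The advantage network takes both the observation and the action.
        ConcatObsAndAction(),
        nn.Linear(obs_size + action_size, hidden_size),
        nn.LeakyReLU(),
        nn.Linear(hidden_size, 1),
    ),
)
```

The second linked snippet (lines 409 to 413) then builds the agent itself from this model, presumably together with an optimizer and a replay buffer.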
I get an error message when I try to use GaussianHeadWithFixedCovariance (as described in the original paper). Is that intentional?
Here is my model definition for reproducing the error:
|
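The reporter's model definition is not shown above. Purely for illustration, a policy network using a fixed-covariance head might look like the following hypothetical sketch; it assumes pfrl.policies.GaussianHeadWithFixedCovariance takes a scale argument and consumes only the mean, so the last linear layer outputs action_size values instead of action_size * 2.

```python
import torch.nn as nn
from pfrl.policies import GaussianHeadWithFixedCovariance

obs_size, action_size, hidden_size = 8, 2, 64  # hypothetical sizes

# Hypothetical policy network: it predicts only the mean of the Gaussian;
# the covariance is a fixed constant rather than a learned output.
pi = nn.Sequential(
    nn.Linear(obs_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, action_size),
    GaussianHeadWithFixedCovariance(scale=0.3),
)
```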
It seems like a bug in ACER. Can you try commenting out |
I tried that, and it caused the program to throw another exception:
|
Maybe ACER is trying to update its parameters before any data has been propagated through its neural networks, and that causes the exception. |
Thank you for confirming that. It seems like … A possible workaround for it would be |
Thank you for the workaround idea. Would it be possible for you to create a fix in the near future? |
And I don't see the learning rate used anywhere in |
I meant the learning rate you set when you create your optimizer, since PyTorch's optimizers allow setting parameter-specific learning rates. I guess you are right about … I will try reproducing the issue myself and hopefully fix it soon. |
Could you point me to some examples of how I could use separate PyTorch optimizers, or rather just separate learning rates, for my NNs in pfrl? In all of the provided examples with multiple optimizers, they are passed to the training function as separate parameters. What if I wanted to set a different learning rate for my actor and a different learning rate for my critic, and at the same time set the learning rate for just |
Here is PyTorch's official doc: https://pytorch.org/docs/stable/optim.html#per-parameter-options |
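Concretely, per-parameter groups let a single optimizer carry different learning rates for different submodules. A minimal sketch follows; the actor and critic modules here are placeholders standing in for whatever submodules your model actually exposes.

```python
import torch
import torch.nn as nn

# Placeholder submodules standing in for your actual actor and critic networks.
actor = nn.Linear(8, 2)
critic = nn.Linear(8, 1)

# A single optimizer with two parameter groups and different learning rates.
opt = torch.optim.Adam(
    [
        {"params": actor.parameters(), "lr": 1e-4},
        {"params": critic.parameters(), "lr": 1e-3},
    ]
)
```

Parameters that are not included in any group are simply not updated by the optimizer, so every trainable submodule of the model needs to appear in one of the groups.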
I hope #145 will resolve the issue. |
I will test it. I would also like to ask whether ACER was tested on continuous action spaces. I've tried training a few agents but did not see any progress in achieved rewards, though it is very likely that I've made some mistakes in hyperparameter choices or in the learning process setup. |
It is tested on a toy env with continuous actions, but it has not been verified that it can reproduce the performance on the continuous tasks in the paper. Here is a sample script to train ACER on OpenAI Gym MuJoCo envs. It seems to work to some extent, but it is not tuned much and I cannot guarantee that the hyperparameters etc. are the same as those used in the paper. https://github.com/muupan/pfrl/blob/acer-continous-example-tune/examples/mujoco/train_acer_mujoco.py Command (with #145 applied):
Training log:
Video with the final model: openaigym.video.1.62730.video000000.mp4 |
That really helped me. Thank you very much for all the time you spent debugging the code and helping me. |