Real recurrent policy supported #383
Hi @yangysc, thanks for testing the RNN. The shared network from the spec (ppo_rnn_shared_cartpole) trains as follows:

```
[2019-07-14 22:53:07,321 PID:73583 INFO __init__.py log_summary] Trial 0 session 0 ppo_rnn_shared_cartpole_t0_s0 [train_df] epi: 169 t: 200 wall_t: 360 opt_step: 234560 frame: 23465 fps: 65.1806 total_reward: 200 total_reward_ma: 173.03 loss: 0.0292752 lr: 4.55652e-17 explore_var: nan entropy_coef: 0.001 entropy: 0.112986 grad_norm: nan
[2019-07-14 22:53:10,775 PID:73583 INFO __init__.py log_summary] Trial 0 session 0 ppo_rnn_shared_cartpole_t0_s0 [train_df] epi: 170 t: 185 wall_t: 363 opt_step: 236480 frame: 23650 fps: 65.1515 total_reward: 185 total_reward_ma: 173.2 loss: 0.679745 lr: 4.55652e-17 explore_var: nan entropy_coef: 0.001 entropy: 0.228988 grad_norm: nan
[2019-07-14 22:53:14,093 PID:73583 INFO __init__.py log_summary] Trial 0 session 0 ppo_rnn_shared_cartpole_t0_s0 [train_df] epi: 171 t: 200 wall_t: 367 opt_step: 238400 frame: 23850 fps: 64.9864 total_reward: 200 total_reward_ma: 173.35 loss: 0.624804 lr: 4.55652e-17 explore_var: nan entropy_coef: 0.001 entropy: 0.315934 grad_norm: nan
```

We have not thoroughly tested RNNs yet, but your observation is true. This will take some time to implement, and we're currently busy with benchmarking tasks. I'm marking this issue as a feature request so we can get to it as soon as we have time.
In the meantime, you could try increasing the sequence length.
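For illustration, here is a minimal sketch of the net portion of a spec with a longer sequence length, written as a Python dict. The field names (`seq_len`, `rnn_hidden_size`, `cell_type`, etc.) are assumptions based on the SLM-Lab RecurrentNet spec format and may differ across versions; check the spec files shipped with your checkout.

```python
# Hypothetical fragment of a SLM-Lab spec; field names are assumed,
# verify against the spec JSON files in your SLM-Lab version.
net_spec = {
    "type": "RecurrentNet",
    "cell_type": "GRU",              # or "LSTM"
    "fc_hid_layers": [],             # feedforward layers before the RNN cell
    "hid_layers_activation": "relu",
    "rnn_hidden_size": 64,
    "rnn_num_layers": 1,
    "seq_len": 8,                    # increased from a small default to give the RNN more context
}
```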
Are you requesting a feature or an implementation?
To handle partially observable MDP (POMDP) tasks, recurrent policies are currently quite popular. We need to add an LSTM layer after the original conv (or MLP) body, and store the hidden states for training. But in SLM-Lab, the RecurrentNet class has limited abilities: it is more like a concatenation of a series of input states, and the hidden states of the RNN are not stored, which seriously weakens the recurrent policy.
For example, I used it with the default parameters to solve the CartPole task, and it failed.
Even when I changed the env's max_frame parameter from 500 to 50000, the RecurrentNet still could not solve it. A sketch of what I mean follows.
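To make the request concrete, here is a minimal sketch (not SLM-Lab code; all names are illustrative) of a recurrent policy that carries its GRU hidden state across environment steps instead of concatenating past observations:

```python
import torch
import torch.nn as nn


class RecurrentActorCritic(nn.Module):
    """Minimal sketch: an MLP body, a GRU whose hidden state persists
    across timesteps, and actor/critic heads. Illustrative only."""

    def __init__(self, obs_dim, act_dim, hidden_size=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden_size), nn.ReLU())
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.actor = nn.Linear(hidden_size, act_dim)
        self.critic = nn.Linear(hidden_size, 1)

    def forward(self, obs, hx, done):
        # obs: (batch, obs_dim), hx: (1, batch, hidden_size), done: (batch,)
        # Zero the hidden state at episode boundaries so no state leaks
        # across episodes.
        hx = hx * (1.0 - done).view(1, -1, 1)
        feat = self.body(obs).unsqueeze(1)   # (batch, 1, hidden_size)
        out, hx = self.gru(feat, hx)
        out = out.squeeze(1)
        return self.actor(out), self.critic(out), hx
```

At rollout time the agent keeps `hx` between calls, and also records it alongside each transition so training can re-run the sequence from the stored initial hidden state rather than from a stack of concatenated observations.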
If you have any suggested solutions
I'm afraid of introducing more bugs, so I'm sorry I am not able to add this feature myself. But I can provide two reference implementations:
- OpenAI baselines
- pytorch-a2c-ppo-acktr-gail
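The key trick in implementations like those is to record the recurrent state with each transition so updates can replay whole sequences. A hedged sketch of the storage side, with all names illustrative rather than taken from either repository:

```python
import torch


class RecurrentRollout:
    """Illustrative rollout buffer that records the GRU hidden state at
    each step, so a PPO/A2C update can recompute the sequence exactly."""

    def __init__(self, num_steps, num_envs, obs_dim, hidden_size):
        self.obs = torch.zeros(num_steps, num_envs, obs_dim)
        # One extra slot so hiddens[0] holds the state before the first step.
        self.hiddens = torch.zeros(num_steps + 1, num_envs, hidden_size)
        self.dones = torch.zeros(num_steps, num_envs)
        self.step = 0

    def insert(self, obs, hx, done):
        # hx: (1, num_envs, hidden_size) as returned by the policy's GRU
        self.obs[self.step] = obs
        self.hiddens[self.step + 1] = hx.squeeze(0)
        self.dones[self.step] = done
        self.step = (self.step + 1) % self.obs.size(0)
```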
With this feature, I believe SLM-Lab will be the top PyTorch RL library.
Thanks in advance!