
Double constraints for updating actor? #109

Open

KhoiDOO opened this issue Mar 7, 2023 · 0 comments

KhoiDOO commented Mar 7, 2023

Hi ShangtongZhang,

I'm confused about the adaptive KL divergence that you use in your code to update the actor model (in the case of two separate actor and critic models). In your code, you use both the clipped objective and the adaptive approx-KL: the actor model is only updated if $\text{approx-kl} \le 1.5 \times \text{target-kl}$. After reading the PPO paper, it seems to me that the adaptive KL should belong to TRPO instead, since TRPO has a KL constraint in its Equation 4. With the two constraints, both clip and adaptive KL, active at the same time, the actor finds it very hard to be updated.
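
For concreteness, the double-constraint pattern I mean looks roughly like this (a minimal sketch with hypothetical names, not your exact code):

```python
import torch

def gated_clip_update(log_prob, log_prob_old, advantage,
                      eps_clip, target_kl, optimizer):
    """One actor step gated by BOTH the clip objective and an approx-KL check."""
    ratio = (log_prob - log_prob_old).exp()
    # Constraint 1: PPO's clipped surrogate objective
    surr1 = ratio * advantage
    surr2 = torch.clamp(ratio, 1.0 - eps_clip, 1.0 + eps_clip) * advantage
    clip_loss = -torch.min(surr1, surr2).mean()

    # Constraint 2: skip the actor update entirely when approx-KL is too large
    approx_kl = (log_prob_old - log_prob).mean().item()
    if approx_kl <= 1.5 * target_kl:
        optimizer.zero_grad()
        clip_loss.backward()
        optimizer.step()
```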

From my viewpoint, I think you are using $L^{CLIP}$ and the TRPO-style penalty $L^{KLPEN}$ at the same time, whereas $L^{KLPEN}$ should be constructed as

```python
surr = ratio * advantage

# Adapt the KL penalty coefficient beta (Section 4 of the PPO paper)
if kl_after <= 0.66 * target_kl:
    kl_coef /= 2
elif kl_after > 1.5 * target_kl:
    kl_coef *= 2
else:
    print("KL is close enough")

# Negated because optimizers minimize, while Eq. 8 is maximized
actor_loss = -(surr - kl_coef * kl_after).mean()

# Backward the actor loss ...
```
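
Here `kl_after` would be the KL between the old and the updated policy, re-evaluated on the sampled states, e.g. something like (a sketch using `torch.distributions`; the names are hypothetical):

```python
import torch
from torch.distributions import Categorical, kl_divergence

def mean_kl(old_logits, new_logits):
    """KL(pi_old || pi_new), averaged over the sampled states."""
    dist_old = Categorical(logits=old_logits.detach())
    dist_new = Categorical(logits=new_logits)
    return kl_divergence(dist_old, dist_new).mean()

# e.g. kl_after = mean_kl(old_logits, actor(states))
```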

[image: the adaptive-$\beta$ rule from Section 4 of the PPO paper: if $d < d_{\mathrm{targ}}/1.5$, $\beta \leftarrow \beta/2$; if $d > d_{\mathrm{targ}} \times 1.5$, $\beta \leftarrow \beta \times 2$]

After calculating the KL coefficient $\beta$, it is used to compute the loss and the gradient in Equation 8:

$$
L^{KLPEN}(\theta) = \hat{\mathbb{E}}_t\left[\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}\,\hat{A}_t - \beta\,\mathrm{KL}\!\left[\pi_{\theta_{\mathrm{old}}}(\cdot \mid s_t),\; \pi_\theta(\cdot \mid s_t)\right]\right] \quad (8)
$$

And only one of $L^{KLPEN}$ or $L^{CLIP}$ is used in training, not both:

[image: the clipped surrogate objective $L^{CLIP}$ from the PPO paper]
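
In code, the either/or choice would look something like this (a sketch; `USE_KL_PEN` and the argument names are hypothetical):

```python
import torch

USE_KL_PEN = True  # pick exactly one PPO objective, not both

def actor_loss(ratio, advantage, kl, kl_coef, eps_clip=0.2):
    """Either L^KLPEN (Eq. 8) or L^CLIP (Eq. 7), negated for minimization."""
    if USE_KL_PEN:
        return -(ratio * advantage - kl_coef * kl).mean()
    clipped = torch.clamp(ratio, 1.0 - eps_clip, 1.0 + eps_clip) * advantage
    return -torch.min(ratio * advantage, clipped).mean()
```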
