subpolicy question #15
Hi! First, in ppo.py:

self.loss = -self.policy_loss + self.value_loss - self.entropy_loss

you said 'Reduce sum over all sub-policies (where only the active sub-policy will be non-zero due to previous filtering)', but the loss will be a list. How can a list of losses be backpropagated in

self.train_step = self.optimizer.minimize(self.loss, var_list=policy_params)
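Just to check my understanding, is the intent something like the toy sketch below? (num_subpolicies, active_subpolicy and the w%d variables are placeholder names I made up, not from ppo.py.) The per-sub-policy scalar losses are stacked, masked by a one-hot of the active sub-policy, and reduce-summed into a single scalar, which is what minimize() can accept:

```python
import tensorflow as tf

num_subpolicies = 3                              # made-up number of sub-policies
active_subpolicy = tf.placeholder(tf.int32, [])  # index of the currently active sub-policy

# Pretend each sub-policy already produced a scalar loss (the reduce_mean inside it).
per_subpolicy_loss = tf.stack(
    [tf.reduce_mean(tf.square(tf.get_variable("w%d" % i, shape=[4])))
     for i in range(num_subpolicies)])

# "Previous filtering": zero out every sub-policy except the active one.
mask = tf.one_hot(active_subpolicy, num_subpolicies)

# Reduce-sum over all sub-policies -> a single scalar loss.
loss = tf.reduce_sum(per_subpolicy_loss * mask)

train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)
```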
Second, you first compute '_create_sub_policy', and in that part the loss is reduce-meaned and finally becomes a scalar. After the filtering, it seems all sub-policy modules will output the same value. Does it really work?
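For that second point, here is a quick check I tried (again with made-up names, same structure as the sketch above): since each sub-policy loss is already a scalar after the reduce_mean, multiplying by the mask zeroes out the inactive ones, so their gradients come out as zeros and only the active sub-policy's variables would be updated. Is that the intended behaviour?

```python
import tensorflow as tf

num_subpolicies = 3  # made-up
active = 1           # pretend sub-policy 1 is active

sub_vars = [tf.get_variable("sub%d_w" % i, shape=[4]) for i in range(num_subpolicies)]

# Each sub-policy loss is already a scalar (the reduce_mean in _create_sub_policy).
per_subpolicy_loss = tf.stack([tf.reduce_mean(tf.square(v)) for v in sub_vars])

# After filtering, only the active sub-policy keeps a non-zero loss.
loss = tf.reduce_sum(per_subpolicy_loss * tf.one_hot(active, num_subpolicies))

grads = tf.gradients(loss, sub_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Prints zeros for every sub-policy except the active one,
    # so only the active sub-policy's parameters move.
    print(sess.run(grads))
```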