sub-policy question #12
Hey Wei! Yes, the sub-policy model is available under the 'sub-policy' branch (https://github.com/bitsauce/Carla-ppo/tree/sub-policy). Note that it doesn't actually create three instances of the PPO class; instead, the PPO class itself contains three PPO networks that are switched between based on the current maneuver. The motivation behind the sub-policy model is twofold: (1) by off-loading some of the learning onto separate networks, we simplify what each network needs to learn, which (hopefully, and in theory) makes the model converge faster; (2) since our goal is to drive along an arbitrary path in some environment (e.g. a path given by a navigation system such as a GPS), we need a way to condition the network to take certain actions on certain parts of the road. This is what Codevilla et al. do in their paper End-to-end Driving via Conditional Imitation Learning, which is where the sub-policy idea originates. Best regards,
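To make the idea concrete, here is a minimal sketch of that routing scheme: one wrapper class holds a separate policy per maneuver and forwards each observation to whichever network matches the current high-level command. The names (`SubPolicyPPO`, `select_action`, the stand-in policies) are illustrative assumptions, not the repository's actual API.

```python
# Sketch of the sub-policy idea: one wrapper holds an independent policy
# per maneuver and routes each step to the network matching the current
# command, so only that sub-policy is queried (and, during training,
# updated). This mirrors the conditional branching in Codevilla et al.
FOLLOW, LEFT, RIGHT = 0, 1, 2  # maneuver commands, e.g. from a route planner

class SubPolicyPPO:
    def __init__(self, make_policy):
        # One independent policy (in the real project, a PPO actor-critic
        # network) per maneuver.
        self.policies = {m: make_policy() for m in (FOLLOW, LEFT, RIGHT)}

    def select_action(self, observation, maneuver):
        # Dispatch to the sub-policy for the active maneuver only.
        return self.policies[maneuver](observation)

# Usage with a trivial stand-in "policy" that just sums the observation:
agent = SubPolicyPPO(make_policy=lambda: (lambda obs: sum(obs)))
action = agent.select_action([1, 2], FOLLOW)
```

In the actual branch the switching happens inside the PPO class itself (one set of shared inputs, three network heads), but the dispatch logic is the same as above.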
@bitsauce Thank you very much, you are really kind!
Could you explain your sub-policy model? In your thesis you said you trained one PPO actor-critic network for each of the following maneuvers: follow the road, turn left, and turn right. But in your code, I only find one PPO model being trained for all maneuvers!