Could you explain your sub-policy model? In your thesis you said you trained one PPO actor-critic network for each of the following maneuvers: follow the road, turn left, and turn right. But in your code, I can only find one PPO model being trained for all maneuvers.
Yes, the sub-policy model is available under the 'sub-policy' branch (https://github.com/bitsauce/Carla-ppo/tree/sub-policy). Note that it doesn't actually create three instances of the PPO class; instead, the PPO class itself contains three PPO networks internally and switches between them based on the maneuver.
The main motivation behind the sub-policy model is twofold: (1) by off-loading some of the learning onto separate networks, we simplify what each network needs to learn, which (hopefully, and in theory) makes the model converge faster; (2) since our goal is to drive along an arbitrary path in some environment (e.g. a path given by a navigation system such as a GPS), we need a way to condition the network to take certain actions on certain parts of the road. This is what Codevilla et al. do in their paper "End-to-end Driving via Conditional Imitation Learning", which is where the sub-policy idea originates.
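For illustration, here is a minimal sketch of that switching structure: one wrapper class that holds a separate actor-critic per maneuver and routes observations and updates to the sub-network selected by the current maneuver command. The names (`SubPolicyPPO`, `DummyActorCritic`, `MANEUVERS`) and interfaces are hypothetical stand-ins for this explanation, not the actual identifiers used in the sub-policy branch.

```python
import numpy as np

MANEUVERS = ("follow", "left", "right")  # one sub-network per maneuver


class DummyActorCritic:
    """Stand-in for a real PPO actor-critic network (illustrative only)."""

    def __init__(self, state_dim=4, action_dim=2):
        self.weights = np.zeros((state_dim, action_dim))

    def predict(self, state):
        return state @ self.weights  # placeholder for the actor's forward pass

    def train(self, rollout):
        pass  # a real implementation would run PPO updates here


class SubPolicyPPO:
    """One wrapper holding a separate actor-critic per maneuver."""

    def __init__(self, make_network=DummyActorCritic):
        # Three independent networks, keyed by maneuver.
        self.policies = {m: make_network() for m in MANEUVERS}

    def predict(self, state, maneuver):
        # Route the observation to the sub-network selected by the maneuver
        # command (e.g. coming from a route planner / navigation system).
        return self.policies[maneuver].predict(state)

    def train(self, rollout, maneuver):
        # Only the sub-network that generated the rollout gets updated.
        self.policies[maneuver].train(rollout)


if __name__ == "__main__":
    agent = SubPolicyPPO()
    state = np.random.randn(4)
    print(agent.predict(state, maneuver="left"))
```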