Could you explain your sub-policy model? In your thesis you said you trained one PPO actor-critic network for each of the following maneuvers: follow the road, turn left, and turn right. But in your code, I can only find one PPO model being trained for all maneuvers.
Yes, the sub-policy model is available under the 'sub-policy' branch (https://github.com/bitsauce/Carla-ppo/tree/sub-policy). Note that it doesn't actually create three instances of the PPO class; instead, the PPO class itself contains three PPO networks internally and switches between them based on the maneuver.
The main motivation behind the sub-policy model is twofold: (1) by off-loading some of the learning onto separate networks, we simplify what each network needs to learn, which (hopefully, and in theory) makes the model converge faster; (2) since our goal is to drive along an arbitrary path in some environment (e.g. a path given by a navigation system such as a GPS), we need a way to condition the network to take certain actions on certain parts of the road. This is what Codevilla et al. do in their paper "End-to-end Driving via Conditional Imitation Learning", which is where the sub-policy idea originates.
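For illustration, here is a minimal sketch of that switching structure: one wrapper class that holds a separate actor-critic per maneuver and routes observations and updates to the sub-network selected by the current maneuver command. The names (`SubPolicyPPO`, `DummyActorCritic`, `MANEUVERS`) and interfaces are hypothetical stand-ins for this explanation, not the actual identifiers used in the sub-policy branch.

```python
import numpy as np

MANEUVERS = ("follow", "left", "right")  # one sub-network per maneuver


class DummyActorCritic:
    """Stand-in for a real PPO actor-critic network (illustrative only)."""

    def __init__(self, state_dim=4, action_dim=2):
        self.weights = np.zeros((state_dim, action_dim))

    def predict(self, state):
        return state @ self.weights  # placeholder for the actor's forward pass

    def train(self, rollout):
        pass  # a real implementation would run PPO updates here


class SubPolicyPPO:
    """One wrapper holding a separate actor-critic per maneuver."""

    def __init__(self, make_network=DummyActorCritic):
        # Three independent networks, keyed by maneuver.
        self.policies = {m: make_network() for m in MANEUVERS}

    def predict(self, state, maneuver):
        # Route the observation to the sub-network selected by the maneuver
        # command (e.g. coming from a route planner / navigation system).
        return self.policies[maneuver].predict(state)

    def train(self, rollout, maneuver):
        # Only the sub-network that generated the rollout gets updated.
        self.policies[maneuver].train(rollout)


if __name__ == "__main__":
    agent = SubPolicyPPO()
    state = np.random.randn(4)
    print(agent.predict(state, maneuver="left"))
```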