[Question] A question about the cost function of the p3o algorithm #358

Open
2 of 3 tasks
Liqinyan821 opened this issue Nov 15, 2024 · 1 comment
Labels
question Further information is requested

Liqinyan821 commented Nov 15, 2024

Required prerequisites

Questions

Hello Omnisafe team, thank you very much for your contribution.
While studying the P3O algorithm, I noticed that the _loss_pi_cost function in the implementation does not clip the probability ratio, whereas the cost loss in the P3O paper (Penalized Proximal Policy Optimization for Safe Reinforcement Learning) does use a clip.
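To make the difference concrete, here is a minimal sketch in plain Python of the two cost surrogates being compared. It is not OmniSafe's code and not a transcription of the paper's equation: the function names, the toy numbers, and the default clip_eps=0.2 are all assumptions chosen for illustration, and the full loss in either source may include further terms (such as the cost-limit offset and the penalty coefficient) that are left out here.

```python
def unclipped_cost_surrogate(ratios, cost_advs):
    """Mean of ratio * cost advantage, with no clipping on the ratio."""
    return sum(r * a for r, a in zip(ratios, cost_advs)) / len(ratios)


def clipped_cost_surrogate(ratios, cost_advs, clip_eps=0.2):
    """PPO-style pessimistic surrogate for a quantity that is minimized.

    For each sample, take the larger of the raw term and the term computed
    with the ratio clipped to [1 - clip_eps, 1 + clip_eps].
    clip_eps=0.2 is an assumed default, not a value from the thread.
    """
    total = 0.0
    for r, a in zip(ratios, cost_advs):
        r_clipped = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        total += max(r * a, r_clipped * a)
    return total / len(ratios)


# Toy data (hypothetical): new/old policy probability ratios and cost advantages.
ratios = [0.5, 1.0, 1.5]
cost_advs = [1.0, 1.0, -1.0]

unclipped = unclipped_cost_surrogate(ratios, cost_advs)
clipped = clipped_cost_surrogate(ratios, cost_advs)
print("unclipped:", unclipped)
print("clipped:  ", clipped)
```

Because each per-sample term in the clipped version is a max that includes the raw term, the clipped surrogate can never be smaller than the unclipped one on the same batch. The two agree whenever every ratio stays inside the clip range.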

Liqinyan821 added the question label on Nov 15, 2024
Gaiejj (Member) commented Nov 25, 2024

You must be a very meticulous person! In fact, this is a trick we discovered while debugging the algorithm, which makes P3O more suitable for high-dimensional complex environments. Have you tried removing the clip? Do you have any experimental data? If it performs well without it, we will modify this implementation later.
