
Added support for minibatch in PPO process_fn #1168

Merged · 2 commits · Jul 20, 2024

Conversation

@jvasso (Contributor) commented Jul 6, 2024

Closes #1164

In PPOPolicy, the method process_fn() now computes logp_old in minibatches instead of all at once.
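For context, a minimal sketch of the idea (illustrative, not the exact merged diff; it assumes tianshou's Batch.split helper and the max_batchsize parameter inherited from the A2C base class):

```python
import torch

def process_fn(self, batch, buffer, indices):
    # ... advantage / return computation unchanged ...
    logp_old = []
    with torch.no_grad():
        # Evaluate the old policy in chunks rather than in one forward pass
        # over the whole buffer, which can exhaust GPU memory on large buffers.
        for minibatch in batch.split(self.max_batchsize, shuffle=False, merge_last=True):
            logp_old.append(self(minibatch).dist.log_prob(minibatch.act))
    batch.logp_old = torch.cat(logp_old, dim=0).flatten()
    return batch
```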

@MischaPanch (Collaborator)

This looks innocent, but tests fail. I'll have a look at what goes wrong and will finalize this.

Sorry for the late reaction; my official work moved me in another direction for a few months, but now I'm back to devoting time to tianshou :)

@MischaPanch (Collaborator)

Looks like just a typo, _batch -> batch

@MischaPanch (Collaborator)

Ok, I had a closer look. The current handling is a bit confusing: there is self.max_batchsize, which is used implicitly through a method defined in the superclass. So the fix is to split according to this parameter; pushing the fix now.
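For reference, the superclass pattern being matched looks roughly like this (a sketch of the A2C base class's minibatched value estimation; names follow tianshou's public API, not the literal source):

```python
v_s = []
with torch.no_grad():
    # The A2C base class already chunks critic evaluation by self.max_batchsize,
    # so the PPO fix splits the logp_old computation by the same parameter.
    for minibatch in batch.split(self.max_batchsize, shuffle=False, merge_last=True):
        v_s.append(self.critic(minibatch.obs))
batch.v_s = torch.cat(v_s, dim=0).flatten()
```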

@MischaPanch enabled auto-merge (squash) on Jul 20, 2024 at 08:03
@MischaPanch merged commit 324e3d2 into thu-ml:master on Jul 20, 2024
0 of 4 checks passed
Merging this pull request closes: No minibatch for computation of logp_old in PPOPolicy