
No minibatch for computation of logp_old in PPOPolicy #1164

Closed
7 of 9 tasks
jvasso opened this issue Jul 1, 2024 · 1 comment · Fixed by #1168
Labels
performance issues Slow execution or poor-quality results

Comments

jvasso (Contributor) commented Jul 1, 2024

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
    • design request (i.e. "X should be changed to Y.")
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:

I have noticed that in the implementation of PPOPolicy, the old log probabilities logp_old are computed without minibatching:

with torch.no_grad():
    batch.logp_old = self(batch).dist.log_prob(batch.act)

This makes the algorithm unusable when the batch is too large to fit in memory, since there is no way to control the memory usage via batch_size.
I suggest adding minibatch support:

logp_old = []
with torch.no_grad():
    for minibatch in batch.split(self._batch, shuffle=False, merge_last=True):
        logp_old.append(self(minibatch).dist.log_prob(minibatch.act))
    batch.logp_old = torch.cat(logp_old, dim=0).flatten()

I am using Tianshou version 1.0.0.
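Independent of Tianshou's Batch API, the chunking pattern proposed above can be sketched in plain Python. This is a minimal illustration only: the helper names (`split_indices`, `log_probs_minibatched`) and the toy log-prob computation are hypothetical, not part of Tianshou.

```python
import math

def split_indices(n, size):
    """Yield (start, stop) pairs covering range(n) in sequential,
    non-overlapping chunks of at most `size` elements, mirroring the
    idea of batch.split(size, shuffle=False)."""
    for start in range(0, n, size):
        yield start, min(start + size, n)

def log_probs_minibatched(probs, size):
    """Compute log(p) for every entry, `size` elements at a time,
    instead of materializing one huge intermediate all at once."""
    out = []
    for start, stop in split_indices(len(probs), size):
        out.extend(math.log(p) for p in probs[start:stop])
    return out
```

Note that the sketch differs from the proposal in one detail: Tianshou's `merge_last=True` folds a final short chunk into the previous minibatch, whereas `split_indices` simply emits it as a smaller last chunk; the concatenated result is the same either way.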

MischaPanch (Collaborator) commented:

You're right, wanna make a PR for that? Otherwise I can also make one myself

@MischaPanch MischaPanch added the performance issues Slow execution or poor-quality results label Jul 6, 2024
MischaPanch added a commit that referenced this issue Jul 20, 2024
Closes #1164

In PPOPolicy, the method `process_fn()` now computes `logp_old` in
minibatch instead of all at once.

---------

Co-authored-by: Michael Panchenko <[email protected]>
2 participants