Releases: allenai/reward-bench

v0.1.3 -- Tons of CLI logging improvements!

04 Oct 23:34
c8f3fd1

The rewardbench CLI can be run on any instruction dataset, with fancy logging of scores.
This means rewardbench can be used to quickly throw together a rejection sampling pipeline once given generations.
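
As a rough illustration of the rejection sampling step, here is a minimal best-of-n sketch. It assumes the scored generations are available as a list of dicts with `prompt`, `completion`, and `score` keys; those field names and the `best_of_n` helper are hypothetical and are not rewardbench's actual output schema or API.

```python
def best_of_n(rows):
    """Keep the highest-scoring completion for each prompt.

    `rows` is assumed to be a list of dicts with "prompt", "completion",
    and "score" keys (hypothetical field names, for illustration only).
    """
    best = {}
    for row in rows:
        prompt = row["prompt"]
        # Replace the stored row only when this one scores strictly higher.
        if prompt not in best or row["score"] > best[prompt]["score"]:
            best[prompt] = row
    return list(best.values())


# Toy scored generations: two prompts, several completions each.
rows = [
    {"prompt": "p1", "completion": "a", "score": 0.2},
    {"prompt": "p1", "completion": "b", "score": 0.9},
    {"prompt": "p2", "completion": "c", "score": -1.0},
    {"prompt": "p2", "completion": "d", "score": -3.5},
]
selected = best_of_n(rows)
```

Ties keep the first completion seen, which is an arbitrary choice; a real pipeline might also drop prompts whose best score falls below some threshold.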

Specifically, I think this type of logging is really great for evaluation. It’s something wandb does for training; here, when using the CLI, you pass one arg that saves:

  • All the scores, input text, etc. to HuggingFace
  • The command used to launch the eval
  • The current python env for reproducibility
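
To give a feel for what the last two items involve, here is a small standard-library sketch of recording the launch command and the installed packages. This is not rewardbench's implementation; the `capture_run_metadata` helper and the keys of the returned dict are assumptions made for illustration.

```python
import shlex
import sys
from importlib import metadata


def capture_run_metadata(argv=None):
    """Collect the launch command and installed packages for reproducibility.

    Hypothetical helper: the function name and dict keys are illustrative,
    not part of rewardbench.
    """
    argv = sys.argv if argv is None else argv
    # One "name==version" entry per installed distribution, like `pip freeze`.
    packages = sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip distributions with broken metadata
    )
    return {
        # shlex.join quotes arguments so the command can be pasted back into a shell.
        "command": shlex.join(argv),
        "python_version": sys.version.split()[0],
        "packages": packages,
    }


run_info = capture_run_metadata(["my_eval", "--example-flag", "value with space"])
```

A dict like this can be stored next to the scores so that anyone reading the results later can see how the run was launched and in what environment.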

Examples are in the readme: https://github.com/allenai/reward-bench?tab=readme-ov-file#logging

What's Changed

New Contributors

Full Changelog: v0.1.2...v0.1.3