0.3.7
The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.7!
🚀 New Features
- 🆕 Added handling for new error codes, improving system resilience. (#1702)
- 🆕 Introduced the P-MMEval feature for advanced model evaluations. (#1714)
- 🆕 Added LiveMathBench support for dynamic mathematical benchmarking. (#1727)
- 🆕 Added the OpenAI SimpleQA dataset to expand our question-answering capabilities. (#1720)
📖 Documentation
- 📚 Updated configurations and documentation to reflect the latest changes, ensuring a smooth user experience. (#1704, #1717)
🐛 Bug Fixes
- 🔧 Resolved issues in output sequence generation with the TurboMind model. (#1707)
- 🔧 Corrected configuration errors in pmmeval_gen to ensure proper functionality. (#1719)
⚙ Enhancements and Refactors
- 💪 Enhanced support for ARC Prize Public Evaluation. (#1690)
- 💪 Increased max_out_len values for various datasets to accommodate longer output sequences. (#1726)
- 💪 Incorporated Korbench and updated Fullbench to provide more comprehensive benchmarking options. (#1713, #1712)
- 💪 Streamlined the CI pipeline by updating the torch version and adding more datasets to the daily test cases. (#1701)
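
For readers who keep their own dataset configs, the max_out_len change above can be mirrored locally. The following is a minimal sketch, not the code from #1726: it assumes the common OpenCompass layout in which the limit sits under infer_cfg → inferencer → max_out_len, and it uses plain dicts with string type names so that it runs without OpenCompass installed. The helper `raise_max_out_len` and the `demo_*` entries are hypothetical names introduced only for illustration.

```python
from copy import deepcopy


def raise_max_out_len(datasets, minimum):
    """Return a copy of the dataset configs with max_out_len raised to at least `minimum`.

    Hypothetical helper: entries without an inferencer, or whose inferencer
    has no max_out_len key, are left unchanged.
    """
    updated = deepcopy(datasets)  # never mutate the caller's configs
    for cfg in updated:
        inferencer = cfg.get("infer_cfg", {}).get("inferencer")
        if inferencer is not None and "max_out_len" in inferencer:
            inferencer["max_out_len"] = max(inferencer["max_out_len"], minimum)
    return updated


# Hypothetical dataset entries; real configs reference classes, not strings.
example_datasets = [
    {
        "abbr": "demo_math",
        "infer_cfg": {"inferencer": {"type": "GenInferencer", "max_out_len": 512}},
    },
    {
        "abbr": "demo_ppl",
        "infer_cfg": {"inferencer": {"type": "PPLInferencer"}},
    },
]

longer = raise_max_out_len(example_datasets, 2048)
```

Returning a copy keeps the original configs intact, which makes it easy to compare runs with the old and new limits side by side.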
🎉 Welcome New Contributors
- 👏 A warm welcome to @epsilondylan and @wanyu2018umac, who have made their first contributions by adding the Korbench dataset and introducing the P-MMEval feature respectively! (#1713, #1714)
For a complete overview of all changes, please refer to the full changelog: 0.3.6...0.3.7