Release 0.3.9 · open-compass/opencompass

The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.9!

🌟 Highlights
✨ This version introduces a number of new features and improvements that enhance the user experience and expand the capabilities of OpenCompass. Notable changes include support for G-Pass@k and LiveMathBench, as well as the introduction of the Bradley-Terry subjective evaluation method.

🚀 New Features
-🆕 Support for G-Pass@k and LiveMathBench metrics to better evaluate model performance. (#1772)
-🆕 Theorem QA 0shot CoT configuration has been added for more comprehensive evaluation scenarios. (#1783)
-🆕 A customizable tokenizer for RULER offers greater flexibility in processing inputs. (#1731)
-🆕 Added LiveStemBench Dataset to enrich our collection of datasets. (#1794)
-🆕 Integration of JudgeLLM into o1 evaluation for improved assessment accuracy. (#1795)
-🆕 Implementation of the Bradley-Terry subjective evaluation method on wildbench, alpacaeval, and compassarena datasets. (#1791)
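
For readers unfamiliar with the Bradley-Terry method mentioned above, here is a minimal, generic sketch of how pairwise judgments can be turned into per-model strength scores. It is not the OpenCompass implementation: the function names `fit_bradley_terry` and `win_probability`, the `(winner, loser)` input format, and the fixed iteration count are all assumptions made for illustration. The model assumes the probability that model `a` beats model `b` is `p_a / (p_a + p_b)`, and the strengths are fitted with the standard minorization-maximization update.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Hypothetical helper for illustration only; not part of OpenCompass.
    """
    wins = defaultdict(int)          # total wins per model
    pair_counts = defaultdict(int)   # number of comparisons per unordered pair
    models = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        models.update((winner, loser))

    strengths = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            # MM update: p_m = W_m / sum_j n_mj / (p_m + p_j)
            denom = 0.0
            for other in models:
                if other == m:
                    continue
                n = pair_counts.get(frozenset((m, other)), 0)
                if n:
                    denom += n / (strengths[m] + strengths[other])
            updated[m] = wins[m] / denom if denom else strengths[m]
        # Rescale so strengths average to 1 (the model is scale-invariant).
        total = sum(updated.values())
        strengths = {m: s * len(models) / total for m, s in updated.items()}
    return strengths


def win_probability(strengths, a, b):
    """Probability that model `a` beats model `b` under the fitted strengths."""
    return strengths[a] / (strengths[a] + strengths[b])


# Toy data: model_a is preferred over model_b in 3 of 4 judgments.
judgments = [("model_a", "model_b")] * 3 + [("model_b", "model_a")]
scores = fit_bradley_terry(judgments)
```

With only two models, the fitted win probability simply matches the observed win rate. The method becomes more useful when many models are compared in an incomplete set of pairings, since the strengths place all of them on one scale.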

📖 Documentation
-📚 Updated OC academic content to the most recent information as of December 2024. (#1771)

🐛 Bug Fixes
-🔧 Fixed Order error which was causing issues with sequence handling. (#1767)
-🔧 Resolved an issue where the lark report was returning None. (#1769)
-🔧 Corrected the path for saving Local Runner parameters. (#1768)
-🔧 Amended the summarizer abbreviation for models to ensure proper identification. (#1789)
-🔧 Fixed output_path errors to improve file handling reliability. (#1798)

⚙ Enhancements and Refactors
-💪 Fullbench testcase has been integrated into the CI pipeline. (#1766)
-💪 Volc status exception handling has been updated for more robust responses. (#1780)
-💪 Removed daily step retry mechanism and updated PR score calculation for efficiency. (#1782)
-💪 Deploy Python version has been updated to the latest stable release. (#1784)
-💪 PyPI deploy workflow has been refined for smoother deployments. (#1786)

Thank you for being part of the OpenCompass community! Your support and contributions make each release possible.
