
0.3.9

@MaiziXiao released this on 31 Dec 09:28
f322043

The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.9!

🌟 Highlights
✨ This version introduces a number of new features and improvements that enhance the user experience and expand the capabilities of OpenCompass. Notable changes include support for G-Pass@k and LiveMathBench, as well as the introduction of the Bradley-Terry subjective evaluation method.

🚀 New Features
- 🆕 Support for the G-Pass@k metric and the LiveMathBench benchmark to better evaluate model performance. (#1772)
- 🆕 Added a TheoremQA 0-shot CoT configuration for more comprehensive evaluation scenarios. (#1783)
- 🆕 RULER now supports a customizable tokenizer, giving more flexibility in how inputs are processed. (#1731)
- 🆕 Added the LiveStemBench dataset to expand our dataset collection. (#1794)
- 🆕 Integrated JudgeLLM into o1 evaluation to improve assessment accuracy. (#1795)
- 🆕 Implemented the Bradley-Terry subjective evaluation method for the WildBench, AlpacaEval, and CompassArena datasets. (#1791)
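G-Pass@k looks at how consistently a model solves a problem across repeated generations, not just whether it solves it once. As a rough illustration, assuming the commonly cited hypergeometric formulation (sample n generations per question, c of them correct, draw k without replacement, and require at least ⌈τ·k⌉ of the k to be correct), the per-question score is Σ_{j=⌈τk⌉}^{c} C(c, j)·C(n−c, k−j) / C(n, k). The sketch below is not OpenCompass's implementation; the function names and the sample counts are hypothetical.

```python
import math


def g_pass_at_k(n: int, c: int, k: int, tau: float) -> float:
    """Hypothetical helper: G-Pass@k for one question (hypergeometric form).

    n: generations sampled for the question
    c: how many of those generations are correct
    k: generations drawn without replacement for the check
    tau: fraction of the k draws that must be correct, 0 < tau <= 1
    """
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 0 < tau <= 1:
        raise ValueError("need 0 < tau <= 1")
    # Round before ceil so float noise (e.g. 7.000000000000001) does not
    # push the required number of correct draws up by one.
    threshold = math.ceil(round(tau * k, 9))
    # Count the k-subsets that contain at least `threshold` correct generations.
    favourable = sum(
        math.comb(c, j) * math.comb(n - c, k - j)
        for j in range(threshold, min(c, k) + 1)
    )
    return favourable / math.comb(n, k)


def benchmark_g_pass_at_k(results, k: int, tau: float) -> float:
    """Hypothetical helper: average G-Pass@k over (n, c) pairs, one per question."""
    return sum(g_pass_at_k(n, c, k, tau) for n, c in results) / len(results)


if __name__ == "__main__":
    # 16 generations, 8 correct; all 4 drawn generations must be correct.
    print(g_pass_at_k(16, 8, 4, 1.0))  # 1/26, about 0.038
    # Hypothetical (n, c) counts for three questions.
    print(benchmark_g_pass_at_k([(16, 16), (16, 8), (16, 0)], k=4, tau=0.5))
```

With a threshold of one correct draw (τ·k ≤ 1) the formula reduces to the familiar unbiased Pass@k estimate, 1 − C(n−c, k)/C(n, k); raising τ toward 1 rewards models whose correct answers are stable rather than occasional.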
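The Bradley-Terry model gives each model a positive strength p_i so that P(i beats j) = p_i / (p_i + p_j), and fits those strengths from pairwise judgments. These notes do not say how OpenCompass fits the model, so the following is only a generic sketch using Hunter's minorization-maximization (MM) update. The function name and the (winner, loser) input format are assumptions, ties are not handled, and every model is assumed to have at least one win and one loss so the fit stays finite.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, iterations=500, tol=1e-10):
    """Hypothetical helper: fit Bradley-Terry strengths with the MM update.

    comparisons: iterable of (winner, loser) pairs; ties are not handled.
    Returns strengths rescaled to average 1.0.
    """
    comparisons = list(comparisons)
    models = sorted({m for pair in comparisons for m in pair})
    wins = defaultdict(float)   # total wins per model
    games = defaultdict(float)  # games played per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1.0
        games[frozenset((winner, loser))] += 1.0

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for i in models:
            # Sum of n_ij / (p_i + p_j) over every opponent j of model i.
            denom = sum(
                n / sum(strength[m] for m in pair)
                for pair, n in games.items()
                if i in pair
            )
            # MM step: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            updated[i] = wins[i] / denom
        # Bradley-Terry is scale-invariant, so fix the scale (mean strength 1.0).
        scale = len(models) / sum(updated.values())
        updated = {m: s * scale for m, s in updated.items()}
        change = max(abs(updated[m] - strength[m]) for m in models)
        strength = updated
        if change < tol:
            break
    return strength


if __name__ == "__main__":
    # Hypothetical judge verdicts: A beats B three times, B beats A once.
    s = fit_bradley_terry([("A", "B")] * 3 + [("B", "A")])
    print(s)                            # A is about three times as strong as B
    print(s["A"] / (s["A"] + s["B"]))   # fitted P(A beats B), about 0.75
```

The MM update is used here because it needs no external solver and increases the likelihood at every step; logistic-regression fits are a common alternative when extra covariates are involved.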

πŸ“– Documentation
-πŸ“š Updated OC academic content to the most recent information as of December 2024. (#1771)

πŸ› Bug Fixes
-πŸ”§ Fixed Order error which was causing issues with sequence handling. (#1767)
-πŸ”§ Resolved an issue where the lark report was returning None. (#1769)
-πŸ”§ Corrected the path for saving Local Runner parameters. (#1768)
-πŸ”§ Amended the summarizer abbreviation for models to ensure proper identification. (#1789)
-πŸ”§ Fixed output_path errors to improve file handling reliability. (#1798)

⚙ Enhancements and Refactors
- 💪 Added the Fullbench test case to the CI pipeline. (#1766)
- 💪 Updated Volc status exception handling for more robust responses. (#1780)
- 💪 Removed the daily step retry mechanism and updated the PR score calculation for efficiency. (#1782)
- 💪 Updated the Python version used for deployment to the latest stable release. (#1784)
- 💪 Refined the PyPI deploy workflow for smoother deployments. (#1786)

Thank you for being part of the OpenCompass community! Your support and contributions make each release possible.