
MSMARCO eval split in MTEB English (classic) benchmark #1608

Open
aashka-trivedi opened this issue Dec 17, 2024 · 1 comment
Labels: bug

@aashka-trivedi

Evaluating the classic MTEB benchmark (MTEB(eng, classic)) via the benchmark object uses the test split for all datasets. For MS-MARCO, however, the paper states that the dev set should be used instead, which leads to a mismatch between previously reported leaderboard numbers and scores calculated by following the instructions in the docs:

import mteb

tasks = mteb.get_benchmark("MTEB(eng, classic)")  # or use a specific benchmark

evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")  # `model` is any mteb-compatible model
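A possible workaround until this is fixed (a sketch only; it assumes the installed mteb version exposes mteb.get_tasks with an eval_splits filter and a .tasks attribute on the Benchmark object, both of which may differ between releases):

import mteb

# Use the classic English benchmark, but swap MSMARCO for a copy restricted to its dev split.
benchmark = mteb.get_benchmark("MTEB(eng, classic)")
tasks = [t for t in benchmark.tasks if t.metadata.name != "MSMARCO"]
tasks += mteb.get_tasks(tasks=["MSMARCO"], eval_splits=["dev"])

evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")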
@isaac-chung (Collaborator)

@aashka-trivedi thanks for bringing this up! In a previous version, where we still had run_mteb_english.py, the implementation was correct ('test' for every task except MSMARCO). We certainly welcome a PR to fix this issue :)

CC @KennethEnevoldsen
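
For reference, the split handling in run_mteb_english.py looked roughly like this (a sketch from memory rather than the exact script; TASK_LIST, model, and model_name are placeholders for the task list and model used there):

from mteb import MTEB

for task in TASK_LIST:
    # MSMARCO was the one task scored on its dev split; every other task used test.
    eval_splits = ["dev"] if task == "MSMARCO" else ["test"]
    evaluation = MTEB(tasks=[task], task_langs=["en"])
    evaluation.run(model, output_folder=f"results/{model_name}", eval_splits=eval_splits)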

isaac-chung added the bug label on Dec 18, 2024