Using the benchmarks object to evaluate the classic MTEB benchmark (`MTEB(eng, classic)`) uses the `test` split for all datasets. For MS-MARCO, however, the paper states that the `dev` set should be used instead, leading to a mismatch between previously reported leaderboard numbers and scores calculated by following the instructions in the docs:
```python
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any SentenceTransformer-compatible model

tasks = mteb.get_benchmark("MTEB(eng, classic)")  # or use a specific benchmark
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")
```
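Until this is fixed, one possible workaround is to evaluate MSMARCO separately and override its split. This is a minimal sketch, assuming the `Benchmark` object exposes its tasks via `.tasks` and that `MTEB.run` accepts an `eval_splits` override; the model name is just a placeholder:

```python
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder; substitute your own model

benchmark = mteb.get_benchmark("MTEB(eng, classic)")

# Separate MSMARCO from the rest of the benchmark.
msmarco = [t for t in benchmark.tasks if t.metadata.name == "MSMARCO"]
others = [t for t in benchmark.tasks if t.metadata.name != "MSMARCO"]

# MSMARCO: evaluate on the 'dev' split, as the paper specifies.
mteb.MTEB(tasks=msmarco).run(model, output_folder="results", eval_splits=["dev"])

# Everything else: keep the default 'test' split.
mteb.MTEB(tasks=others).run(model, output_folder="results")
```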
@aashka-trivedi thanks for bringing this up! In a previous version, where we still had `run_mteb_english.py`, the implementation was correct (`test` for every task except MSMARCO). We certainly welcome a PR to fix this issue :)
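For reference, the per-task split selection in the old `run_mteb_english.py` looked roughly like the sketch below. This is a reconstruction, not a verbatim copy of the script; `TASK_LIST` is an illustrative subset, and the `task_langs` argument follows the older mteb API:

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model
TASK_LIST = ["MSMARCO", "Banking77Classification"]  # illustrative subset of the English tasks

# 'dev' for MSMARCO, 'test' for everything else, as in the old script.
for task in TASK_LIST:
    eval_splits = ["dev"] if task == "MSMARCO" else ["test"]
    evaluation = MTEB(tasks=[task], task_langs=["en"])
    evaluation.run(model, output_folder="results", eval_splits=eval_splits)
```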