Using the benchmarks object to evaluate the classic MTEB benchmark (`MTEB(eng, classic)`) uses the `test` split for all datasets. For MS-MARCO, however, the paper states that the `dev` set should be used instead, leading to a mismatch between previously reported leaderboard numbers and scores calculated by following the instructions in the docs:
```python
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any SentenceTransformer-compatible model

tasks = mteb.get_benchmark("MTEB(eng, classic)")  # or use a specific benchmark
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")
```
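Until this is fixed, one possible workaround is to evaluate MSMARCO separately and override its split. This is a minimal sketch, assuming the `Benchmark` object exposes its tasks via `.tasks` and that `MTEB.run` accepts an `eval_splits` override; the model name is just a placeholder:

```python
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder; substitute your own model

benchmark = mteb.get_benchmark("MTEB(eng, classic)")

# Separate MSMARCO from the rest of the benchmark.
msmarco = [t for t in benchmark.tasks if t.metadata.name == "MSMARCO"]
others = [t for t in benchmark.tasks if t.metadata.name != "MSMARCO"]

# MSMARCO: evaluate on the 'dev' split, as the paper specifies.
mteb.MTEB(tasks=msmarco).run(model, output_folder="results", eval_splits=["dev"])

# Everything else: keep the default 'test' split.
mteb.MTEB(tasks=others).run(model, output_folder="results")
```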
@aashka-trivedi thanks for bringing this up! In a previous version, where we still had `run_mteb_english.py`, the implementation was correct (`test` for every task except MSMARCO). We certainly welcome a PR to fix this issue :)
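For reference, the per-task split selection in the old `run_mteb_english.py` looked roughly like the sketch below. This is a reconstruction, not a verbatim copy of the script; `TASK_LIST` is an illustrative subset, and the `task_langs` argument follows the older mteb API:

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model
TASK_LIST = ["MSMARCO", "Banking77Classification"]  # illustrative subset of the English tasks

# 'dev' for MSMARCO, 'test' for everything else, as in the old script.
for task in TASK_LIST:
    eval_splits = ["dev"] if task == "MSMARCO" else ["test"]
    evaluation = MTEB(tasks=[task], task_langs=["en"])
    evaluation.run(model, output_folder="results", eval_splits=eval_splits)
```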