Some results do not seem to agree between the old and new leaderboard, e.g. the ranking and scores for the Law tab look quite different.
Here is an example:
For gritlm, multilingual-e5-large-instruct, multilingual-e5-base at least the models (generally) agree:
The tasks seem to match between v1 and v2.
Seems like scores on both benchmarks are:
@x-tabdeveloping is there an issue with rounding here? it should be 56.72 not 56.73 (minor though)
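To make the rounding question concrete, here is a minimal sketch of how the choice of rounding mode alone can move a two-decimal score from 56.72 to 56.73. The unrounded value `56.72375` is only an illustrative assumption, and nothing here is taken from the leaderboard code; it just shows that round-to-nearest and round-up disagree for such a value.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_HALF_EVEN

# Illustrative unrounded mean (an assumption, not confirmed leaderboard data).
raw = Decimal("56.72375")
two_places = Decimal("0.01")

# Round to the nearest hundredth: 56.72375 is below the 56.725 midpoint.
nearest = raw.quantize(two_places, rounding=ROUND_HALF_EVEN)

# Always round up to the next hundredth.
ceiling = raw.quantize(two_places, rounding=ROUND_CEILING)

print(nearest)  # 56.72
print(ceiling)  # 56.73
```

If the displayed value comes from round-to-nearest, a 56.73 would instead point to a slightly different underlying mean rather than to the rounding step.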
Additionally, I'm not sure why the mean retrieval score is not the same as the mean(task).
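A quick sanity check for this could look like the sketch below. It assumes the leaderboard computes one average over all tasks and one average per task type; the task names, types, scores, and the `summarize` helper are all hypothetical. When every task in a benchmark is a retrieval task, the two numbers should coincide, so any gap would suggest the two columns are computed over different task sets.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-task scores: (task name, task type, score).
# These are made-up values, not real leaderboard results.
law_like = [
    ("TaskA", "Retrieval", 60.0),
    ("TaskB", "Retrieval", 50.0),
    ("TaskC", "Retrieval", 58.0),
]


def summarize(rows):
    """Return (mean over all tasks, {task type: mean over that type})."""
    by_type = defaultdict(list)
    for _name, task_type, score in rows:
        by_type[task_type].append(score)
    mean_task = mean(score for _name, _type, score in rows)
    type_means = {t: mean(v) for t, v in by_type.items()}
    return mean_task, type_means


mean_task, type_means = summarize(law_like)
# With a single task type, both averages cover the same tasks.
assert abs(mean_task - type_means["Retrieval"]) < 1e-9

# With mixed task types, the per-type mean differs from the per-task mean.
mixed = law_like + [("TaskD", "Classification", 80.0)]
mixed_mean_task, mixed_type_means = summarize(mixed)
print(mixed_mean_task, mixed_type_means["Retrieval"])  # 62.0 56.0
```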
originally posted to #1317