Commit
fix: remove * imports (#1569)
* fix: Count unique texts, data leaks in calculate metrics (#1438)
* add more stat
* add more stat
* update statistics
* fix: update task metadata to allow for null (#1448)
* Update tasks table
* 1.19.5
Automatically generated by python-semantic-release
* Fix: Made data parsing in the leaderboard figure more robust (#1450)
Bugfixes with data parsing in main figure
* Fixed task loading (#1451)
* Fixed task result loading from disk
* Fixed task result loading from disk
* fix: publish (#1452)
* 1.19.6
Automatically generated by python-semantic-release
* fix: Fix load external results with `None` mteb_version (#1453)
* fix
* lint
* 1.19.7
Automatically generated by python-semantic-release
* WIP: Polishing up leaderboard UI (#1461)
* fix: Removed column wrapping on the table, so that it remains readable
* Added disclaimer to figure
* fix: Added links to task info table, switched out license with metric
* fix: loading pre 1.11.0 (#1460)
* small fix
* fix: fix
* 1.19.8
Automatically generated by python-semantic-release
* fix: swap touche2020 to maintain compatibility (#1469)
swap touche2020 for parity
* 1.19.9
Automatically generated by python-semantic-release
* docs: Add sum per language for task counts (#1468)
* add sum per lang
* add sort by sum option
* make lint
* fix: pinned datasets to <3.0.0 (#1470)
* 1.19.10
Automatically generated by python-semantic-release
* feat: add CUREv1 retrieval dataset (#1459)
* feat: add CUREv1 dataset
---------
Co-authored-by: nadshe <[email protected]>
Co-authored-by: olivierr42 <[email protected]>
Co-authored-by: Daniel Buades Marcos <[email protected]>
* feat: add missing domains to medical tasks
* feat: modify benchmark tasks
* chore: benchmark naming
---------
Co-authored-by: nadshe <[email protected]>
Co-authored-by: olivierr42 <[email protected]>
* Update tasks table
* 1.20.0
Automatically generated by python-semantic-release
* fix: check if `model` attr of model exists (#1499)
* check if model attr of model exists
* lint
* Fix retrieval evaluator
* 1.20.1
Automatically generated by python-semantic-release
* fix: Leaderboard demo data loading (#1507)
* Made get_scores error tolerant
* Added join_revisions, made get_scores failsafe
* Fixed metadata fetching for HF models
* Added failsafe metadata fetching to leaderboard code
* Added revision joining to leaderboard app
* fix
* Only show models that have metadata, when filter_models is called
* Ran linting
* 1.20.2
Automatically generated by python-semantic-release
* fix: leaderboard only shows models that have ModelMeta (#1508)
Filtering for models that have metadata
* 1.20.3
Automatically generated by python-semantic-release
* fix: align readme with current mteb (#1493)
* align readme with current mteb
* align with mieb branch
* fix test
* 1.20.4
Automatically generated by python-semantic-release
* docs: Add lang family mapping and map to task table (#1486)
* add lang family mapping and map to task table
* make lint
* add back some unclassified lang codes
* Update tasks table
* fix: Ensure that models match the names on embedding-benchmarks/results (#1519)
* 1.20.5
Automatically generated by python-semantic-release
* fix: Adding missing metadata on models and matching names up with the results repo (#1528)
* Added Voyage 3 models
* Added correct metadata to Cohere models and matched names with the results repo
* 1.20.6
Automatically generated by python-semantic-release
* feat: Evaluate missing splits (#1525)
* fix: evaluate missing splits (#1268)
* implement partial evaluation for missing splits
* lint
* requested changes done from scratch
* test for missing split evaluation added
* uncomment test
* lint
* avoid circular import
* use TaskResult
* skip tests for now
---------
Co-authored-by: Isaac Chung <[email protected]>
* got test_all_splits_evaluated passing
* tests passing
* address review comments
* make lint
* handle None cases for kg_co2_emissions
* use new results info
---------
Co-authored-by: Thivyanth <[email protected]>
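The missing-split evaluation added in #1525 boils down to a set difference between the splits a task requests and the splits already recorded in a saved TaskResult. A minimal sketch of that idea (function name and score layout are illustrative assumptions, not mteb's actual internals):

```python
# Illustrative sketch of the "evaluate missing splits" idea from #1525.
# The helper name and score layout are assumptions, not mteb's real API.

def find_missing_splits(requested: list[str], existing_scores: dict) -> list[str]:
    """Return requested splits with no saved scores, preserving request order."""
    return [split for split in requested if split not in existing_scores]

# Scores already on disk for this task: only "test" was evaluated.
existing = {"test": {"main_score": 0.42}}
print(find_missing_splits(["validation", "test"], existing))  # -> ['validation']
```

Only the splits returned by such a helper would then be run, and their scores merged back into the existing result.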
* 1.21.0
Automatically generated by python-semantic-release
* fix: Correct typos superseeded -> superseded (#1532)
fix typo -> superseded
* 1.21.1
Automatically generated by python-semantic-release
* fix: Task load data error for SICK-BR-STS and XStance (#1534)
* fix task load data for two tasks
* correct dataset keys
* 1.21.2
Automatically generated by python-semantic-release
* fix: Proprietary models now get correctly shown in leaderboard (#1530)
* Fixed showing proprietary models in leaderboard
* Added links to all OpenAI models
* Fixed table formatting issues
* Bumped Gradio version
* 1.21.3
Automatically generated by python-semantic-release
* docs: Add Model Meta parameters and metadata (#1536)
* add multi_qa_MiniLM_L6_cos_v1 model meta
* add all_mpnet_base_v2
* add parameters to model meta
* make lint
* add extra params to meta
* fix: add more model meta (jina, e5) (#1537)
* add e5 model meta
* address review comments
* 1.21.4
Automatically generated by python-semantic-release
* Add cohere models (#1538)
* fix: bug cohere names
* format
* fix: add nomic models (#1543)
#1515
* fix: Added all-minilm-l12-v2 (#1542)
#1515
* fix: Added arctic models (#1541)
#1515
* fix: add sentence trimming to OpenAIWrapper (#1526)
* fix: add sentence trimming to OpenAIWrapper
* fix: import tiktoken library inside encode function
* fix: check tokenizer library installed and update ModelMeta to pass tokenizer_name
* fix: pass tokenizer_name, max_tokens to loader
* fix: make tokenizer_name None for default
* fix: delete changes for ModelMeta
* fix: fix revision to 2 for OpenAI models
* fix: add docstring for OpenAIWrapper
* fix: lint
* feat: add openai optional dependency set
* fix: add sleep for too many requests
* fix: add lint
* fix: delete evaluate file
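The trimming added to the OpenAI wrapper in #1526 encodes each text and cuts it down to the model's token budget before the API call. A self-contained sketch of the same idea; the real wrapper uses tiktoken, while a whitespace split stands in here so the example runs without it (names are illustrative, not mteb's API):

```python
# Sketch of the sentence-trimming idea from #1526. The real wrapper uses
# tiktoken's encoder; a whitespace split stands in here so the example is
# self-contained. Names are illustrative, not mteb's actual API.

def trim_to_max_tokens(text: str, max_tokens: int) -> str:
    """Keep at most `max_tokens` tokens of `text`, dropping the tail."""
    tokens = text.split()
    if len(tokens) <= max_tokens:
        return text
    return " ".join(tokens[:max_tokens])

print(trim_to_max_tokens("alpha beta gamma delta", 2))  # -> alpha beta
```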
* 1.21.5
Automatically generated by python-semantic-release
* fix: Fixed metadata errors (#1547)
* 1.21.6
Automatically generated by python-semantic-release
* fix: remove curev1 from multilingual (#1552)
Seems like it was added here:
1cc6c9e
* 1.21.7
Automatically generated by python-semantic-release
* fix: Add Model2vec (#1546)
* Added Model2Vec wrapper
* Added Model2vec models
* Added model2vec models to registry
* Added model2vec as a dependency
* Ran linting
* Update mteb/models/model2vec_models.py
Co-authored-by: Kenneth Enevoldsen <[email protected]>
* Update mteb/models/model2vec_models.py
Co-authored-by: Kenneth Enevoldsen <[email protected]>
* Added adapted_from and superseded_by to model2vec models.
* Added missing import
* Moved pyproject.toml to optional dependencies
* Fixed typos
* Added import error and changed model to model_name
* Added Numpy to frameworks
* Added Numpy to frameworks
* Corrected false info on model2vec models
* Replaced np.inf with maxint
* Update mteb/models/model2vec_models.py
Co-authored-by: Isaac Chung <[email protected]>
* Added option to have infinite max tokens, added it to Model2vec
---------
Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
* Made result loading more permissive, changed eval splits for HotPotQA and DBPedia (#1554)
* Removed train and dev from eval splits on HotpotQA
* Removed dev from eval splits on DBPedia
* Made task_results validation more permissive
* Readded exception in get_score
* Ran linting
* 1.21.8
Automatically generated by python-semantic-release
* docs: Correction of SICK-R metadata (#1558)
* Correction of SICK-R metadata
* Correction of SICK-R metadata
---------
Co-authored-by: rposwiata <[email protected]>
* feat(google_models): fix issues and add support for `text-embedding-005` and `text-multilingual-embedding-002` (#1562)
* fix: google_models batching and prompt
* feat: add text-embedding-005 and text-multilingual-embedding-002
* chore: `make lint` errors
* fix: address PR comments
* 1.22.0
Automatically generated by python-semantic-release
* fix(bm25s): search implementation (#1566)
fix: bm25s implementation
* 1.22.1
Automatically generated by python-semantic-release
* docs: Fix dependency library name for bm25s (#1568)
* fix: bm25s implementation
* correct library name
---------
Co-authored-by: Daniel Buades Marcos <[email protected]>
* fix: Add training dataset to model meta (#1561)
* fix: Add training dataset to model meta
Addresses #1556
* Added docs
* format
* feat: (cohere_models) cohere_task_type issue, batch requests and tqdm for visualization (#1564)
* feat: batch requests to cohere models
* fix: use correct task_type
* feat: use tqdm with openai
* fix: explicitly set `show_progress_bar` to False
* fix(publichealth-qa):  ignore rows with `None` values in `question` or `answer` (#1565)
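The PublicHealthQA fix in #1565 amounts to filtering out rows whose `question` or `answer` field is `None` before building pairs; roughly (the row layout here is an assumption about the dataset, not its exact schema):

```python
# Minimal sketch of the #1565 fix: drop QA rows missing either field.
# The row layout is an assumption about the dataset, not its exact schema.
rows = [
    {"question": "What is a vaccine?", "answer": "A biological preparation."},
    {"question": None, "answer": "An answer with no question."},
    {"question": "A question with no answer?", "answer": None},
]
kept = [r for r in rows if r["question"] is not None and r["answer"] is not None]
print(len(kept))  # -> 1
```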
* 1.23.0
Automatically generated by python-semantic-release
* fix wongnai
* update inits
* fix tests
* lint
* update imports
* fix tests
* lint
---------
Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Márton Kardos <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: Napuh <[email protected]>
Co-authored-by: Daniel Buades Marcos <[email protected]>
Co-authored-by: nadshe <[email protected]>
Co-authored-by: olivierr42 <[email protected]>
Co-authored-by: Thivyanth <[email protected]>
Co-authored-by: Youngjoon Jang <[email protected]>
Co-authored-by: Rafał Poświata <[email protected]>
Samoed authored Dec 9, 2024
1 parent dec5d6a commit d0aa3a7
Showing 207 changed files with 69,186 additions and 819 deletions.
31 changes: 27 additions & 4 deletions README.md
@@ -46,10 +46,8 @@ from sentence_transformers import SentenceTransformer

# Define the sentence-transformers model name
model_name = "average_word_embeddings_komninos"
-# or directly from huggingface:
-# model_name = "sentence-transformers/all-MiniLM-L6-v2"

-model = SentenceTransformer(model_name)
+model = mteb.get_model(model_name) # if the model is not implemented in MTEB it will be eq. to SentenceTransformer(model_name)
tasks = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder=f"results/{model_name}")
@@ -221,7 +219,10 @@ Note that the public leaderboard uses the test splits for all datasets except MS
Models should implement the following interface, implementing an `encode` function taking as inputs a list of sentences, and returning a list of embeddings (embeddings can be `np.array`, `torch.tensor`, etc.). For inspiration, you can look at the [mteb/mtebscripts repo](https://github.com/embeddings-benchmark/mtebscripts) used for running diverse models via SLURM scripts for the paper.

```python
import mteb
from mteb.encoder_interface import PromptType
import numpy as np


class CustomModel:
def encode(
@@ -245,7 +246,7 @@ class CustomModel:
pass

model = CustomModel()
-tasks = mteb.get_task("Banking77Classification")
+tasks = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = MTEB(tasks=tasks)
evaluation.run(model)
```
@@ -379,6 +380,28 @@ results = mteb.load_results(models=models, tasks=tasks)
df = results_to_dataframe(results)
```

</details>


<details>
<summary> Annotate Contamination in the training data of a model </summary>

### Annotate Contamination

Have you found contamination in the training data of a model? Please let us know, either by opening an issue or, ideally, by submitting a PR
annotating the training datasets of the model:

```py
model_w_contamination = ModelMeta(
    name="model-with-contamination",
    ...
    training_datasets={
        "ArguAna":  # name of dataset within MTEB
            ["test"]  # the splits that have been trained on
    },
    ...
)
```


</details>

<details>
22 changes: 17 additions & 5 deletions docs/create_tasks_table.py
@@ -8,6 +8,7 @@

import mteb
from mteb.abstasks.TaskMetadata import PROGRAMMING_LANGS, TASK_TYPE
from mteb.languages import ISO_TO_FAM_LEVEL0, ISO_TO_LANGUAGE


def author_from_bibtex(bibtex: str | None) -> str:
@@ -82,10 +83,21 @@ def create_task_lang_table(tasks: list[mteb.AbsTask], sort_by_sum=False) -> str:
## Wrangle for polars
pl_table_dict = []
for lang, d in table_dict.items():
-d.update({"0-lang": lang})  # for sorting columns
+d.update({"0-lang-code": lang})  # for sorting columns
pl_table_dict.append(d)

-df = pl.DataFrame(pl_table_dict).sort(by="0-lang")
+df = pl.DataFrame(pl_table_dict).sort(by="0-lang-code")
df = df.with_columns(
pl.col("0-lang-code")
.replace_strict(ISO_TO_LANGUAGE, default="unknown")
.alias("1-lang-name")
)
df = df.with_columns(
pl.col("0-lang-code")
.replace_strict(ISO_TO_FAM_LEVEL0, default="Unclassified")
.alias("2-lang-fam")
)

df = df.with_columns(sum=pl.sum_horizontal(get_args(TASK_TYPE)))
df = df.select(sorted(df.columns))
if sort_by_sum:
@@ -96,7 +108,7 @@ def create_task_lang_table(tasks: list[mteb.AbsTask], sort_by_sum=False) -> str:
task_names_md = " | ".join(sorted(get_args(TASK_TYPE)))
horizontal_line_md = "---|---" * (len(sorted(get_args(TASK_TYPE))) + 1)
table = f"""
-| Language | {task_names_md} | Sum |
+| ISO Code | Language | Family | {task_names_md} | Sum |
|{horizontal_line_md}|
"""

@@ -119,14 +131,14 @@ def insert_tables(
file_path: str, tables: list[str], tags: list[str] = ["TASKS TABLE"]
) -> None:
"""Insert tables within <!-- TABLE START --> and <!-- TABLE END --> or similar tags."""
-md = Path(file_path).read_text()
+md = Path(file_path).read_text(encoding="utf-8")

for table, tag in zip(tables, tags):
start = f"<!-- {tag} START -->"
end = f"<!-- {tag} END -->"
md = md.replace(md[md.index(start) + len(start) : md.index(end)], table)

-Path(file_path).write_text(md)
+Path(file_path).write_text(md, encoding="utf-8")


def main():
16 changes: 13 additions & 3 deletions mteb/__init__.py
@@ -10,17 +10,23 @@
MTEB_RETRIEVAL_WITH_INSTRUCTIONS,
CoIR,
)
-from mteb.evaluation import *
+from mteb.encoder_interface import Encoder
+from mteb.evaluation import MTEB
from mteb.load_results import BenchmarkResults, load_results
-from mteb.models import get_model, get_model_meta, get_model_metas
+from mteb.load_results.task_results import TaskResult
+from mteb.models import (
+    SentenceTransformerWrapper,
+    get_model,
+    get_model_meta,
+    get_model_metas,
+)
from mteb.overview import TASKS_REGISTRY, get_task, get_tasks

from .benchmarks.benchmarks import Benchmark
from .benchmarks.get_benchmark import BENCHMARK_REGISTRY, get_benchmark, get_benchmarks

__version__ = version("mteb") # fetch version from install metadata


__all__ = [
"MTEB_ENG_CLASSIC",
"MTEB_MAIN_RU",
@@ -40,4 +46,8 @@
"get_benchmarks",
"BenchmarkResults",
"BENCHMARK_REGISTRY",
"MTEB",
"TaskResult",
"SentenceTransformerWrapper",
"Encoder",
]
6 changes: 3 additions & 3 deletions mteb/abstasks/AbsTask.py
@@ -72,11 +72,11 @@ def __init__(self, seed: int = 42, **kwargs: Any):
torch.manual_seed(self.seed)
torch.cuda.manual_seed_all(self.seed)

-def check_if_dataset_is_superseeded(self):
-    """Check if the dataset is superseeded by a newer version"""
+def check_if_dataset_is_superseded(self):
+    """Check if the dataset is superseded by a newer version"""
if self.superseded_by:
    logger.warning(
-        f"Dataset '{self.metadata.name}' is superseeded by '{self.superseded_by}', you might consider using the newer version of the dataset."
+        f"Dataset '{self.metadata.name}' is superseded by '{self.superseded_by}', you might consider using the newer version of the dataset."
)

def dataset_transform(self):
1 change: 1 addition & 0 deletions mteb/abstasks/TaskMetadata.py
@@ -168,6 +168,7 @@
"cc0-1.0",
"bsd-3-clause",
"gpl-3.0",
+"lgpl-3.0",
"cdla-sharing-1.0",
"mpl-2.0",
]
44 changes: 31 additions & 13 deletions mteb/abstasks/__init__.py
@@ -1,15 +1,33 @@
from __future__ import annotations

-from ..evaluation.LangMapping import *
-from .AbsTask import *
-from .AbsTaskBitextMining import *
-from .AbsTaskClassification import *
-from .AbsTaskClustering import *
-from .AbsTaskMultilabelClassification import *
-from .AbsTaskPairClassification import *
-from .AbsTaskReranking import *
-from .AbsTaskRetrieval import *
-from .AbsTaskSpeedTask import *
-from .AbsTaskSTS import *
-from .AbsTaskSummarization import *
-from .MultilingualTask import *
+from .AbsTask import AbsTask
+from .AbsTaskBitextMining import AbsTaskBitextMining
+from .AbsTaskClassification import AbsTaskClassification
+from .AbsTaskClustering import AbsTaskClustering
+from .AbsTaskClusteringFast import AbsTaskClusteringFast
+from .AbsTaskMultilabelClassification import AbsTaskMultilabelClassification
+from .AbsTaskPairClassification import AbsTaskPairClassification
+from .AbsTaskReranking import AbsTaskReranking
+from .AbsTaskRetrieval import AbsTaskRetrieval
+from .AbsTaskSpeedTask import AbsTaskSpeedTask
+from .AbsTaskSTS import AbsTaskSTS
+from .AbsTaskSummarization import AbsTaskSummarization
+from .MultilingualTask import MultilingualTask
+from .TaskMetadata import TaskMetadata
+
+__all__ = [
+    "AbsTask",
+    "AbsTaskBitextMining",
+    "AbsTaskClassification",
+    "AbsTaskClustering",
+    "AbsTaskClusteringFast",
+    "AbsTaskMultilabelClassification",
+    "AbsTaskPairClassification",
+    "AbsTaskReranking",
+    "AbsTaskRetrieval",
+    "AbsTaskSpeedTask",
+    "AbsTaskSTS",
+    "AbsTaskSummarization",
+    "MultilingualTask",
+    "TaskMetadata",
+]
57 changes: 55 additions & 2 deletions mteb/benchmarks/__init__.py
@@ -1,4 +1,57 @@
from __future__ import annotations

-from mteb.benchmarks.benchmarks import *
-from mteb.benchmarks.get_benchmark import *
+from mteb.benchmarks.benchmarks import (
+    BRIGHT,
+    LONG_EMBED,
+    MTEB_DEU,
+    MTEB_EN,
+    MTEB_ENG_CLASSIC,
+    MTEB_EU,
+    MTEB_FRA,
+    MTEB_INDIC,
+    MTEB_JPN,
+    MTEB_KOR,
+    MTEB_MAIN_RU,
+    MTEB_MINERS_BITEXT_MINING,
+    MTEB_POL,
+    MTEB_RETRIEVAL_LAW,
+    MTEB_RETRIEVAL_MEDICAL,
+    MTEB_RETRIEVAL_WITH_INSTRUCTIONS,
+    SEB,
+    Benchmark,
+    CoIR,
+    MTEB_code,
+    MTEB_multilingual,
+)
+from mteb.benchmarks.get_benchmark import (
+    BENCHMARK_REGISTRY,
+    get_benchmark,
+    get_benchmarks,
+)
+
+__all__ = [
+    "Benchmark",
+    "MTEB_EN",
+    "MTEB_ENG_CLASSIC",
+    "MTEB_MAIN_RU",
+    "MTEB_RETRIEVAL_WITH_INSTRUCTIONS",
+    "MTEB_RETRIEVAL_LAW",
+    "MTEB_RETRIEVAL_MEDICAL",
+    "MTEB_MINERS_BITEXT_MINING",
+    "SEB",
+    "CoIR",
+    "MTEB_FRA",
+    "MTEB_DEU",
+    "MTEB_KOR",
+    "MTEB_POL",
+    "MTEB_code",
+    "MTEB_multilingual",
+    "MTEB_JPN",
+    "MTEB_INDIC",
+    "MTEB_EU",
+    "LONG_EMBED",
+    "BRIGHT",
+    "BENCHMARK_REGISTRY",
+    "get_benchmarks",
+    "get_benchmark",
+]
44 changes: 44 additions & 0 deletions mteb/descriptive_stats/Classification/Ddisco.json
@@ -0,0 +1,44 @@
{
"test": {
"num_samples": 201,
"number_of_characters": 200062,
"number_texts_intersect_with_train": 1,
"min_text_length": 529,
"average_text_length": 995.3333333333334,
"max_text_length": 2050,
"unique_text": 201,
"unique_labels": 3,
"labels": {
"2": {
"count": 76
},
"3": {
"count": 115
},
"1": {
"count": 10
}
}
},
"train": {
"num_samples": 801,
"number_of_characters": 779241,
"number_texts_intersect_with_train": null,
"min_text_length": 492,
"average_text_length": 972.8352059925094,
"max_text_length": 2411,
"unique_text": 796,
"unique_labels": 3,
"labels": {
"1": {
"count": 30
},
"2": {
"count": 325
},
"3": {
"count": 446
}
}
}
}
@@ -0,0 +1,38 @@
{
"test": {
"num_samples": 1200,
"number_of_characters": 141679,
"number_texts_intersect_with_train": 0,
"min_text_length": 25,
"average_text_length": 118.06583333333333,
"max_text_length": 566,
"unique_text": 1200,
"unique_labels": 2,
"labels": {
"1": {
"count": 600
},
"0": {
"count": 600
}
}
},
"train": {
"num_samples": 330,
"number_of_characters": 37706,
"number_texts_intersect_with_train": null,
"min_text_length": 19,
"average_text_length": 114.26060606060607,
"max_text_length": 315,
"unique_text": 330,
"unique_labels": 2,
"labels": {
"1": {
"count": 165
},
"0": {
"count": 165
}
}
}
}
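Fields like `unique_text` and `number_texts_intersect_with_train` in these descriptive-stats files come from the leak-counting added in #1438. A rough sketch of how such numbers can be derived (the helper name and exact field set are assumptions, not mteb's internals):

```python
# Hedged sketch of computing stats like those in the JSON files above.
# Field names mirror the files; the helper itself is illustrative.

def split_text_stats(split_texts: list[str], train_texts: list[str]) -> dict:
    split_set, train_set = set(split_texts), set(train_texts)
    return {
        "num_samples": len(split_texts),
        "unique_text": len(split_set),
        "number_texts_intersect_with_train": len(split_set & train_set),
    }

stats = split_text_stats(["a", "b", "b"], ["b", "c"])
print(stats["unique_text"], stats["number_texts_intersect_with_train"])  # -> 2 1
```

A non-zero intersection count flags potential train/test leakage for the task, which is why the train split's own intersection field is left as `null`.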