leaderboard 2.0: Add missing models #1515

Open
KennethEnevoldsen opened this issue Nov 27, 2024 · 15 comments
Labels
leaderboard issues related to the leaderboard

Comments

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Nov 27, 2024

Planned, but quite a few models are still missing; it would be great to have them all! (e.g. only 42 models for MTEB classic vs. 337 in the current leaderboard)

@x-tabdeveloping and I are currently working on resolving these by adding the metadata for the missing models.

related to #1317

@isaac-chung
Collaborator

What's the current diff / where to find that? Would love to help out.

@x-tabdeveloping
Collaborator

Here's a list of models for which we have results of some sort, but which do not occur in the metadata:

{'Alibaba-NLP/gte-Qwen1.5-7B-instruct',
 'Alibaba-NLP/gte-Qwen2-1.5B-instruct',
 'Alibaba-NLP/gte-base-en-v1.5',
 'Alibaba-NLP/gte-large-en-v1.5',
 'Alibaba-NLP/gte-multilingual-base',
 'BAAI/bge-en-icl',
 'BAAI/bge-m3',
 'BAAI/bge-multilingual-gemma2',
 'BeastyZ/e5-R-mistral-7b',
 'Cohere/Cohere-embed-english-light-v3.0',
 'Cohere/Cohere-embed-english-v3.0',
 'Cohere/Cohere-embed-multilingual-light-v3.0',
 'Cohere/Cohere-embed-multilingual-v3.0',
 'Gameselo/STS-multilingual-mpnet-base-v2',
 'HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1',
 'HIT-TMG/KaLM-embedding-multilingual-mini-v1',
 'Haon-Chen/speed-embedding-7b-instruct',
 'Hum-Works/lodestone-base-4096-v1',
 'Jaume/gemma-2b-embeddings',
 'Labib11/MUG-B-1.6',
 'Lajavaness/bilingual-embedding-base',
 'Lajavaness/bilingual-embedding-large',
 'Lajavaness/bilingual-embedding-small',
 'Linq-AI-Research/Linq-Embed-Mistral',
 'Mihaiii/Bulbasaur',
 'Mihaiii/Ivysaur',
 'Mihaiii/Squirtle',
 'Mihaiii/Venusaur',
 'Mihaiii/Wartortle',
 'Mihaiii/gte-micro',
 'Mihaiii/gte-micro-v4',
 'Muennighoff/SGPT-1.3B-weightedmean-msmarco-specb-bitfit',
 'Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit',
 'Muennighoff/SGPT-125M-weightedmean-nli-bitfit',
 'Muennighoff/SGPT-2.7B-weightedmean-msmarco-specb-bitfit',
 'Muennighoff/SGPT-5.8B-weightedmean-msmarco-specb-bitfit',
 'Muennighoff/SGPT-5.8B-weightedmean-nli-bitfit',
 'Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka',
 'Omartificial-Intelligence-Space/Arabic-MiniLM-L12-v2-all-nli-triplet',
 'Omartificial-Intelligence-Space/Arabic-all-nli-triplet-Matryoshka',
 'Omartificial-Intelligence-Space/Arabic-labse-Matryoshka',
 'Omartificial-Intelligence-Space/Arabic-mpnet-base-all-nli-triplet',
 'Omartificial-Intelligence-Space/Marbert-all-nli-triplet-Matryoshka',
 'OrdalieTech/Solon-embeddings-large-0.1',
 'OrlikB/KartonBERT-USE-base-v1',
 'OrlikB/st-polish-kartonberta-base-alpha-v1',
 'Salesforce/SFR-Embedding-Mistral',
 'Snowflake/snowflake-arctic-embed-l',
 'Snowflake/snowflake-arctic-embed-m',
 'Snowflake/snowflake-arctic-embed-m-long',
 'Snowflake/snowflake-arctic-embed-m-v1.5',
 'Snowflake/snowflake-arctic-embed-s',
 'Snowflake/snowflake-arctic-embed-xs',
 'aari1995/German_Semantic_STS_V2',
 'abhinand/MedEmbed-small-v0.1',
 'amazon/Titan-text-embeddings-v2',
 'andersonbcdefg/bge-small-4096',
 'avsolatorio/GIST-Embedding-v0',
 'avsolatorio/GIST-all-MiniLM-L6-v2',
 'avsolatorio/GIST-large-Embedding-v0',
 'avsolatorio/GIST-small-Embedding-v0',
 'avsolatorio/NoInstruct-small-Embedding-v0',
 'bigscience/sgpt-bloom-7b1-msmarco',
 'biswa921/bge-m3',
 'brahmairesearch/slx-v0.1',
 'consciousAI/cai-lunaris-text-embeddings',
 'consciousAI/cai-stellaris-text-embeddings',
 'deepfile/embedder-100p',
 'deepvk/USER-bge-m3',
 'djovak/multi-qa-MiniLM-L6-cos-v1',
 'dumyy/sft-bge-small',
 'dwzhu/e5-base-4k',
 'facebook/SONAR',
 'infgrad/stella-base-en-v2',
 'intfloat/e5-base',
 'intfloat/e5-large',
 'izhx/udever-bloom-1b1',
 'izhx/udever-bloom-3b',
 'izhx/udever-bloom-560m',
 'izhx/udever-bloom-7b1',
 'jamesgpt1/sf_model_e5',
 'jinaai/jina-embedding-b-en-v1',
 'jinaai/jina-embedding-l-en-v1',
 'jinaai/jina-embedding-s-en-v1',
 'jinaai/jina-embeddings-v2-base-de',
 'jinaai/jina-embeddings-v2-base-en',
 'jinaai/jina-embeddings-v2-base-es',
 'jinaai/jina-embeddings-v2-small-en',
 'jxm/cde-small-v1',
 'malenia1/ternary-weight-embedding',
 'manu/bge-m3-custom-fr',
 'manu/sentence_croissant_alpha_v0.2',
 'manu/sentence_croissant_alpha_v0.3',
 'manu/sentence_croissant_alpha_v0.4',
 'minishlab/M2V_base_glove',
 'minishlab/M2V_base_glove_subword',
 'minishlab/M2V_base_output',
 'minishlab/potion-base-2M',
 'minishlab/potion-base-4M',
 'minishlab/potion-base-8M',
 'mixedbread-ai/mxbai-embed-2d-large-v1',
 'mixedbread-ai/mxbai-embed-xsmall-v1',
 'neuralmagic/bge-base-en-v1.5-quant',
 'neuralmagic/bge-base-en-v1.5-sparse',
 'neuralmagic/bge-large-en-v1.5-quant',
 'neuralmagic/bge-large-en-v1.5-sparse',
 'neuralmagic/bge-small-en-v1.5-quant',
 'neuralmagic/bge-small-en-v1.5-sparse',
 'nomic-ai/nomic-embed-text-v1-ablated',
 'nomic-ai/nomic-embed-text-v1-unsupervised',
 'nvidia/NV-Embed-v1',
 'nvidia/NV-Embed-v2',
 'nvidia/NV-Retriever-v1',
 'omarelshehy/arabic-english-sts-matryoshka',
 'openbmb/MiniCPM-Embedding',
 'qinxianliu/FAB-Ramy-v1',
 'qinxianliu/FAE-v1',
 'qinxianliu/FUE-v1',
 'sdadas/mmlw-e5-base',
 'sdadas/mmlw-e5-large',
 'sdadas/mmlw-e5-small',
 'sdadas/mmlw-roberta-base',
 'sdadas/mmlw-roberta-large',
 'sentence-transformers/all-MiniLM-L12-v2',
 'sentence-transformers/all-mpnet-base-v2',
 'shibing624/text2vec-base-multilingual',
 'silma-ai/silma-embeddding-matryoshka-v0.1',
 'tanmaylaud/ret-phi2-v0',
 'thenlper/gte-base',
 'thenlper/gte-large',
 'thenlper/gte-small',
 'thtang/ALL_862873',
 'tsirif/BinGSE-Meta-Llama-3-8B-Instruct',
 'vprelovac/universal-sentence-encoder-4',
 'vprelovac/universal-sentence-encoder-large-5',
 'vprelovac/universal-sentence-encoder-multilingual-3',
 'vprelovac/universal-sentence-encoder-multilingual-large-3',
 'zeroshot/gte-large-quant',
 'zeroshot/gte-large-sparse',
 'zeroshot/gte-small-quant',
 'zeta-alpha-ai/Zeta-Alpha-E5-Mistral'}

I suggest that we seriously consider whether we want to include quantized versions of the same models in the new leaderboard.
I would also suggest removing copies or fine-tunes of more popular models where there is no indication of how they differ from the original.
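The quantized variants in the list above can be spotted mechanically. A minimal sketch; the `-quant`/`-sparse` suffixes are a heuristic inferred from the names in this thread (e.g. the neuralmagic and zeroshot repos), not an official naming convention:

```python
# Heuristic: flag likely quantized/sparse variants by name suffix.
# These suffixes are an assumption based on the model ids in this thread,
# not a guaranteed convention across the Hub.
QUANT_SUFFIXES = ("-quant", "-sparse")

def is_quantized_variant(model_id: str) -> bool:
    """Return True if the model id looks like a quantized/sparse variant."""
    name = model_id.rsplit("/", 1)[-1].lower()
    return name.endswith(QUANT_SUFFIXES)

models = [
    "neuralmagic/bge-base-en-v1.5-quant",
    "zeroshot/gte-large-sparse",
    "BAAI/bge-m3",
]
print([m for m in models if is_quantized_variant(m)])
```

Anything this flags could then be routed into a separate "quantizations" bucket rather than the main ranking.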

@KennethEnevoldsen
Contributor Author

KennethEnevoldsen commented Nov 29, 2024

This is my classification of the models:

sorting = {
    "clear keep": [
        # probably all of these should be registered as well
        "Alibaba-NLP/gte-Qwen1.5-7B-instruct", #PR exists
        "Alibaba-NLP/gte-Qwen2-1.5B-instruct", #PR exists
        "Alibaba-NLP/gte-base-en-v1.5", #PR exists
        "Alibaba-NLP/gte-large-en-v1.5", #PR exists
        "Alibaba-NLP/gte-multilingual-base", #PR exists
        "BAAI/bge-en-icl", #PR exists
        "BAAI/bge-m3", #PR exists
        "BAAI/bge-multilingual-gemma2", #PR exists
        "Linq-AI-Research/Linq-Embed-Mistral", #PR exists
        "Muennighoff/SGPT-1.3B-weightedmean-msmarco-specb-bitfit",
        "Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit",
        "Muennighoff/SGPT-125M-weightedmean-nli-bitfit",
        "Muennighoff/SGPT-2.7B-weightedmean-msmarco-specb-bitfit",
        "Muennighoff/SGPT-5.8B-weightedmean-msmarco-specb-bitfit",
        "Muennighoff/SGPT-5.8B-weightedmean-nli-bitfit",
        "jinaai/jina-embedding-b-en-v1",
        "jinaai/jina-embedding-l-en-v1",
        "jinaai/jina-embedding-s-en-v1",
        "jinaai/jina-embeddings-v2-base-de",
        "jinaai/jina-embeddings-v2-base-en",
        "jinaai/jina-embeddings-v2-base-es",
        "jinaai/jina-embeddings-v2-small-en",
        "jxm/cde-small-v1",
        "intfloat/e5-base",
        "intfloat/e5-large",
        "facebook/SONAR",
        "amazon/Titan-text-embeddings-v2",
        "nvidia/NV-Embed-v1",  # some models are versions of eachother (we could include a "superseeded by" column to allow users to filter earlier versions)
        "nvidia/NV-Embed-v2",
        "nvidia/NV-Retriever-v1",
        "mixedbread-ai/mxbai-embed-2d-large-v1",
        "mixedbread-ai/mxbai-embed-xsmall-v1",
        "nomic-ai/nomic-embed-text-v1-ablated",
        "nomic-ai/nomic-embed-text-v1-unsupervised",
        "minishlab/M2V_base_glove", #PR exists
        "minishlab/M2V_base_glove_subword", #PR exists
        "minishlab/M2V_base_output", #PR exists
        "minishlab/potion-base-2M", #PR exists
        "minishlab/potion-base-4M", #PR exists
        "minishlab/potion-base-8M", #PR exists
        "Salesforce/SFR-Embedding-Mistral",
        "Snowflake/snowflake-arctic-embed-l",
        "Snowflake/snowflake-arctic-embed-m",
        "Snowflake/snowflake-arctic-embed-m-long",
        "Snowflake/snowflake-arctic-embed-m-v1.5",
        "Snowflake/snowflake-arctic-embed-s",
        "Snowflake/snowflake-arctic-embed-xs",
        "sentence-transformers/all-MiniLM-L12-v2",
        "sentence-transformers/all-mpnet-base-v2",
    ],
    "keep": [
        "Haon-Chen/speed-embedding-7b-instruct",
        "Gameselo/STS-multilingual-mpnet-base-v2",
        "HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1",
        "HIT-TMG/KaLM-embedding-multilingual-mini-v1",
        "Hum-Works/lodestone-base-4096-v1",
        "Jaume/gemma-2b-embeddings",
        "BeastyZ/e5-R-mistral-7b",
        "Lajavaness/bilingual-embedding-base",
        "Lajavaness/bilingual-embedding-large",
        "Lajavaness/bilingual-embedding-small",
        "Mihaiii/Bulbasaur",
        "Mihaiii/Ivysaur",
        "Mihaiii/Squirtle",
        "Mihaiii/Venusaur",
        "Mihaiii/Wartortle",
        "Mihaiii/gte-micro",
        "Mihaiii/gte-micro-v4",
        "OrdalieTech/Solon-embeddings-large-0.1",
        "Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka",
        "Omartificial-Intelligence-Space/Arabic-MiniLM-L12-v2-all-nli-triplet",
        "Omartificial-Intelligence-Space/Arabic-all-nli-triplet-Matryoshka",
        "Omartificial-Intelligence-Space/Arabic-labse-Matryoshka",
        "Omartificial-Intelligence-Space/Arabic-mpnet-base-all-nli-triplet",
        "Omartificial-Intelligence-Space/Marbert-all-nli-triplet-Matryoshka",
        "consciousAI/cai-lunaris-text-embeddings",
        "consciousAI/cai-stellaris-text-embeddings",
        "manu/bge-m3-custom-fr",
        "manu/sentence_croissant_alpha_v0.2",
        "manu/sentence_croissant_alpha_v0.3",
        "manu/sentence_croissant_alpha_v0.4",
        "thenlper/gte-base",
        "thenlper/gte-large",
        "thenlper/gte-small",
        "OrlikB/KartonBERT-USE-base-v1",
        "OrlikB/st-polish-kartonberta-base-alpha-v1",
        "sdadas/mmlw-e5-base",  # some models are monolingual adaptions of a another models (I would include them for now)
        "dwzhu/e5-base-4k",  # e.g. this is a long doc adaption of e5
        "sdadas/mmlw-e5-large",
        "sdadas/mmlw-e5-small",
        "sdadas/mmlw-roberta-base",
        "sdadas/mmlw-roberta-large",
        "izhx/udever-bloom-1b1",
        "izhx/udever-bloom-3b",
        "izhx/udever-bloom-560m",
        "izhx/udever-bloom-7b1",
        "avsolatorio/GIST-Embedding-v0",
        "avsolatorio/GIST-all-MiniLM-L6-v2",
        "avsolatorio/GIST-large-Embedding-v0",
        "avsolatorio/GIST-small-Embedding-v0",
        "bigscience/sgpt-bloom-7b1-msmarco",
        "aari1995/German_Semantic_STS_V2",
        "abhinand/MedEmbed-small-v0.1",
        "avsolatorio/NoInstruct-small-Embedding-v0",
        "brahmairesearch/slx-v0.1",
        "deepfile/embedder-100p",
        "deepvk/USER-bge-m3",
        "infgrad/stella-base-en-v2",
        "malenia1/ternary-weight-embedding",
        "omarelshehy/arabic-english-sts-matryoshka",
        "openbmb/MiniCPM-Embedding",
        "shibing624/text2vec-base-multilingual",
        "silma-ai/silma-embeddding-matryoshka-v0.1",
        "zeta-alpha-ai/Zeta-Alpha-E5-Mistral",
    ],
    "quantizations": [
        # I think we need a good way to include quantizations (potentially make them toggle-able in the UI, hidden by default)
        "zeroshot/gte-large-quant",
        "zeroshot/gte-large-sparse",
        "zeroshot/gte-small-quant",
        "neuralmagic/bge-base-en-v1.5-quant",
        "neuralmagic/bge-base-en-v1.5-sparse",
        "neuralmagic/bge-large-en-v1.5-quant",
        "neuralmagic/bge-large-en-v1.5-sparse",
        "neuralmagic/bge-small-en-v1.5-quant",
        "neuralmagic/bge-small-en-v1.5-sparse",
    ],
    "probably remove": [
        # seems to have been a part of MTEB tests (I don't think we use these anymore)
        "vprelovac/universal-sentence-encoder-4",
        "vprelovac/universal-sentence-encoder-large-5",
        "vprelovac/universal-sentence-encoder-multilingual-3",
        "vprelovac/universal-sentence-encoder-multilingual-large-3",
        # duplicate
        "biswa921/bge-m3",
        # not enough info
        "Labib11/MUG-B-1.6",
        "thtang/ALL_862873",
        "qinxianliu/FAB-Ramy-v1",
        "qinxianliu/FAE-v1",
        "qinxianliu/FUE-v1",
        "dumyy/sft-bge-small",
        "jamesgpt1/sf_model_e5",
        "tsirif/BinGSE-Meta-Llama-3-8B-Instruct",
        "tanmaylaud/ret-phi2-v0",
        "andersonbcdefg/bge-small-4096",
        # this is actually for sentence-transformers/multi-qa-MiniLM-L6-cos-v1, we might just rename?
        # but it is probably better to implement it and rerun it
        "djovak/multi-qa-MiniLM-L6-cos-v1",
    ],
}
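A classification like the one above is easy to get subtly wrong (the same model landing in two buckets), and that can be sanity-checked mechanically. A sketch on a small stand-in dict, since the full one is long:

```python
from itertools import combinations

# Small stand-in for the `sorting` dict above (illustrative subset).
sorting = {
    "clear keep": ["BAAI/bge-m3", "facebook/SONAR"],
    "keep": ["thenlper/gte-base"],
    "probably remove": ["biswa921/bge-m3"],
}

# Sanity check: no model should be classified into two categories.
for (cat_a, models_a), (cat_b, models_b) in combinations(sorting.items(), 2):
    overlap = set(models_a) & set(models_b)
    assert not overlap, f"{cat_a!r} and {cat_b!r} share: {overlap}"

total = sum(len(v) for v in sorting.values())
print(f"{total} models across {len(sorting)} categories")
```

The same loop run on the full dict would also give a quick count of how many models remain to be registered per bucket.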

I have been fairly conservative on removing models, but I definitely think that we should add:

  1. adapted_from to the metadata (to allow users to filter out quantizations, fine-tunes, long-doc extensions, etc.)
  2. supersedes: e.g. nvidia/NV-Embed-v2 supersedes v1, which allows us to filter out earlier versions
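A minimal sketch of how such fields could drive a default leaderboard view. This uses a plain dataclass stand-in, not mteb's actual ModelMeta; the field names adapted_from and superseded_by are assumptions taken from the suggestion above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelMetaSketch:
    """Illustrative stand-in for a model metadata entry."""
    name: str
    adapted_from: Optional[str] = None   # e.g. a quantization or fine-tune source
    superseded_by: Optional[str] = None  # newer version that replaces this model

entries = [
    ModelMetaSketch("nvidia/NV-Embed-v1", superseded_by="nvidia/NV-Embed-v2"),
    ModelMetaSketch("nvidia/NV-Embed-v2"),
    ModelMetaSketch("neuralmagic/bge-base-en-v1.5-quant",
                    adapted_from="BAAI/bge-base-en-v1.5"),
]

# Default view: hide superseded models and derived variants.
visible = [e.name for e in entries
           if e.superseded_by is None and e.adapted_from is None]
print(visible)  # only nvidia/NV-Embed-v2 remains
```

Users who want the full picture could then toggle the hidden entries back on.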

@x-tabdeveloping
Collaborator

@KennethEnevoldsen I couldn't agree more. Sorry for not sending this earlier, but I basically came up with the same list.

@isaac-chung
Collaborator

Do you need help? If so, which ones could I take and where do I commit the metadata to?

@isaac-chung isaac-chung added the leaderboard issues related to the leaderboard label Nov 30, 2024
@KennethEnevoldsen
Contributor Author

@isaac-chung can start with the models that have an easy sentence-transformers implementation.

E.g.
"sentence-transformer/multi-qa-MiniLM-L6-cos-v1"
"sentence-transformers/all-mpnet-base-v2"

My understanding is that @x-tabdeveloping will not add the implementation (i.e. loader), but just the metadata.

@isaac-chung
Collaborator

Gotcha. Yes, I can start with those.

@x-tabdeveloping
Collaborator

I'd really appreciate some help on this! As far as I know there are already some PRs open with some model metas.
Anything that isn't already in the pipeline is free real estate!

@isaac-chung
Collaborator

Happy to help! Wanted to clarify these:

  1. Where are these PRs, and where should new ones go? In the results repo?
  2. Where would you like the metadata to be added?
  3. To check whether they are sentence-transformers compatible, I guess I can refer to this file?

@x-tabdeveloping
Collaborator

  1. Add new models nvidia, gte, linq #1436 Add model jxm/cde-small-v1 #1521 fix: Proprietary models now get correctly shown in leaderboard #1530
  2. In mteb/models/<type_of_model>.py
  3. You can check the HF repo; models compatible with sentence-transformers will usually have a tag that says so, and also a 1_Pooling folder
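That heuristic can be written down directly. A sketch: in practice the tags and file list could be fetched with huggingface_hub (e.g. `HfApi().model_info(repo_id)` exposes tags and repo files), but the check itself is kept as a pure function here so it runs offline; the example inputs are illustrative, not fetched:

```python
# A repo is likely sentence-transformers compatible if it carries the
# "sentence-transformers" tag or ships a 1_Pooling/ folder.
def looks_st_compatible(tags: list[str], filenames: list[str]) -> bool:
    has_tag = "sentence-transformers" in tags
    has_pooling = any(f.startswith("1_Pooling/") for f in filenames)
    return has_tag or has_pooling

# Illustrative metadata for two hypothetical repos:
print(looks_st_compatible(
    tags=["sentence-transformers", "bert"],
    filenames=["config.json", "1_Pooling/config.json"],
))  # True
print(looks_st_compatible(tags=["gpt2"], filenames=["config.json"]))  # False
```

Note this is only a heuristic: a repo can be compatible without either signal, so borderline cases still need a manual look.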

@isaac-chung
Collaborator

Thanks, @x-tabdeveloping ! This PR is in the right location then 👍 I'll keep going for the s-t compatible ones.

@KennethEnevoldsen
Contributor Author

KennethEnevoldsen commented Dec 3, 2024

Updated the list to remove everything that has a PR:

There are still a few we should probably add, and quite a few that would be annoying to add manually (e.g. shibing624/text2vec-base-multilingual). I'm unsure whether we want to add these automatically (@x-tabdeveloping, you mentioned that you had a script for this?)

I plan to completely remove the ones in "probably remove" from the results repo. If someone thinks this is a bad idea, let me know.

sorting = {
    "clear keep": [
        "BAAI/bge-en-icl",
        "BAAI/bge-m3",
        "BAAI/bge-multilingual-gemma2",
        "Linq-AI-Research/Linq-Embed-Mistral",
        "Muennighoff/SGPT-1.3B-weightedmean-msmarco-specb-bitfit",
        "Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit",
        "Muennighoff/SGPT-125M-weightedmean-nli-bitfit",
        "Muennighoff/SGPT-2.7B-weightedmean-msmarco-specb-bitfit",
        "Muennighoff/SGPT-5.8B-weightedmean-msmarco-specb-bitfit",
        "Muennighoff/SGPT-5.8B-weightedmean-nli-bitfit",
        "facebook/SONAR",
        "amazon/Titan-text-embeddings-v2",
        "Salesforce/SFR-Embedding-Mistral",
    ],
    "keep": [
        "Haon-Chen/speed-embedding-7b-instruct",
        "Gameselo/STS-multilingual-mpnet-base-v2",
        "HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1",
        "HIT-TMG/KaLM-embedding-multilingual-mini-v1",
        "Hum-Works/lodestone-base-4096-v1",
        "Jaume/gemma-2b-embeddings",
        "BeastyZ/e5-R-mistral-7b",
        "Lajavaness/bilingual-embedding-base",
        "Lajavaness/bilingual-embedding-large",
        "Lajavaness/bilingual-embedding-small",
        "Mihaiii/Bulbasaur",
        "Mihaiii/Ivysaur",
        "Mihaiii/Squirtle",
        "Mihaiii/Venusaur",
        "Mihaiii/Wartortle",
        "Mihaiii/gte-micro",
        "Mihaiii/gte-micro-v4",
        "OrdalieTech/Solon-embeddings-large-0.1",
        "Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka",
        "Omartificial-Intelligence-Space/Arabic-MiniLM-L12-v2-all-nli-triplet",
        "Omartificial-Intelligence-Space/Arabic-all-nli-triplet-Matryoshka",
        "Omartificial-Intelligence-Space/Arabic-labse-Matryoshka",
        "Omartificial-Intelligence-Space/Arabic-mpnet-base-all-nli-triplet",
        "Omartificial-Intelligence-Space/Marbert-all-nli-triplet-Matryoshka",
        "consciousAI/cai-lunaris-text-embeddings",
        "consciousAI/cai-stellaris-text-embeddings",
        "manu/bge-m3-custom-fr",
        "manu/sentence_croissant_alpha_v0.2",
        "manu/sentence_croissant_alpha_v0.3",
        "manu/sentence_croissant_alpha_v0.4",
        "thenlper/gte-base",
        "thenlper/gte-large",
        "thenlper/gte-small",
        "OrlikB/KartonBERT-USE-base-v1",
        "OrlikB/st-polish-kartonberta-base-alpha-v1",
        "sdadas/mmlw-e5-base",  # some models are monolingual adaptions of a another models (I would include them for now)
        "dwzhu/e5-base-4k",  # e.g. this is a long doc adaption of e5
        "sdadas/mmlw-e5-large",
        "sdadas/mmlw-e5-small",
        "sdadas/mmlw-roberta-base",
        "sdadas/mmlw-roberta-large",
        "izhx/udever-bloom-1b1",
        "izhx/udever-bloom-3b",
        "izhx/udever-bloom-560m",
        "izhx/udever-bloom-7b1",
        "avsolatorio/GIST-Embedding-v0",
        "avsolatorio/GIST-all-MiniLM-L6-v2",
        "avsolatorio/GIST-large-Embedding-v0",
        "avsolatorio/GIST-small-Embedding-v0",
        "bigscience/sgpt-bloom-7b1-msmarco",
        "aari1995/German_Semantic_STS_V2",
        "abhinand/MedEmbed-small-v0.1",
        "avsolatorio/NoInstruct-small-Embedding-v0",
        "brahmairesearch/slx-v0.1",
        "deepfile/embedder-100p",
        "deepvk/USER-bge-m3",
        "infgrad/stella-base-en-v2",
        "malenia1/ternary-weight-embedding",
        "omarelshehy/arabic-english-sts-matryoshka",
        "openbmb/MiniCPM-Embedding",
        "shibing624/text2vec-base-multilingual",
        "silma-ai/silma-embeddding-matryoshka-v0.1",
        "zeta-alpha-ai/Zeta-Alpha-E5-Mistral",
    ],
    "quantizations": [
        # I think we need a good way to include quantizations (potentially make them toggle-able in the UI, hidden by default)
        "zeroshot/gte-large-quant",
        "zeroshot/gte-large-sparse",
        "zeroshot/gte-small-quant",
        "neuralmagic/bge-base-en-v1.5-quant",
        "neuralmagic/bge-base-en-v1.5-sparse",
        "neuralmagic/bge-large-en-v1.5-quant",
        "neuralmagic/bge-large-en-v1.5-sparse",
        "neuralmagic/bge-small-en-v1.5-quant",
        "neuralmagic/bge-small-en-v1.5-sparse",
    ],
    "probably remove": [
        # seems to have been a part of MTEB tests (I don't think we use these anymore)
        "vprelovac/universal-sentence-encoder-4",
        "vprelovac/universal-sentence-encoder-large-5",
        "vprelovac/universal-sentence-encoder-multilingual-3",
        "vprelovac/universal-sentence-encoder-multilingual-large-3",
        # duplicate
        "biswa921/bge-m3",
        # not enough info
        "Labib11/MUG-B-1.6",
        "thtang/ALL_862873",
        "qinxianliu/FAB-Ramy-v1",
        "qinxianliu/FAE-v1",
        "qinxianliu/FUE-v1",
        "dumyy/sft-bge-small",
        "jamesgpt1/sf_model_e5",
        "tsirif/BinGSE-Meta-Llama-3-8B-Instruct",
        "tanmaylaud/ret-phi2-v0",
        "andersonbcdefg/bge-small-4096",
    ],
}

x-tabdeveloping pushed a commit that referenced this issue Dec 4, 2024
@x-tabdeveloping
Collaborator

I have a script, but it's not particularly good. I can put a bit more work into it if it's something we want.

@x-tabdeveloping
Collaborator

I'm on it

@x-tabdeveloping
Collaborator

What are we still missing? Is it just this PR? #1436

isaac-chung pushed a commit that referenced this issue Dec 9, 2024