leaderboard 2.0: Add missing models #1515

Open
KennethEnevoldsen opened this issue Nov 27, 2024 · 15 comments
Labels
leaderboard issues related to the leaderboard

Comments

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Nov 27, 2024

Planned, but quite a few models are still missing; it would be great to have them all! (e.g. only 42 models for MTEB classic vs. 337 in the current leaderboard)

@x-tabdeveloping and I are currently working on resolving these by adding the metadata for the missing models.

related to #1317

@isaac-chung
Collaborator

What's the current diff / where to find that? Would love to help out.

@x-tabdeveloping
Collaborator

Here's a list of models for which we have results of some sort, but which do not occur in the metadata:

{'Alibaba-NLP/gte-Qwen1.5-7B-instruct',
 'Alibaba-NLP/gte-Qwen2-1.5B-instruct',
 'Alibaba-NLP/gte-base-en-v1.5',
 'Alibaba-NLP/gte-large-en-v1.5',
 'Alibaba-NLP/gte-multilingual-base',
 'BAAI/bge-en-icl',
 'BAAI/bge-m3',
 'BAAI/bge-multilingual-gemma2',
 'BeastyZ/e5-R-mistral-7b',
 'Cohere/Cohere-embed-english-light-v3.0',
 'Cohere/Cohere-embed-english-v3.0',
 'Cohere/Cohere-embed-multilingual-light-v3.0',
 'Cohere/Cohere-embed-multilingual-v3.0',
 'Gameselo/STS-multilingual-mpnet-base-v2',
 'HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1',
 'HIT-TMG/KaLM-embedding-multilingual-mini-v1',
 'Haon-Chen/speed-embedding-7b-instruct',
 'Hum-Works/lodestone-base-4096-v1',
 'Jaume/gemma-2b-embeddings',
 'Labib11/MUG-B-1.6',
 'Lajavaness/bilingual-embedding-base',
 'Lajavaness/bilingual-embedding-large',
 'Lajavaness/bilingual-embedding-small',
 'Linq-AI-Research/Linq-Embed-Mistral',
 'Mihaiii/Bulbasaur',
 'Mihaiii/Ivysaur',
 'Mihaiii/Squirtle',
 'Mihaiii/Venusaur',
 'Mihaiii/Wartortle',
 'Mihaiii/gte-micro',
 'Mihaiii/gte-micro-v4',
 'Muennighoff/SGPT-1.3B-weightedmean-msmarco-specb-bitfit',
 'Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit',
 'Muennighoff/SGPT-125M-weightedmean-nli-bitfit',
 'Muennighoff/SGPT-2.7B-weightedmean-msmarco-specb-bitfit',
 'Muennighoff/SGPT-5.8B-weightedmean-msmarco-specb-bitfit',
 'Muennighoff/SGPT-5.8B-weightedmean-nli-bitfit',
 'Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka',
 'Omartificial-Intelligence-Space/Arabic-MiniLM-L12-v2-all-nli-triplet',
 'Omartificial-Intelligence-Space/Arabic-all-nli-triplet-Matryoshka',
 'Omartificial-Intelligence-Space/Arabic-labse-Matryoshka',
 'Omartificial-Intelligence-Space/Arabic-mpnet-base-all-nli-triplet',
 'Omartificial-Intelligence-Space/Marbert-all-nli-triplet-Matryoshka',
 'OrdalieTech/Solon-embeddings-large-0.1',
 'OrlikB/KartonBERT-USE-base-v1',
 'OrlikB/st-polish-kartonberta-base-alpha-v1',
 'Salesforce/SFR-Embedding-Mistral',
 'Snowflake/snowflake-arctic-embed-l',
 'Snowflake/snowflake-arctic-embed-m',
 'Snowflake/snowflake-arctic-embed-m-long',
 'Snowflake/snowflake-arctic-embed-m-v1.5',
 'Snowflake/snowflake-arctic-embed-s',
 'Snowflake/snowflake-arctic-embed-xs',
 'aari1995/German_Semantic_STS_V2',
 'abhinand/MedEmbed-small-v0.1',
 'amazon/Titan-text-embeddings-v2',
 'andersonbcdefg/bge-small-4096',
 'avsolatorio/GIST-Embedding-v0',
 'avsolatorio/GIST-all-MiniLM-L6-v2',
 'avsolatorio/GIST-large-Embedding-v0',
 'avsolatorio/GIST-small-Embedding-v0',
 'avsolatorio/NoInstruct-small-Embedding-v0',
 'bigscience/sgpt-bloom-7b1-msmarco',
 'biswa921/bge-m3',
 'brahmairesearch/slx-v0.1',
 'consciousAI/cai-lunaris-text-embeddings',
 'consciousAI/cai-stellaris-text-embeddings',
 'deepfile/embedder-100p',
 'deepvk/USER-bge-m3',
 'djovak/multi-qa-MiniLM-L6-cos-v1',
 'dumyy/sft-bge-small',
 'dwzhu/e5-base-4k',
 'facebook/SONAR',
 'infgrad/stella-base-en-v2',
 'intfloat/e5-base',
 'intfloat/e5-large',
 'izhx/udever-bloom-1b1',
 'izhx/udever-bloom-3b',
 'izhx/udever-bloom-560m',
 'izhx/udever-bloom-7b1',
 'jamesgpt1/sf_model_e5',
 'jinaai/jina-embedding-b-en-v1',
 'jinaai/jina-embedding-l-en-v1',
 'jinaai/jina-embedding-s-en-v1',
 'jinaai/jina-embeddings-v2-base-de',
 'jinaai/jina-embeddings-v2-base-en',
 'jinaai/jina-embeddings-v2-base-es',
 'jinaai/jina-embeddings-v2-small-en',
 'jxm/cde-small-v1',
 'malenia1/ternary-weight-embedding',
 'manu/bge-m3-custom-fr',
 'manu/sentence_croissant_alpha_v0.2',
 'manu/sentence_croissant_alpha_v0.3',
 'manu/sentence_croissant_alpha_v0.4',
 'minishlab/M2V_base_glove',
 'minishlab/M2V_base_glove_subword',
 'minishlab/M2V_base_output',
 'minishlab/potion-base-2M',
 'minishlab/potion-base-4M',
 'minishlab/potion-base-8M',
 'mixedbread-ai/mxbai-embed-2d-large-v1',
 'mixedbread-ai/mxbai-embed-xsmall-v1',
 'neuralmagic/bge-base-en-v1.5-quant',
 'neuralmagic/bge-base-en-v1.5-sparse',
 'neuralmagic/bge-large-en-v1.5-quant',
 'neuralmagic/bge-large-en-v1.5-sparse',
 'neuralmagic/bge-small-en-v1.5-quant',
 'neuralmagic/bge-small-en-v1.5-sparse',
 'nomic-ai/nomic-embed-text-v1-ablated',
 'nomic-ai/nomic-embed-text-v1-unsupervised',
 'nvidia/NV-Embed-v1',
 'nvidia/NV-Embed-v2',
 'nvidia/NV-Retriever-v1',
 'omarelshehy/arabic-english-sts-matryoshka',
 'openbmb/MiniCPM-Embedding',
 'qinxianliu/FAB-Ramy-v1',
 'qinxianliu/FAE-v1',
 'qinxianliu/FUE-v1',
 'sdadas/mmlw-e5-base',
 'sdadas/mmlw-e5-large',
 'sdadas/mmlw-e5-small',
 'sdadas/mmlw-roberta-base',
 'sdadas/mmlw-roberta-large',
 'sentence-transformers/all-MiniLM-L12-v2',
 'sentence-transformers/all-mpnet-base-v2',
 'shibing624/text2vec-base-multilingual',
 'silma-ai/silma-embeddding-matryoshka-v0.1',
 'tanmaylaud/ret-phi2-v0',
 'thenlper/gte-base',
 'thenlper/gte-large',
 'thenlper/gte-small',
 'thtang/ALL_862873',
 'tsirif/BinGSE-Meta-Llama-3-8B-Instruct',
 'vprelovac/universal-sentence-encoder-4',
 'vprelovac/universal-sentence-encoder-large-5',
 'vprelovac/universal-sentence-encoder-multilingual-3',
 'vprelovac/universal-sentence-encoder-multilingual-large-3',
 'zeroshot/gte-large-quant',
 'zeroshot/gte-large-sparse',
 'zeroshot/gte-small-quant',
 'zeta-alpha-ai/Zeta-Alpha-E5-Mistral'}

I suggest that we seriously consider whether we want to include quantized versions of the same models in the new leaderboard.
I would also suggest removing copies or fine-tunes of more popular models where there is no indication of how they differ from the original.
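The quantized variants in the list above can be spotted mechanically. A minimal sketch; the `-quant`/`-sparse` suffixes are a heuristic inferred from the names in this thread (e.g. the neuralmagic and zeroshot repos), not an official naming convention:

```python
# Heuristic: flag likely quantized/sparse variants by name suffix.
# These suffixes are an assumption based on the model ids in this thread,
# not a guaranteed convention across the Hub.
QUANT_SUFFIXES = ("-quant", "-sparse")

def is_quantized_variant(model_id: str) -> bool:
    """Return True if the model id looks like a quantized/sparse variant."""
    name = model_id.rsplit("/", 1)[-1].lower()
    return name.endswith(QUANT_SUFFIXES)

models = [
    "neuralmagic/bge-base-en-v1.5-quant",
    "zeroshot/gte-large-sparse",
    "BAAI/bge-m3",
]
print([m for m in models if is_quantized_variant(m)])
```

Anything this flags could then be routed into a separate "quantizations" bucket rather than the main ranking.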

@KennethEnevoldsen
Contributor Author

KennethEnevoldsen commented Nov 29, 2024

This is my classification of the models:

sorting = {
    "clear keep": [
        # probably all of these should be registered as well
        "Alibaba-NLP/gte-Qwen1.5-7B-instruct", #PR exists
        "Alibaba-NLP/gte-Qwen2-1.5B-instruct", #PR exists
        "Alibaba-NLP/gte-base-en-v1.5", #PR exists
        "Alibaba-NLP/gte-large-en-v1.5", #PR exists
        "Alibaba-NLP/gte-multilingual-base", #PR exists
        "BAAI/bge-en-icl", #PR exists
        "BAAI/bge-m3", #PR exists
        "BAAI/bge-multilingual-gemma2", #PR exists
        "Linq-AI-Research/Linq-Embed-Mistral", #PR exists
        "Muennighoff/SGPT-1.3B-weightedmean-msmarco-specb-bitfit",
        "Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit",
        "Muennighoff/SGPT-125M-weightedmean-nli-bitfit",
        "Muennighoff/SGPT-2.7B-weightedmean-msmarco-specb-bitfit",
        "Muennighoff/SGPT-5.8B-weightedmean-msmarco-specb-bitfit",
        "Muennighoff/SGPT-5.8B-weightedmean-nli-bitfit",
        "jinaai/jina-embedding-b-en-v1",
        "jinaai/jina-embedding-l-en-v1",
        "jinaai/jina-embedding-s-en-v1",
        "jinaai/jina-embeddings-v2-base-de",
        "jinaai/jina-embeddings-v2-base-en",
        "jinaai/jina-embeddings-v2-base-es",
        "jinaai/jina-embeddings-v2-small-en",
        "jxm/cde-small-v1",
        "intfloat/e5-base",
        "intfloat/e5-large",
        "facebook/SONAR",
        "amazon/Titan-text-embeddings-v2",
        "nvidia/NV-Embed-v1",  # some models are versions of eachother (we could include a "superseeded by" column to allow users to filter earlier versions)
        "nvidia/NV-Embed-v2",
        "nvidia/NV-Retriever-v1",
        "mixedbread-ai/mxbai-embed-2d-large-v1",
        "mixedbread-ai/mxbai-embed-xsmall-v1",
        "nomic-ai/nomic-embed-text-v1-ablated",
        "nomic-ai/nomic-embed-text-v1-unsupervised",
        "minishlab/M2V_base_glove", #PR exists
        "minishlab/M2V_base_glove_subword", #PR exists
        "minishlab/M2V_base_output", #PR exists
        "minishlab/potion-base-2M", #PR exists
        "minishlab/potion-base-4M", #PR exists
        "minishlab/potion-base-8M", #PR exists
        "Salesforce/SFR-Embedding-Mistral",
        "Snowflake/snowflake-arctic-embed-l",
        "Snowflake/snowflake-arctic-embed-m",
        "Snowflake/snowflake-arctic-embed-m-long",
        "Snowflake/snowflake-arctic-embed-m-v1.5",
        "Snowflake/snowflake-arctic-embed-s",
        "Snowflake/snowflake-arctic-embed-xs",
        "sentence-transformers/all-MiniLM-L12-v2",
        "sentence-transformers/all-mpnet-base-v2",
    ],
    "keep": [
        "Haon-Chen/speed-embedding-7b-instruct",
        "Gameselo/STS-multilingual-mpnet-base-v2",
        "HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1",
        "HIT-TMG/KaLM-embedding-multilingual-mini-v1",
        "Hum-Works/lodestone-base-4096-v1",
        "Jaume/gemma-2b-embeddings",
        "BeastyZ/e5-R-mistral-7b",
        "Lajavaness/bilingual-embedding-base",
        "Lajavaness/bilingual-embedding-large",
        "Lajavaness/bilingual-embedding-small",
        "Mihaiii/Bulbasaur",
        "Mihaiii/Ivysaur",
        "Mihaiii/Squirtle",
        "Mihaiii/Venusaur",
        "Mihaiii/Wartortle",
        "Mihaiii/gte-micro",
        "Mihaiii/gte-micro-v4",
        "OrdalieTech/Solon-embeddings-large-0.1",
        "Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka",
        "Omartificial-Intelligence-Space/Arabic-MiniLM-L12-v2-all-nli-triplet",
        "Omartificial-Intelligence-Space/Arabic-all-nli-triplet-Matryoshka",
        "Omartificial-Intelligence-Space/Arabic-labse-Matryoshka",
        "Omartificial-Intelligence-Space/Arabic-mpnet-base-all-nli-triplet",
        "Omartificial-Intelligence-Space/Marbert-all-nli-triplet-Matryoshka",
        "consciousAI/cai-lunaris-text-embeddings",
        "consciousAI/cai-stellaris-text-embeddings",
        "manu/bge-m3-custom-fr",
        "manu/sentence_croissant_alpha_v0.2",
        "manu/sentence_croissant_alpha_v0.3",
        "manu/sentence_croissant_alpha_v0.4",
        "thenlper/gte-base",
        "thenlper/gte-large",
        "thenlper/gte-small",
        "OrlikB/KartonBERT-USE-base-v1",
        "OrlikB/st-polish-kartonberta-base-alpha-v1",
        "sdadas/mmlw-e5-base",  # some models are monolingual adaptions of a another models (I would include them for now)
        "dwzhu/e5-base-4k",  # e.g. this is a long doc adaption of e5
        "sdadas/mmlw-e5-large",
        "sdadas/mmlw-e5-small",
        "sdadas/mmlw-roberta-base",
        "sdadas/mmlw-roberta-large",
        "izhx/udever-bloom-1b1",
        "izhx/udever-bloom-3b",
        "izhx/udever-bloom-560m",
        "izhx/udever-bloom-7b1",
        "avsolatorio/GIST-Embedding-v0",
        "avsolatorio/GIST-all-MiniLM-L6-v2",
        "avsolatorio/GIST-large-Embedding-v0",
        "avsolatorio/GIST-small-Embedding-v0",
        "bigscience/sgpt-bloom-7b1-msmarco",
        "aari1995/German_Semantic_STS_V2",
        "abhinand/MedEmbed-small-v0.1",
        "avsolatorio/NoInstruct-small-Embedding-v0",
        "brahmairesearch/slx-v0.1",
        "deepfile/embedder-100p",
        "deepvk/USER-bge-m3",
        "infgrad/stella-base-en-v2",
        "malenia1/ternary-weight-embedding",
        "omarelshehy/arabic-english-sts-matryoshka",
        "openbmb/MiniCPM-Embedding",
        "shibing624/text2vec-base-multilingual",
        "silma-ai/silma-embeddding-matryoshka-v0.1",
        "zeta-alpha-ai/Zeta-Alpha-E5-Mistral",
    ],
    "quantizations": [
        # I think we need a good way to include quantizations (potentially make them toggle-able in the UI, hidden by default)
        "zeroshot/gte-large-quant",
        "zeroshot/gte-large-sparse",
        "zeroshot/gte-small-quant",
        "neuralmagic/bge-base-en-v1.5-quant",
        "neuralmagic/bge-base-en-v1.5-sparse",
        "neuralmagic/bge-large-en-v1.5-quant",
        "neuralmagic/bge-large-en-v1.5-sparse",
        "neuralmagic/bge-small-en-v1.5-quant",
        "neuralmagic/bge-small-en-v1.5-sparse",
    ],
    "probably remove": [
        # seems to have been a part of MTEB tests (I don't think we use these anymore)
        "vprelovac/universal-sentence-encoder-4",
        "vprelovac/universal-sentence-encoder-large-5",
        "vprelovac/universal-sentence-encoder-multilingual-3",
        "vprelovac/universal-sentence-encoder-multilingual-large-3",
        # duplicate
        "biswa921/bge-m3",
        # not enough info
        "Labib11/MUG-B-1.6",
        "thtang/ALL_862873",
        "qinxianliu/FAB-Ramy-v1",
        "qinxianliu/FAE-v1",
        "qinxianliu/FUE-v1",
        "dumyy/sft-bge-small",
        "jamesgpt1/sf_model_e5",
        "tsirif/BinGSE-Meta-Llama-3-8B-Instruct",
        "tanmaylaud/ret-phi2-v0",
        "andersonbcdefg/bge-small-4096",
        # this is actually for sentence-transformers/multi-qa-MiniLM-L6-cos-v1, we might just rename?
        # but it is probably better to implement it and rerun it
        "djovak/multi-qa-MiniLM-L6-cos-v1",
    ],
}
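A classification like the one above is easy to get subtly wrong (the same model landing in two buckets), and that can be sanity-checked mechanically. A sketch on a small stand-in dict, since the full one is long:

```python
from itertools import combinations

# Small stand-in for the `sorting` dict above (illustrative subset).
sorting = {
    "clear keep": ["BAAI/bge-m3", "facebook/SONAR"],
    "keep": ["thenlper/gte-base"],
    "probably remove": ["biswa921/bge-m3"],
}

# Sanity check: no model should be classified into two categories.
for (cat_a, models_a), (cat_b, models_b) in combinations(sorting.items(), 2):
    overlap = set(models_a) & set(models_b)
    assert not overlap, f"{cat_a!r} and {cat_b!r} share: {overlap}"

total = sum(len(v) for v in sorting.values())
print(f"{total} models across {len(sorting)} categories")
```

The same loop run on the full dict would also give a quick count of how many models remain to be registered per bucket.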

I have been fairly conservative on removing models, but I definitely think that we should add:

  1. adapted_from to the metadata (to allow users to filter out quantizations, fine-tunes, long-doc extensions, etc.)
  2. supersedes: e.g. nvidia/NV-Embed-v2 supersedes v1, which allows us to filter out earlier versions
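A minimal sketch of how such fields could drive a default leaderboard view. This uses a plain dataclass stand-in, not mteb's actual ModelMeta; the field names adapted_from and superseded_by are assumptions taken from the suggestion above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelMetaSketch:
    """Illustrative stand-in for a model metadata entry."""
    name: str
    adapted_from: Optional[str] = None   # e.g. a quantization or fine-tune source
    superseded_by: Optional[str] = None  # newer version that replaces this model

entries = [
    ModelMetaSketch("nvidia/NV-Embed-v1", superseded_by="nvidia/NV-Embed-v2"),
    ModelMetaSketch("nvidia/NV-Embed-v2"),
    ModelMetaSketch("neuralmagic/bge-base-en-v1.5-quant",
                    adapted_from="BAAI/bge-base-en-v1.5"),
]

# Default view: hide superseded models and derived variants.
visible = [e.name for e in entries
           if e.superseded_by is None and e.adapted_from is None]
print(visible)  # only nvidia/NV-Embed-v2 remains
```

Users who want the full picture could then toggle the hidden entries back on.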

@x-tabdeveloping
Collaborator

@KennethEnevoldsen I couldn't agree more. Sorry for not sending this earlier, but I basically came up with the same list.

@isaac-chung
Collaborator

Do you need help? If so, which ones could I take and where do I commit the metadata to?

@isaac-chung isaac-chung added the leaderboard issues related to the leaderboard label Nov 30, 2024
@KennethEnevoldsen
Contributor Author

@isaac-chung can start with the models that have an easy sentence-transformers implementation.

E.g.
"sentence-transformer/multi-qa-MiniLM-L6-cos-v1"
"sentence-transformers/all-mpnet-base-v2"

My understanding is that @x-tabdeveloping will not add the implementation (i.e. loader), but just the metadata.

@isaac-chung
Collaborator

Gotcha. Yes, I can start with those.

@x-tabdeveloping
Collaborator

I'd really appreciate some help on this! As far as I know there are already some PRs open with some model metas.
Anything that isn't already in the pipeline is free real estate!

@isaac-chung
Collaborator

Happy to help! Wanted to clarify these:

  1. Where are these PRs, and where should new ones go? In the results repo?
  2. Where would you like the metadata to be added?
  3. To check whether they are sentence-transformers compatible, I guess I can refer to this file?

@x-tabdeveloping
Collaborator

  1. Add new models nvidia, gte, linq #1436 Add model jxm/cde-small-v1 #1521 fix: Proprietary models now get correctly shown in leaderboard #1530
  2. In mteb/models/<type_of_model>.py
  3. You can check the HF repo; models compatible with sentence-transformers will usually have a tag that says so, and also a 1_Pooling folder
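That heuristic can be written down directly. A sketch: in practice the tags and file list could be fetched with huggingface_hub (e.g. `HfApi().model_info(repo_id)` exposes tags and repo files), but the check itself is kept as a pure function here so it runs offline; the example inputs are illustrative, not fetched:

```python
# A repo is likely sentence-transformers compatible if it carries the
# "sentence-transformers" tag or ships a 1_Pooling/ folder.
def looks_st_compatible(tags: list[str], filenames: list[str]) -> bool:
    has_tag = "sentence-transformers" in tags
    has_pooling = any(f.startswith("1_Pooling/") for f in filenames)
    return has_tag or has_pooling

# Illustrative metadata for two hypothetical repos:
print(looks_st_compatible(
    tags=["sentence-transformers", "bert"],
    filenames=["config.json", "1_Pooling/config.json"],
))  # True
print(looks_st_compatible(tags=["gpt2"], filenames=["config.json"]))  # False
```

Note this is only a heuristic: a repo can be compatible without either signal, so borderline cases still need a manual look.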

@isaac-chung
Collaborator

Thanks, @x-tabdeveloping ! This PR is in the right location then 👍 I'll keep going for the s-t compatible ones.

@KennethEnevoldsen
Contributor Author

KennethEnevoldsen commented Dec 3, 2024

Updated the list to remove everything that has a PR:

There are still a few we should probably add, and quite a few that would be annoying to add manually (e.g. shibing624/text2vec-base-multilingual). I'm unsure whether we want to add these automatically (@x-tabdeveloping, you mentioned that you had a script for this?)

I plan to completely remove the ones in "probably remove" from the results repo. If someone thinks this is a bad idea, let me know.

sorting = {
    "clear keep": [
        "BAAI/bge-en-icl",
        "BAAI/bge-m3",
        "BAAI/bge-multilingual-gemma2",
        "Linq-AI-Research/Linq-Embed-Mistral",
        "Muennighoff/SGPT-1.3B-weightedmean-msmarco-specb-bitfit",
        "Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit",
        "Muennighoff/SGPT-125M-weightedmean-nli-bitfit",
        "Muennighoff/SGPT-2.7B-weightedmean-msmarco-specb-bitfit",
        "Muennighoff/SGPT-5.8B-weightedmean-msmarco-specb-bitfit",
        "Muennighoff/SGPT-5.8B-weightedmean-nli-bitfit",
        "facebook/SONAR",
        "amazon/Titan-text-embeddings-v2",
        "Salesforce/SFR-Embedding-Mistral",
    ],
    "keep": [
        "Haon-Chen/speed-embedding-7b-instruct",
        "Gameselo/STS-multilingual-mpnet-base-v2",
        "HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1",
        "HIT-TMG/KaLM-embedding-multilingual-mini-v1",
        "Hum-Works/lodestone-base-4096-v1",
        "Jaume/gemma-2b-embeddings",
        "BeastyZ/e5-R-mistral-7b",
        "Lajavaness/bilingual-embedding-base",
        "Lajavaness/bilingual-embedding-large",
        "Lajavaness/bilingual-embedding-small",
        "Mihaiii/Bulbasaur",
        "Mihaiii/Ivysaur",
        "Mihaiii/Squirtle",
        "Mihaiii/Venusaur",
        "Mihaiii/Wartortle",
        "Mihaiii/gte-micro",
        "Mihaiii/gte-micro-v4",
        "OrdalieTech/Solon-embeddings-large-0.1",
        "Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka",
        "Omartificial-Intelligence-Space/Arabic-MiniLM-L12-v2-all-nli-triplet",
        "Omartificial-Intelligence-Space/Arabic-all-nli-triplet-Matryoshka",
        "Omartificial-Intelligence-Space/Arabic-labse-Matryoshka",
        "Omartificial-Intelligence-Space/Arabic-mpnet-base-all-nli-triplet",
        "Omartificial-Intelligence-Space/Marbert-all-nli-triplet-Matryoshka",
        "consciousAI/cai-lunaris-text-embeddings",
        "consciousAI/cai-stellaris-text-embeddings",
        "manu/bge-m3-custom-fr",
        "manu/sentence_croissant_alpha_v0.2",
        "manu/sentence_croissant_alpha_v0.3",
        "manu/sentence_croissant_alpha_v0.4",
        "thenlper/gte-base",
        "thenlper/gte-large",
        "thenlper/gte-small",
        "OrlikB/KartonBERT-USE-base-v1",
        "OrlikB/st-polish-kartonberta-base-alpha-v1",
        "sdadas/mmlw-e5-base",  # some models are monolingual adaptions of a another models (I would include them for now)
        "dwzhu/e5-base-4k",  # e.g. this is a long doc adaption of e5
        "sdadas/mmlw-e5-large",
        "sdadas/mmlw-e5-small",
        "sdadas/mmlw-roberta-base",
        "sdadas/mmlw-roberta-large",
        "izhx/udever-bloom-1b1",
        "izhx/udever-bloom-3b",
        "izhx/udever-bloom-560m",
        "izhx/udever-bloom-7b1",
        "avsolatorio/GIST-Embedding-v0",
        "avsolatorio/GIST-all-MiniLM-L6-v2",
        "avsolatorio/GIST-large-Embedding-v0",
        "avsolatorio/GIST-small-Embedding-v0",
        "bigscience/sgpt-bloom-7b1-msmarco",
        "aari1995/German_Semantic_STS_V2",
        "abhinand/MedEmbed-small-v0.1",
        "avsolatorio/NoInstruct-small-Embedding-v0",
        "brahmairesearch/slx-v0.1",
        "deepfile/embedder-100p",
        "deepvk/USER-bge-m3",
        "infgrad/stella-base-en-v2",
        "malenia1/ternary-weight-embedding",
        "omarelshehy/arabic-english-sts-matryoshka",
        "openbmb/MiniCPM-Embedding",
        "shibing624/text2vec-base-multilingual",
        "silma-ai/silma-embeddding-matryoshka-v0.1",
        "zeta-alpha-ai/Zeta-Alpha-E5-Mistral",
    ],
    "quantizations": [
        # I think we need a good way to include quantizations (potentially make them toggle-able in the UI, hidden by default)
        "zeroshot/gte-large-quant",
        "zeroshot/gte-large-sparse",
        "zeroshot/gte-small-quant",
        "neuralmagic/bge-base-en-v1.5-quant",
        "neuralmagic/bge-base-en-v1.5-sparse",
        "neuralmagic/bge-large-en-v1.5-quant",
        "neuralmagic/bge-large-en-v1.5-sparse",
        "neuralmagic/bge-small-en-v1.5-quant",
        "neuralmagic/bge-small-en-v1.5-sparse",
    ],
    "probably remove": [
        # seems to have been a part of MTEB tests (I don't think we use these anymore)
        "vprelovac/universal-sentence-encoder-4",
        "vprelovac/universal-sentence-encoder-large-5",
        "vprelovac/universal-sentence-encoder-multilingual-3",
        "vprelovac/universal-sentence-encoder-multilingual-large-3",
        # duplicate
        "biswa921/bge-m3",
        # not enough info
        "Labib11/MUG-B-1.6",
        "thtang/ALL_862873",
        "qinxianliu/FAB-Ramy-v1",
        "qinxianliu/FAE-v1",
        "qinxianliu/FUE-v1",
        "dumyy/sft-bge-small",
        "jamesgpt1/sf_model_e5",
        "tsirif/BinGSE-Meta-Llama-3-8B-Instruct",
        "tanmaylaud/ret-phi2-v0",
        "andersonbcdefg/bge-small-4096",
    ],
}

x-tabdeveloping pushed a commit that referenced this issue Dec 4, 2024
@x-tabdeveloping
Collaborator

I have a script, but it's not particularly good. I can put a bit more work into it if it's something we want.

@x-tabdeveloping
Collaborator

I'm on it

@x-tabdeveloping
Collaborator

What are we still missing? Is it just this PR? #1436

isaac-chung pushed a commit that referenced this issue Dec 9, 2024