
[MIEB] Add new multimodal retrieval tasks #1523

Open
izhx opened this issue Nov 28, 2024 · 8 comments
Labels
mieb The image extension of MTEB


izhx (Contributor) commented Nov 28, 2024

Hi, thanks for the cool MTEB toolkit.

We are currently preparing to release an embedding model for universal multimodal retrieval, along with our compiled evaluations. I noticed that you are also developing image extensions for MTEB. So I would like to inquire if you would be interested in incorporating our testing code into MTEB, perhaps as part of MIEB retrieval.

Our evaluation is primarily divided into four parts: MTEB text retrieval, M-BEIR, ViDoRe, and a few additional it2it retrieval datasets. I guess many of them have already been incorporated into mteb.

Below is a preliminary table of model testing results.

[image: preliminary model results table]

If you're interested, where could I find the docs to start with? Thanks a lot.

isaac-chung (Collaborator) commented

Hey @izhx! Thanks for reaching out. Tagging @gowitheflow-1998 here as well.

We're working on integrating MIEB docs with MTEB at the moment. I think the general steps are:

  • Implement any missing retrieval tasks as Any2AnyRetrieval tasks (a rough sketch follows this list)
  • Implement any missing models
  • Optionally, implement a benchmark as a collection of tasks
  • These would be PRs to the mieb branch
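
For reference, here is a minimal sketch of what the first step could look like on the mieb branch. The class name, dataset path, and metadata values are hypothetical placeholders, and the exact set of required TaskMetadata fields may differ between MTEB versions:

# Hypothetical Any2AnyRetrieval task definition; names and values are
# illustrative placeholders, not an actual MIEB task.
from mteb.abstasks.TaskMetadata import TaskMetadata
from mteb.abstasks.Image.AbsTaskAny2AnyRetrieval import AbsTaskAny2AnyRetrieval


class ExampleIT2ITRetrieval(AbsTaskAny2AnyRetrieval):
    metadata = TaskMetadata(
        name="ExampleIT2ITRetrieval",           # hypothetical task name
        description="Interleaved image+text to image+text retrieval (example).",
        reference="https://example.com/paper",  # placeholder
        dataset={
            "path": "my-org/example-it2it",     # hypothetical HF dataset id
            "revision": "main",
        },
        type="Any2AnyRetrieval",
        category="it2it",
        eval_splits=["test"],
        eval_langs=["eng-Latn"],
        main_score="ndcg_at_10",
        # ...additional required metadata fields omitted for brevity
    )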

KennethEnevoldsen (Contributor) commented Nov 29, 2024

Thanks for reaching out @izhx - can you send the reference paper? (I can't seem to find the paper with that specific table.)

izhx (Contributor, Author) commented Nov 29, 2024

> Thanks for reaching out @izhx - can you send the reference paper? (I can't seem to find the paper with that specific table.)

Hi @KennethEnevoldsen, we will submit the paper to arXiv and open-source the models in about 10 days; we are still finalizing the results.

In addition, I checked the Any2AnyRetrieval tasks and found that only 4 of our datasets are not yet included.
I will add their implementations and organize the evaluation of our model via mieb.

gowitheflow-1998 (Contributor) commented Nov 29, 2024

Thanks for reaching out! Adding to @isaac-chung's comment, we welcome PRs both to improve the Any2AnyRetrieval Evaluator and to add your specific tasks! We'll also be happy to benchmark your model on all MIEB tasks on our end if you can PR your model implementation here. An old doc for the full process can be found here.

@isaac-chung isaac-chung added the mieb The image extension of MTEB label Nov 29, 2024
@izhx izhx changed the title About MIEB, adding new multimodal retrieval tasks [MIEB] Add new multimodal retrieval tasks Dec 4, 2024
izhx (Contributor, Author) commented Dec 11, 2024

Hi, it appears that Any2AnyDenseRetrievalExactSearch currently only uses get_text_embeddings, get_image_embeddings, and get_fused_embeddings to encode both queries and the corpus. These functions don't differentiate between query and corpus calls.

However, the previous DenseRetrievalExactSearch for text used encode_corpus to obtain corpus embeddings, which is beneficial for models that require distinct instructions for query and corpus processing, such as GTE and our new multimodal embedding model GME.

Therefore, I'm wondering if we should add an is_query parameter to the get_xxx_embeddings functions, defaulting to True, to allow for this distinction.

This is just an example and also reflects my current implementation; a rough sketch follows. I look forward to everyone's discussion and suggestions for better solutions.
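
For illustration, a minimal sketch of the proposed is_query flag. The method and attribute names below are hypothetical, not the actual MTEB API:

# Hypothetical encoder showing the proposed is_query flag; the instruction
# attributes and the _encode helper are illustrative placeholders.
class ExampleMultimodalEncoder:
    query_instruction = "Represent the query for retrieval: "  # hypothetical
    corpus_instruction = ""  # many models leave corpus entries uninstructed

    def get_text_embeddings(self, texts, *, is_query: bool = True, **kwargs):
        # Models like GTE and GME prepend different instructions for
        # queries vs. corpus entries.
        prefix = self.query_instruction if is_query else self.corpus_instruction
        return self._encode([prefix + t for t in texts])

    def _encode(self, texts):
        ...  # model-specific encoding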

@isaac-chung @gowitheflow-1998

gowitheflow-1998 (Contributor) commented

Of course - adding the ability to take in instructions (e.g., model-specific prompts triggered by is_query) has always been the plan since the start of MIEB. Although this ability is not optimized for a lot of the image-text models, especially ones that can't naturally do interleaved encodings (e.g., CLIP-based), I personally think this will become the de facto standard for future models.

At the moment, a few state-of-the-art models have their own optimized formats, e.g., input_type for Voyage's multimodal-3, or the optimal prompts that E5-V was trained with and thus needs at inference, which we currently support in a model-specific way. These formats are mostly model-dependent: 1) some models differentiate between queries and documents, like Voyage's and yours, as you mentioned; 2) some need specific templates that stay the same across queries and documents.

In general, I think it makes sense to add is_query if it doesn't affect other multimodal models that don't benefit from it. Feel free to PR the solution if you have anything in mind! @izhx

Samoed (Collaborator) commented Dec 11, 2024

FYI, in the main branch, there is a PromptType enum passed to the encode function to specify whether it's a query or a passage. However, I'm not sure how this is implemented in the mieb branch.

Example
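
Roughly, the main-branch interface looks like this (a sketch based on the Encoder protocol; check mteb/encoder_interface.py for the authoritative signature):

from mteb.encoder_interface import PromptType

# Sketch of the Encoder.encode signature (a method on a model wrapper):
def encode(
    self,
    sentences: list[str],
    *,
    task_name: str,
    prompt_type: PromptType | None = None,  # PromptType.query or PromptType.passage
    **kwargs,
):
    ...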

izhx (Contributor, Author) commented Dec 12, 2024

Thanks for the suggestions!
I think it might be more reasonable to follow the design in the main branch, i.e. prompt_type: PromptType | None = None:

from enum import Enum

class PromptType(str, Enum):
    query = "query"
    passage = "passage"
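
Under that design, the earlier is_query sketch would become something like the following (again with hypothetical names, as a method of the example encoder above):

# Hypothetical adaptation of the earlier sketch: the boolean flag becomes
# the main-branch prompt_type argument.
def get_text_embeddings(self, texts, *, prompt_type: PromptType | None = None, **kwargs):
    prefix = self.query_instruction if prompt_type == PromptType.query else self.corpus_instruction
    return self._encode([prefix + t for t in texts])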
