
wip: colpali design draft #427

Open · joein wants to merge 8 commits into main
Conversation

@joein (Member) commented Dec 18, 2024

This is a draft of the second iteration of work on ColPali (#394).

@joein changed the title from "wip: design draft" to "wip: colpali design draft" on Dec 18, 2024
@I8dNLo (Contributor) commented Dec 23, 2024

To check the values for the tests, I use the code examples from here.

@joein (Member, Author) left a comment:

LateInteractionMultimodalEmbedding,
)

__all__ = ["LateInteractionMultimodalEmbedding"]
@joein (Member, Author):

It should also be exportable from fastembed:

from fastembed import LateInteractionMultimodalEmbedding
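
For the re-export, a minimal sketch (the submodule path below is an assumption; whichever module actually defines the class should be used):

# fastembed/__init__.py -- sketch of the requested re-export
from fastembed.late_interaction_multimodal import LateInteractionMultimodalEmbedding

__all__ = [
    # ...existing exports...
    "LateInteractionMultimodalEmbedding",
]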

from PIL import Image

# vectors are abridged and rounded for brevity
CANONICAL_COLUMN_VALUES = {
@joein (Member, Author):

Maybe we should call it CANONICAL_IMAGE_VALUES? Wdyt?

@I8dNLo (Contributor):

I was just following our style, but you are right. It should be CANONICAL_IMAGE_VALUES.

Comment on lines +116 to +117
embeddings_3 = list(model.embed_text(docs, batch_size=10, parallel=0))
embeddings_3 = np.stack(embeddings_3, axis=0)
@joein (Member, Author):

I think we'll never run such a test (we just won't rent a monster capable of handling it)

@I8dNLo (Contributor):

I'll remove it.

**kwargs,
) -> Iterable[np.ndarray]:
"""
Encode a list of documents into list of embeddings.
@joein (Member, Author):

encode a list of images

If None, don't use data-parallel processing, use default onnxruntime threading instead.

Returns:
List of embeddings, one per document
@joein (Member, Author):

one per image

Comment on lines +109 to +121
Encode a list of documents into list of embeddings.
We use mean pooling with attention so that the model can handle variable-length inputs.

Args:
images: Iterator of image paths or single image path to embed
batch_size: Batch size for encoding -- higher values will use more memory, but be faster
parallel:
If > 1, data-parallel encoding will be used, recommended for offline encoding of large datasets.
If 0, use all available cores.
If None, don't use data-parallel processing, use default onnxruntime threading instead.

Returns:
List of embeddings, one per document
@joein (Member, Author):

  • images, not documents
  • the mean pooling sentence is redundant
  • one per image, not per document (see the sketch below)
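
Folding those points in, the docstring could look like this (a sketch; the method name, signature, and defaults are assumptions based on the snippets above):

def embed_image(
    self,
    images,
    batch_size: int = 16,
    parallel: Optional[int] = None,
    **kwargs,
) -> Iterable[np.ndarray]:
    """
    Encode a list of images into a list of embeddings.

    Args:
        images: Iterator of image paths or a single image path to embed
        batch_size: Batch size for encoding -- higher values will use more memory, but be faster
        parallel:
            If > 1, data-parallel encoding will be used, recommended for offline encoding of large datasets.
            If 0, use all available cores.
            If None, don't use data-parallel processing, use default onnxruntime threading instead.

    Returns:
        List of embeddings, one per image
    """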

{
"model": "akshayballal/colpali-v1.2-merged",
"dim": 128,
"description": "Text embeddings, Unimodal (text), Aligned to image latent space, ColBERT-compatible, 512 tokens max, 2024.",
@joein (Member, Author):

The description is kinda slippery.

Can we actually call these text / unimodal embeddings?

Is it aligned to the image latent space, or vice versa?

What do you mean here by ColBERT-compatible?
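
Not to preempt the answers, but one possible rewording (a sketch only; the phrasing is a suggestion and should follow from the answers above):

{
    "model": "akshayballal/colpali-v1.2-merged",
    "dim": 128,
    "description": "Late-interaction multimodal model: text queries and document images are "
    "embedded as multi-vectors in a shared space and scored ColBERT-style via MaxSim, 2024.",
},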

Comment on lines +102 to +104
self.mask_token_id = None
self.pad_token_id = None
self.skip_list = set()
@joein (Member, Author):

Why do we need these if we don't use them?

query += "\n"

texts_query.append(query)
encoded = self.tokenizer.encode_batch(texts_query)
@joein (Member, Author):

Shouldn't the query max length be 50?
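
If 50 is indeed the intended cap, a minimal sketch (assuming the tokenizers library API; QUERY_MAX_LENGTH is a hypothetical constant):

QUERY_MAX_LENGTH = 50  # assumption: the maximum query length this comment refers to

# truncate query encodings to the expected maximum length before batching
self.tokenizer.enable_truncation(max_length=QUERY_MAX_LENGTH)
encoded = self.tokenizer.encode_batch(texts_query)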

PAD_TOKEN = "<pad>"
QUERY_MARKER_TOKEN_ID = [2, 9413]
IMAGE_PLACEHOLDER_SIZE = (3, 448, 448)
EMPTY_TEXT_PLACEHOLDER = np.array([257152] * 1024 + [2, 50721, 573, 2416, 235265, 108])
@joein (Member, Author):

These are actually the token ids of the following string: '<image>' * 1024 + '<bos>Describe the image.\n'. Could we make it nicer? It's not really readable at the moment.

EVEN_ATTENTION_MASK is also not really readable; instead of having this even_attention_mask, maybe we could assign 1030 to a named constant, which seems a bit more reasonable.
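
One way to make both more readable, along the lines suggested above (a sketch; the constant names are suggestions):

IMAGE_TOKEN_ID = 257152  # token id of '<image>'
NUM_IMAGE_TOKENS = 1024
TEXT_PROMPT_TOKEN_IDS = [2, 50721, 573, 2416, 235265, 108]  # '<bos>Describe the image.\n'

# 1024 + 6 = 1030 -- a named constant instead of the magic number behind EVEN_ATTENTION_MASK
PLACEHOLDER_LENGTH = NUM_IMAGE_TOKENS + len(TEXT_PROMPT_TOKEN_IDS)

EMPTY_TEXT_PLACEHOLDER = np.array(
    [IMAGE_TOKEN_ID] * NUM_IMAGE_TOKENS + TEXT_PROMPT_TOKEN_IDS
)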
