Implement instructor embedding based retrieval #16

NISH1001 · 2024-01-17T03:58:29Z

What

We currently use OpenAI embeddings (ada) to embed text chunks and store the resulting vectors in a vector store (e.g., pgvector). All metadata must be tagged at index time. There is no straightforward mechanism to update existing chunks with new metadata. We also cannot use arbitrary objects in the metadata filters.

We propose using instructor-embedding, which can embed a query together with the provided filters so that retrieval returns the relevant chunks. langchain has an Instruct Embeddings implementation that we can use for this.

Why

instructor-embedding embeds a (prompt, text) pair jointly, so any custom prompt can be used to steer how a given text is embedded.
For example, to embed a query with certain filters applied, we can pass the pair:
("Represent the query with filters categories=['x', 'y']", "<Some long text>")
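
A minimal sketch of how such pairs could be built, under stated assumptions: the helper names (`build_instruction`, `build_pair`, `embed_pairs`), the way filters are rendered into the prompt, and the default model name `hkunlp/instructor-large` are illustrative choices, not something decided in this issue. The `INSTRUCTOR(...).encode(...)` call follows the InstructorEmbedding package's documented usage and is imported lazily, so the pair-building part runs without the model installed.

```python
from typing import Dict, List, Sequence


def build_instruction(filters: Dict[str, Sequence[str]]) -> str:
    """Render metadata filters into an instruction string (hypothetical format)."""
    if not filters:
        return "Represent the query"
    # Sort keys so the same filters always produce the same instruction.
    rendered = ", ".join(
        f"{key}={list(values)!r}" for key, values in sorted(filters.items())
    )
    return f"Represent the query with filters {rendered}"


def build_pair(text: str, filters: Dict[str, Sequence[str]]) -> List[str]:
    """Return the [instruction, text] pair that is embedded jointly."""
    return [build_instruction(filters), text]


def embed_pairs(pairs: List[List[str]], model_name: str = "hkunlp/instructor-large"):
    """Embed [instruction, text] pairs; needs the InstructorEmbedding package."""
    # Imported here so the helpers above work without the dependency.
    from InstructorEmbedding import INSTRUCTOR

    model = INSTRUCTOR(model_name)  # assumed model choice; downloads weights on first use
    return model.encode(pairs)


pair = build_pair("<Some long text>", {"categories": ["x", "y"]})
print(pair[0])
```

Because the filters live in the prompt rather than in indexed metadata, changing them only changes the instruction string at query time; whether that gives good enough retrieval quality would still need to be evaluated.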


NISH1001 self-assigned this Jan 17, 2024

NISH1001 added the enhancement New feature or request label Jan 17, 2024
