Retrieval-Augmented Generation with Documentation

a.k.a. RAG Doctor 🧑‍⚕️

This repository showcases a Retrieval-Augmented Generation (RAG) system for interacting with documentation: natural language queries retrieve the most relevant snippets and summarize them into an answer.

(Demo video: interactive-demo.webm)

Overview

  • Creates a Qdrant vector database for embeddings from the given CSV file(s)
    • The vector database is used for fast similarity search to find relevant documentation
    • We use a CSV based on Hugging Face documentation as an example
  • Uses OpenAI's embeddings for similarity search and GPT models for high-quality responses
  • Provides an interactive interface for querying the documentation using natural language
  • Each query retrieves the most relevant documentation snippets for context (sketched below)
  • Answers include source links for reference
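
To make the flow concrete, here is a minimal sketch of the retrieve-then-generate loop using the openai and qdrant-client Python libraries. It is illustrative only: the collection name, payload keys, and prompt wording are assumptions, and the repository's actual implementation lives in src/rag_doctor/query.py.

from openai import OpenAI
from qdrant_client import QdrantClient

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
qdrant = QdrantClient(":memory:")  # the real database is persisted, not in-memory

def answer(question: str) -> str:
    # 1. Embed the question with the same model used to index the docs
    embedding = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=[question],
    ).data[0].embedding

    # 2. Retrieve the most similar documentation snippets from Qdrant
    #    ("documentation" and the payload keys are hypothetical names)
    hits = qdrant.search(collection_name="documentation", query_vector=embedding, limit=5)
    context = "\n\n".join(f"{h.payload['source']}\n{h.payload['text']}" for h in hits)

    # 3. Ask the chat model to answer using only the retrieved context
    chat = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using this documentation:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return chat.choices[0].message.content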

Prerequisites

  • Valohai account to run the pipelines
  • OpenAI account to use their APIs
  • Less than $5 in OpenAI credits

Setup

If you can't find this project in your Valohai Templates, you can set it up manually:

  1. Create a new project on Valohai

  2. Set the project repository to: https://github.com/valohai/rag-doc-example

  3. Save the settings and click "Fetch Repository"


    This makes sure the project is up to date

  4. 🔑 Create an OpenAI API key for this project

    • You will need the API key in the next step, so make a note of it
  5. Assign the API key to this project:


    You will see "✅ Changes to OPENAI_API_KEY saved" if everything went correctly.

And now you are ready to run the pipelines!
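
Under the hood, the official OpenAI Python client picks the key up from the OPENAI_API_KEY environment variable, which is why assigning it to the project is all the configuration needed (assuming the code uses the standard client):

from openai import OpenAI

# No key is passed explicitly; the client falls back to the
# OPENAI_API_KEY environment variable set on the Valohai project.
client = OpenAI()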

Usage

  1. Navigate to the "Pipelines" tab
  2. Click the "Create Pipeline" button
  3. Select the "assistant-pipeline" pipeline template
  4. Click the "Create pipeline from template" button
  5. Feel free to look around, then click the "Create pipeline" button

This will start the pipeline:


Feel free to explore while it runs.

When it finishes, the last step will contain qualitative results to review:


This manual evaluation is a simplified way to validate the quality of the generated responses. "LLM evals" are a large topic outside the scope of this particular example.

Now you have a mini-pipeline that maintains a RAG vector database and allows you to ask questions about the documentation. You can ask your own questions by creating new executions based on the "do-query" step.
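
For reference, a query step in valohai.yaml has roughly the following shape. This is an illustrative sketch, not the repository's exact definition: the image, command, and parameter name are assumptions, so check the repository's valohai.yaml for the real values.

- step:
    name: do-query
    image: python:3.12
    command:
      - pip install -r requirements.txt
      - python -m rag_doctor query {parameters}
    parameters:
      - name: question
        type: string
        default: "How do I create a dataset?"

When you create an execution from the step, Valohai substitutes the parameter values you enter into the command via the {parameters} placeholder.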

Next Steps

Automatic Deployment

The repository also contains an "assistant-pipeline-with-deployment" pipeline, which deploys the RAG system to an HTTP endpoint after a human validates the results of the "manual-evaluation" pipeline step.

🤩 Show Me!
  1. Create a Valohai Deployment to define where the HTTP endpoint should be hosted:


    You can use Valohai Public Cloud and valohai.cloud as the target when trying it out. Make sure to name the deployment "public".

  2. Create a pipeline as we did before, but use the "assistant-pipeline-with-deployment" template.


    The pipeline should look something like this.

  3. The pipeline will halt in a "⏳ Pending Approval" state, where you can click the "Approve" button to proceed.

  4. After approval, the pipeline will build and deploy the endpoint.

  5. You can use the "Test Deployment" button to run test queries against the endpoint (see the example below).
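
Outside the UI, the endpoint can also be called over plain HTTP. The sketch below is hypothetical: the URL shape, route name, and JSON payload are assumptions, so check the deployment's actual API in the repository before relying on it.

import requests

# Hypothetical URL and payload; substitute the owner, project, and
# endpoint names shown on your Valohai deployment's details page.
url = "https://valohai.cloud/<owner>/<project>/public/query"
response = requests.post(url, json={"question": "How do I create a dataset?"})
print(response.json())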

Using Other Models

This example uses OpenAI for both the embedding and query models.

Either could be changed to a different provider or a local model.

🤩 Show Me!

Changing models inside the OpenAI ecosystem is a matter of changing constants in src/rag_doctor/consts.py:

EMBEDDING_MODEL = "text-embedding-ada-002"
EMBEDDING_LENGTH = 1_536  # the dimensions of a "text-embedding-ada-002" embedding vector

PROMPT_MODEL = "gpt-4o-mini"
PROMPT_MAX_TOKENS = 128_000  # model "context window" from https://platform.openai.com/docs/models
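
For example, switching to OpenAI's newer text-embedding-3-small model, which also produces 1,536-dimensional vectors, would look like this (a sketch; verify the dimensions against OpenAI's model documentation):

EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_LENGTH = 1_536  # text-embedding-3-small also emits 1,536-dimensional vectors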

Switching the chat model to a different provider involves reimplementing the query logic in src/rag_doctor/query.py.

Similarly, switching the embedding model means reimplementing the embedding logic in both src/rag_doctor/database.py and src/rag_doctor/query.py.

If you decide to change the embedding model, remember to recreate the vector database.
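
As an illustration of leaving the OpenAI ecosystem entirely, a local embedding model via the sentence-transformers library could look like the sketch below. This is not part of the repository; the model choice is only an example, and both database.py and query.py would need to share the same embedding function.

from sentence_transformers import SentenceTransformer

# Example local model: all-MiniLM-L6-v2 produces 384-dimensional vectors,
# so EMBEDDING_LENGTH would need to change to 384 and the Qdrant
# collection would need to be recreated to match.
model = SentenceTransformer("all-MiniLM-L6-v2")

def embed(texts: list[str]) -> list[list[float]]:
    return model.encode(texts).tolist()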

Using Your Own Documentation

Take a look at the input file given to the "embedding" node, create a similar CSV from your own documentation, and replace the input with your CSV.
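
For instance, a small script along these lines could produce a compatible file. The column names here are hypothetical; open the example Hugging Face CSV to see the exact header the embedding step expects.

import csv

# Hypothetical documentation rows: (title, url, content)
docs = [
    ("Getting Started", "https://docs.example.com/start", "To install the tool, run..."),
    ("Configuration", "https://docs.example.com/config", "All settings live in..."),
]

with open("my_docs.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "url", "content"])  # hypothetical column names
    writer.writerows(docs)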

Running it Locally

You can also run the individual pieces locally by following instructions in the DEVELOPMENT file.
