This repository showcases a Retrieval-Augmented Generation (RAG) system for interacting with documentation: it uses natural language queries to retrieve and summarize relevant information.
*(Demo video: `interactive-demo.webm`)*
- Creates a Qdrant vector database for embeddings from the given CSV file(s)
- The vector database is used for fast similarity search to find relevant documentation
- We use a CSV based on Hugging Face documentation as an example
- Uses OpenAI's embeddings for similarity search and GPT models for high-quality responses
- Provides an interactive interface for querying the documentation using natural language
- Each query retrieves the most relevant documentation snippets for context (see the sketch after this list)
- Answers include source links for reference
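As a minimal sketch, the flow above looks roughly like the following. The collection name (`docs`) and the payload fields (`content`, `source`) are assumptions for illustration, not necessarily what this repository's code uses:

```python
from openai import OpenAI
from qdrant_client import QdrantClient

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
qdrant = QdrantClient(path="qdrant_db")  # local on-disk vector database

def answer(question: str) -> str:
    # 1) Embed the question with the same model used to index the docs.
    vector = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=question,
    ).data[0].embedding

    # 2) Similarity search for the most relevant documentation snippets.
    hits = qdrant.search(collection_name="docs", query_vector=vector, limit=5)
    context = "\n\n".join(
        f"{hit.payload['content']}\nSource: {hit.payload['source']}" for hit in hits
    )

    # 3) Ask the chat model to answer from the retrieved context only.
    chat = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided documentation and cite the source links."},
            {"role": "user", "content": f"Documentation:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return chat.choices[0].message.content
```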
- Valohai account to run the pipelines
- OpenAI account to use their APIs
- Less than $5 in OpenAI credits
If you can't find this project in your Valohai Templates, you can set it up manually:
- Create a new project on Valohai
- Set the project repository to: https://github.com/valohai/rag-doc-example
- Save the settings and click "Fetch Repository"
- 🔑 Create an OpenAI API key for this project
  - We will need the API key in the next step, so make a note of it
- Assign the API key to this project:
  You will see "✅ Changes to OPENAI_API_KEY saved" if everything went correctly.
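Setting the environment variable is all the configuration the code needs; the OpenAI Python client reads `OPENAI_API_KEY` from the environment by default:

```python
from openai import OpenAI

# The client picks up the OPENAI_API_KEY environment variable automatically;
# OpenAI(api_key=...) would override it explicitly.
client = OpenAI()
```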
And now you are ready to run the pipelines!
- Navigate to the "Pipelines" tab
- Click the "Create Pipeline" button
- Select the "assistant-pipeline" pipeline template
- Click the "Create pipeline from template" button
- Feel free to look around and finally click the "Create pipeline" button
This will start the pipeline. Feel free to explore while it runs.
When it finishes, the last step will contain qualitative results to review.
This manual evaluation is a simplified way to validate the quality of the generated
responses. "LLM evals" are a large topic outside the scope of this particular example.
Now you have a mini-pipeline that maintains a RAG vector database and allows you to ask questions about the documentation. You can ask your own questions by creating new executions based on the "do-query" step.
The repository also contains a pipeline, "assistant-pipeline-with-deployment", which deploys the RAG system to an HTTP endpoint after human validation at the "manual-evaluation" pipeline step.
🤩 Show Me!
- Create a Valohai Deployment to specify where the HTTP endpoint should be hosted. You can use Valohai Public Cloud and valohai.cloud as the target when trying it out. Make sure to name the deployment `public`.
- Create a pipeline as we did before, but use the "assistant-pipeline-with-deployment" template.
- The pipeline will halt in a "⏳️ Pending Approval" state, where you can click the "Approve" button to proceed.
- After approval, the pipeline will build and deploy the endpoint.
- You can use the "Test Deployment" button to run test queries against the endpoint.
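You can also exercise the endpoint outside the Valohai UI with a plain HTTP request. The URL path and JSON payload below are hypothetical; check your deployment's endpoint definition for the real ones:

```python
import requests

# Hypothetical endpoint URL and payload; adjust to match your deployment.
url = "https://valohai.cloud/<owner>/<project>/public/<endpoint>"
response = requests.post(url, json={"question": "How do I load a dataset?"})
print(response.status_code, response.json())
```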
This example uses OpenAI for both the embedding and query models.
Either could be changed to a different provider or a local model.
🤩 Show Me!
Changing models inside the OpenAI ecosystem is a matter of changing constants in `src/rag_doctor/consts.py`:

```python
EMBEDDING_MODEL = "text-embedding-ada-002"
EMBEDDING_LENGTH = 1_536  # the dimensions of a "text-embedding-ada-002" embedding vector

PROMPT_MODEL = "gpt-4o-mini"
PROMPT_MAX_TOKENS = 128_000  # model "context window" from https://platform.openai.com/docs/models
```
Modifying the chat model beyond that involves reimplementing the query logic in `src/rag_doctor/query.py`.
Similarly, modifying the embedding model is a matter of reimplementing the embedding logic in both `src/rag_doctor/database.py` and `src/rag_doctor/query.py`.
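For example, a local embedding model could be swapped in roughly like this (using sentence-transformers purely as an illustration; this repository does not use it):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dimensional vectors

def embed(texts: list[str]) -> list[list[float]]:
    # Both indexing (database.py) and querying (query.py) must embed with this
    # same function, and EMBEDDING_LENGTH must match the new vector size (384).
    return model.encode(texts).tolist()
```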
If you decide to change the embedding model, remember to recreate the vector database.
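Recreating the database for a new vector size might look roughly like this (the collection name and storage path are assumptions, not the repository's actual values):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

qdrant = QdrantClient(path="qdrant_db")  # assumed local database path
qdrant.recreate_collection(
    collection_name="docs",  # assumed collection name
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),  # new dimension
)
```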
Take a look at the input file given to the "embedding" node, create a similar CSV from your own documentation, and replace the input with that CSV.
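The column names below are hypothetical; match the columns of the example input file rather than these guesses:

```python
import csv

# Hypothetical columns; inspect the example CSV for the real header.
rows = [
    {"source": "https://example.com/docs/intro", "content": "Intro page text..."},
]
with open("my_docs.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["source", "content"])
    writer.writeheader()
    writer.writerows(rows)
```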
You can also run the individual pieces locally by following the instructions in the DEVELOPMENT file.