readme for agent app sample and main readme updates #56
base: main
Changes from 1 commit
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -1,16 +1,165 @@ | ||||||
# Retrieval Augmented Generation | ||||||
# Databricks Generative AI Cookbook | ||||||
|
||||||
Please visit http://ai-cookbook.io for the accompanying documentation for this repo. | ||||||
Please visit [ai-cookbook.io](http://ai-cookbook.io) for the accompanying documentation for this repository. | ||||||
|
||||||
This repo provides [learning materials](https://ai-cookbook.io/) and [production-ready code](https://github.com/databricks/genai-cookbook/tree/v0.2.0/agent_app_sample_code) to build a **high-quality RAG application** using Databricks. The [Mosaic Generative AI Cookbook](https://ai-cookbook.io/) provides: | ||||||
- A conceptual overview and deep dive into various Generative AI design patterns, such as Prompt Engineering, Agents, RAG, and Fine Tuning | ||||||
- An overview of Evaluation-Driven development | ||||||
- The theory of every parameter/knob that impacts quality | ||||||
- How to root cause quality issues and detemermine which knobs are relevant to experiment with for your use case | ||||||
- Best practices for how to experiment with each knob | ||||||
This repository provides [learning materials](https://ai-cookbook.io/) and code examples to build a **high-quality Generative AI application** using Databricks. The Cookbook provides: | ||||||
|
||||||
The [provided code](https://github.com/databricks/genai-cookbook/tree/v0.2.0/agent_app_sample_code) is intended for use with the Databricks platform. Specifically: | ||||||
- [Mosaic AI Agent Framework](https://docs.databricks.com/en/generative-ai/retrieval-augmented-generation.html) which provides a fast developer workflow with enterprise-ready LLMops & governance | ||||||
- [Mosaic AI Agent Evaluation](https://docs.databricks.com/en/generative-ai/agent-evaluation/index.html) which provides reliable, quality measurement using proprietary AI-assisted LLM judges to measure quality metrics that are powered by human feedback collected through an intuitive web-based chat UI | ||||||
- A conceptual overview and deep dive into various Generative AI design patterns, such as Prompt Engineering, Agents, RAG, and Fine-Tuning. | ||||||
- An overview of Evaluation-Driven Development. | ||||||
- The theory of every parameter/knob that impacts quality. | ||||||
- How to root cause quality issues and determine which knobs are relevant to experiment with for your use case. | ||||||
- Best practices for how to experiment with each knob. | ||||||
|
||||||
![Alt text](rag_app_sample_code/dbxquality.png) | ||||||
## TL;DR: | ||||||
|
||||||
Choose the recipe that best matches your needs: | ||||||
|
||||||
- **For RAG Applications:** | ||||||
- [RAG Getting Started](./rag_app_sample_code/README.md) | ||||||
- Start with `rag_app_sample_code/A_POC_app` to build a proof of concept | ||||||
- Then explore `rag_app_sample_code/B_quality_iteration` to improve quality | ||||||
- Uses Databricks' Mosaic AI Agent Framework for enterprise features | ||||||
Review comment: what does "enterprise features" mean here? can we be more explicit and say something like "agent serving" and "agent eval"
||||||
|
||||||
- **For an agent that uses a retriever tool:** | ||||||
- Check out `agent_app_sample_code` | ||||||
|
||||||
- **For OpenAI SDK Integration:** | ||||||
Review comment: Suggested change
|
||||||
- [OpenAI SDK Getting Started](./openai_sdk_agent_app_sample_code/README.md) | ||||||
- Navigate to `openai_sdk_agent_app_sample_code` | ||||||
- Examples of building agents using OpenAI SDK with MLflow PyFunc Models | ||||||
|
||||||
## Repository Structure | ||||||
|
||||||
``` | ||||||
├── agent_app_sample_code/ # Sample code for agent applications | ||||||
│ ├── agents/ # Agent code | ||||||
│ ├── 03_agent_proof_of_concept.py # Example of a proof of concept agent | ||||||
│ └── ... # Additional directories and files | ||||||
├── openai_sdk_agent_app_sample_code/ # Sample code using OpenAI SDK | ||||||
│ └── ... # Directories and files | ||||||
├── rag_app_sample_code/ # Sample code for RAG applications | ||||||
│ ├── A_POC_app/ # Proof-of-Concept applications | ||||||
│ ├── pdf_uc_volume/ # Example of a RAG application using PDFs | ||||||
│ ├── B_quality_iteration/ # Code for quality iteration | ||||||
│ └── ... # Additional directories and files | ||||||
├── genai_cookbook/ # Documentation and learning materials | ||||||
├── data/ # Sample data for testing and development | ||||||
├── dev/ # Development tools and scripts | ||||||
└── README.md # This README file | ||||||
Review comment: can we drop this part? seems hard to maintain as we iterate on this repo
||||||
``` | ||||||
|
||||||
The `agent_app_sample_code` directory contains sample code for building agent applications using the Databricks platform. | ||||||
|
||||||
The `openai_sdk_agent_app_sample_code` directory contains sample code that uses the OpenAI SDK with MLflow PyFunc models for building agents. | ||||||
|
||||||
The `rag_app_sample_code` directory contains sample code for Retrieval-Augmented Generation (RAG) applications. | ||||||
|
||||||
The `genai_cookbook` directory contains a 10-minute getting started guide. | ||||||
Review comment: would
Review comment: I wonder which part this 10 minute demo has been simplified compared to
|
||||||
|
||||||
The provided code is intended for use with the Databricks platform. Specifically: | ||||||
- [Mosaic AI Agent Framework](https://docs.databricks.com/generative-ai/agent-framework/build-genai-apps.html) which provides a fast developer workflow with enterprise-ready LLMops & governance | ||||||
- [Mosaic AI Agent Evaluation](https://docs.databricks.com/generative-ai/agent-evaluation/index.html) which provides reliable, quality measurement using proprietary AI-assisted LLM judges to measure quality metrics that are powered by human feedback collected through an intuitive web-based chat UI | ||||||
|
||||||
![Alt text](rag_app_sample_code/dbxquality.png) | ||||||
|
||||||
## Getting Started | ||||||
Review comment: maybe worth mentioning prereqs like UC
||||||
|
||||||
### Option 1: Running in a Databricks Workspace (Recommended) | ||||||
|
||||||
1. **Clone the Repository into your Databricks Workspace:** | ||||||
- In your Databricks workspace, go to Repos | ||||||
- Click "Add Repo" | ||||||
- Enter the Git repository URL: `https://github.com/databricks/genai-cookbook.git` | ||||||
|
||||||
2. **Set Up Your Databricks Environment:** | ||||||
- Use Serverless Notebooks or create a new cluster with Databricks Runtime 14.0 or higher | ||||||
|
||||||
3. **Run the Sample Code:** | ||||||
- Navigate to `rag_app_sample_code/A_POC_app` to start with a proof of concept | ||||||
- Follow the numbered notebooks in sequence | ||||||
- Each notebook contains detailed instructions and explanations | ||||||
|
||||||
### Option 2: Running Locally | ||||||
Review comment: is this fully supported? we don't recommend this in our docs anywhere, so just want to make sure.
Review comment: Or does it mean running via VS Code but using Databricks' runtime? I feel a bit concerned, as we are a cloud platform and customers may find it confusing if we suggest running things locally.
Review comment: That's fair - I think we need to document it, but it might not work consistently across all "recipes". We should probably let each one define their local development workflows if applicable.
Review comment: Yes, this means running in VSCode with databricks connect.
||||||
|
||||||
1. **Prerequisites:** | ||||||
- Python 3.10 or higher | ||||||
- [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html) installed and configured | ||||||
- Git installed on your local machine | ||||||
|
||||||
2. **Clone and Set Up:** | ||||||
```bash | ||||||
# Clone the repository | ||||||
git clone https://github.com/databricks/genai-cookbook.git | ||||||
cd genai-cookbook | ||||||
|
||||||
# Create and activate virtual environment | ||||||
python -m venv venv | ||||||
source venv/bin/activate # On Windows use: venv\Scripts\activate | ||||||
|
||||||
# Install dependencies | ||||||
pip install -r dev/dev_requirements.txt | ||||||
``` | ||||||
|
||||||
3. **Configure Databricks Connection:** | ||||||
- Set up your Databricks CLI credentials: | ||||||
```bash | ||||||
databricks configure --token | ||||||
``` | ||||||
- Follow the prompts to enter your Databricks workspace URL and access token | ||||||
|
||||||
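Before opening the notebooks locally, it can help to confirm that the prerequisites and the CLI profile are in place. The following is a minimal sketch, not part of the repository: the helper names `missing_tools` and `profile_is_configured` are hypothetical, and it assumes the CLI stored a `host` and `token` for the profile in `~/.databrickscfg`.

```python
import configparser
import shutil
import sys
from pathlib import Path


def missing_tools(tools: list[str]) -> list[str]:
    """Return the entries of `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def profile_is_configured(cfg_path: Path, profile: str = "DEFAULT") -> bool:
    """Return True if `profile` in the config file has both a host and a token."""
    # Interpolation is disabled so that '%' characters in a token are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)  # a missing file is silently skipped
    if profile == "DEFAULT":
        section = dict(parser.defaults())
    elif parser.has_section(profile):
        section = dict(parser.items(profile))
    else:
        return False
    return bool(section.get("host")) and bool(section.get("token"))


if __name__ == "__main__":
    if sys.version_info < (3, 10):
        print("Python 3.10 or higher is required")
    print("Missing tools:", missing_tools(["git", "databricks"]))
    # Assumed default location of the CLI configuration file.
    print("Profile configured:", profile_is_configured(Path.home() / ".databrickscfg"))
```

If the last line prints `False`, rerunning the configure step above is a reasonable first thing to try.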
## Contributing | ||||||
|
||||||
We welcome contributions to improve the cookbook! Here's how you can help: | ||||||
|
||||||
### Development Setup | ||||||
|
||||||
1. **Fork and Clone:** | ||||||
- Fork the repository | ||||||
- Clone the forked repository | ||||||
```bash | ||||||
git clone https://github.com/YOUR_USERNAME/genai-cookbook.git | ||||||
cd genai-cookbook | ||||||
``` | ||||||
|
||||||
### Making Changes | ||||||
|
||||||
1. **Create a Feature Branch:** | ||||||
```bash | ||||||
git checkout -b feature/your-feature-name | ||||||
``` | ||||||
|
||||||
2. **Update Documentation:** | ||||||
- If you're adding a new cookbook directory, add a README.md file to the directory describing the new cookbook. | ||||||
|
||||||
3. **Code Style:** | ||||||
- Follow PEP 8 guidelines | ||||||
- Include docstrings for new functions | ||||||
- Add type hints where possible | ||||||
|
||||||
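As an illustration of the code style points above (PEP 8 layout, a docstring, and type hints), here is a small sketch. The function `split_into_batches` is a made-up example and does not exist in the repository.

```python
def split_into_batches(items: list[str], batch_size: int) -> list[list[str]]:
    """Split `items` into consecutive batches of at most `batch_size` elements.

    Args:
        items: The values to group.
        batch_size: Maximum number of elements per batch; must be positive.

    Returns:
        A list of batches that preserves the original order.

    Raises:
        ValueError: If `batch_size` is not positive.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]
```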
### Submitting Changes | ||||||
|
||||||
1. **Commit Your Changes:** | ||||||
```bash | ||||||
git add . | ||||||
git commit -m "Description of your changes" | ||||||
``` | ||||||
|
||||||
2. **Push to Your Fork:** | ||||||
```bash | ||||||
git push origin feature/your-feature-name | ||||||
``` | ||||||
|
||||||
3. **Create a Pull Request:** | ||||||
- Go to the [Pull Requests](https://github.com/databricks/genai-cookbook/pulls) page | ||||||
- Click "New Pull Request" | ||||||
- Select your fork and branch | ||||||
- Fill out the PR template with: | ||||||
- Description of changes | ||||||
- Related issues | ||||||
- Testing performed | ||||||
- Screenshots/videos of manual testing (required) | ||||||
|
||||||
### Getting Help | ||||||
|
||||||
- For bugs or feature requests, [create an issue](https://github.com/databricks/genai-cookbook/issues) | ||||||
- For questions, start a [GitHub Discussion](https://github.com/databricks/genai-cookbook/discussions) | ||||||
Review comment: nit: the link doesn't work. maybe remove or update the link?
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,112 @@ | ||||||
# Agent Application Sample Code | ||||||
|
||||||
This directory contains sample code for building agent applications using client-side tools. The code demonstrates how to build, evaluate, and improve the quality of your agent applications. | ||||||
|
||||||
## Directory Structure | ||||||
|
||||||
``` | ||||||
├── agents/ # Agent implementation code | ||||||
│ ├── agent_config.py # Configuration classes for the agent | ||||||
│ ├── function_calling_agent_w_retriever_tool.py # Agent implementation with retriever tool | ||||||
│ └── generated_configs/ # Generated agent configuration files | ||||||
├── tests/ # Unit tests | ||||||
├── utils/ # Utility functions and helpers | ||||||
│ ├── build_retriever_index.py # Vector search index creation | ||||||
│ ├── chunk_docs.py # Document chunking utilities | ||||||
│ ├── eval_set_utilities.py # Evaluation set creation helpers | ||||||
│ ├── file_loading.py # File loading utilities | ||||||
│ └── typed_dicts_to_spark_schema.py # Schema conversion utilities | ||||||
├── validators/ # Configuration validators | ||||||
└── README.md # This file | ||||||
``` | ||||||
|
||||||
## Getting Started | ||||||
|
||||||
### Prerequisites | ||||||
|
||||||
- Databricks Runtime 14.0 or higher | ||||||
Review comment: Suggested change
Review comment: I am not sure if that could be done purely with Serverless. The AI cookbook documentation mentions both: https://ai-cookbook.io/nbs/6-implement-overview.html#requirements. My understanding is that the AI Cookbook includes many components and integrates various products, each with its own requirements. Some products, like Mosaic AI vector search, use Serverless runtime, while others rely on the DBR environment. This is why both DBR 14.3 (Non-ML) and Serverless runtime are mentioned. Not sure if that's correct.
Review comment: Yeah we should specify 'serverless notebook'
||||||
- A Databricks workspace with access to: | ||||||
- [Mosaic AI Agent Framework](https://docs.databricks.com/en/generative-ai/agent-framework/build-genai-apps.html) | ||||||
- [Mosaic AI Agent Evaluation](https://docs.databricks.com/en/generative-ai/agent-evaluation/index.html) | ||||||
- [Vector Search](https://docs.databricks.com/en/generative-ai/create-query-vector-search.html) | ||||||
- [Model Serving](https://docs.databricks.com/en/machine-learning/model-serving/index.html) | ||||||
|
||||||
### Setup Steps | ||||||
|
||||||
1. **Configure Global Settings** (00_global_config.py): | ||||||
- Set up Unity Catalog locations for your agent | ||||||
- Configure MLflow experiment tracking | ||||||
- Define evaluation settings | ||||||
|
||||||
2. **Build Data Pipeline** (02_data_pipeline.py): | ||||||
- Load and parse your documents | ||||||
- Create chunks for vector search | ||||||
- Build the vector search index | ||||||
|
||||||
3. **Create Agent** (03_agent_proof_of_concept.py): | ||||||
- Configure the agent with LLM and retriever settings | ||||||
- Deploy the agent to collect feedback | ||||||
|
||||||
4. **Evaluate and Improve** (04_create_evaluation_set.py, 05_evaluate_poc_quality.py): | ||||||
- Create evaluation sets from feedback | ||||||
- Measure quality metrics | ||||||
- Identify and fix quality issues | ||||||
|
||||||
## Key Components | ||||||
|
||||||
### Agent Configuration | ||||||
|
||||||
The agent is configured using the `AgentConfig` class in `agents/agent_config.py`. Key configuration includes: | ||||||
|
||||||
- Retriever tool settings (vector search, chunk formatting) | ||||||
- LLM configuration (model endpoint, system prompts) | ||||||
- Input examples for testing | ||||||
|
||||||
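To make the three groups of settings above concrete, the sketch below models them with plain dataclasses. It is not the repository's `AgentConfig`: every class name (`Sketch...`), field name, default value, and the index and endpoint names are assumptions chosen for illustration only.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class SketchRetrieverToolConfig:
    """Hypothetical retriever tool settings (field names are assumptions)."""

    vector_search_index: str  # e.g. "catalog.schema.index_name"
    num_results: int = 5  # how many chunks to retrieve per query
    chunk_template: str = "Passage: {chunk_text}\n"  # how each chunk is formatted


@dataclass
class SketchLLMConfig:
    """Hypothetical LLM settings (field names are assumptions)."""

    llm_endpoint_name: str  # name of a model serving endpoint
    system_prompt: str = "You are a helpful assistant."


@dataclass
class SketchAgentConfig:
    """Hypothetical top-level config grouping the three kinds of settings."""

    retriever_tool: SketchRetrieverToolConfig
    llm_config: SketchLLMConfig
    input_example: dict[str, Any] = field(default_factory=dict)

    def validate(self) -> None:
        """Raise ValueError if a setting is obviously unusable."""
        if self.retriever_tool.num_results <= 0:
            raise ValueError("num_results must be positive")
        if not self.llm_config.llm_endpoint_name:
            raise ValueError("llm_endpoint_name must not be empty")


config = SketchAgentConfig(
    retriever_tool=SketchRetrieverToolConfig(vector_search_index="catalog.schema.docs_index"),
    llm_config=SketchLLMConfig(llm_endpoint_name="my-llm-endpoint"),
    input_example={"messages": [{"role": "user", "content": "What is RAG?"}]},
)
config.validate()
config_dict = asdict(config)  # plain nested dict, e.g. for logging alongside a run
```

Grouping settings this way keeps retriever and LLM parameters separately tunable, which matches the "knobs" framing used in the main README.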
### Data Pipeline | ||||||
|
||||||
The data pipeline handles: | ||||||
|
||||||
- Document loading and parsing | ||||||
- Text chunking with configurable strategies | ||||||
- Vector index creation for retrieval | ||||||
|
||||||
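As one example of a configurable chunking strategy, the sketch below splits text into fixed-size character windows with overlap. This is an assumed, simplified strategy for illustration; the function `chunk_text` and its parameters are hypothetical and are not taken from `utils/chunk_docs.py`.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split `text` into character windows of `chunk_size` that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        # Stop once a window reaches the end, so no redundant tail chunk is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Larger overlap reduces the chance that a relevant sentence is cut across two chunks, at the cost of more chunks to embed and index.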
### Quality Evaluation | ||||||
|
||||||
The evaluation framework provides: | ||||||
|
||||||
- Feedback collection through the Review App | ||||||
- Quality metrics computation | ||||||
- Root cause analysis of issues | ||||||
- Iterative quality improvements | ||||||
|
||||||
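To give a feel for what a quality metric computation can look like, the sketch below computes retrieval recall@k over a small evaluation set. It is a simple stand-in, not the metrics or LLM judges that Mosaic AI Agent Evaluation provides; the function names and the row keys `retrieved` and `expected` are assumptions.

```python
def recall_at_k(retrieved_ids: list[str], expected_ids: set[str], k: int) -> float:
    """Fraction of expected documents that appear in the top-k retrieved results."""
    if not expected_ids:
        return 0.0  # nothing to find, so treat recall as zero rather than divide by zero
    top_k = set(retrieved_ids[:k])
    return len(top_k & expected_ids) / len(expected_ids)


def mean_recall_at_k(rows: list[dict], k: int) -> float:
    """Average recall@k over evaluation rows with 'retrieved' and 'expected' keys."""
    if not rows:
        return 0.0
    scores = [recall_at_k(row["retrieved"], row["expected"], k) for row in rows]
    return sum(scores) / len(scores)


eval_rows = [
    {"retrieved": ["a", "b", "c"], "expected": {"a", "c"}},
    {"retrieved": ["x"], "expected": {"y"}},
]
score = mean_recall_at_k(eval_rows, k=2)
```

A low score on a metric like this points toward the retriever (chunking, index, number of results) rather than the LLM as the place to look first.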
## Usage Example | ||||||
|
||||||
```python | ||||||
# 1. Configure your agent | ||||||
from agents.agent_config import AgentConfig, RetrieverToolConfig, LLMConfig | ||||||
Review comment: not related to this PR, but i feel importing
Review comment: 💯
||||||
|
||||||
agent_config = AgentConfig( | ||||||
retriever_tool=RetrieverToolConfig(...), | ||||||
llm_config=LLMConfig(...), | ||||||
input_example={...} | ||||||
) | ||||||
|
||||||
# 2. Initialize and test the agent | ||||||
from agents.function_calling_agent_w_retriever_tool import AgentWithRetriever | ||||||
|
||||||
agent = AgentWithRetriever() | ||||||
response = agent.predict(model_input={"messages": [{"role": "user", "content": "What is RAG?"}]}) | ||||||
``` | ||||||
|
||||||
## Contributing | ||||||
|
||||||
1. Follow the [development setup](../dev/README.md) instructions | ||||||
2. Create a feature branch | ||||||
3. Add tests for new functionality | ||||||
4. Submit a pull request | ||||||
|
||||||
## Additional Resources | ||||||
|
||||||
- [Databricks Generative AI Cookbook](https://ai-cookbook.io/) | ||||||
- [Mosaic AI Documentation](https://docs.databricks.com/en/generative-ai/index.html) | ||||||
- [Vector Search Documentation](https://docs.databricks.com/en/generative-ai/create-query-vector-search.html) |
Review comment: can we emphasize the 10 minute demo?
Review comment: wdym by emphasize?