readme for agent app sample and main readme updates #56

Open
wants to merge 2 commits into
base: main
179 changes: 167 additions & 12 deletions README.md
@@ -1,16 +1,171 @@
# Retrieval Augmented Generation
# Databricks Generative AI Cookbook

Please visit http://ai-cookbook.io for the accompanying documentation for this repo.
Please visit [ai-cookbook.io](http://ai-cookbook.io) for the accompanying documentation for this repository.

This repo provides [learning materials](https://ai-cookbook.io/) and [production-ready code](https://github.com/databricks/genai-cookbook/tree/v0.2.0/agent_app_sample_code) to build a **high-quality RAG application** using Databricks. The [Mosaic Generative AI Cookbook](https://ai-cookbook.io/) provides:
- A conceptual overview and deep dive into various Generative AI design patterns, such as Prompt Engineering, Agents, RAG, and Fine Tuning
- An overview of Evaluation-Driven development
- The theory of every parameter/knob that impacts quality
- How to root cause quality issues and detemermine which knobs are relevant to experiment with for your use case
- Best practices for how to experiment with each knob
This repository provides [learning materials](https://ai-cookbook.io/) and code examples to build a **high-quality Generative AI application** using Databricks. The Cookbook provides:

The [provided code](https://github.com/databricks/genai-cookbook/tree/v0.2.0/agent_app_sample_code) is intended for use with the Databricks platform. Specifically:
- [Mosaic AI Agent Framework](https://docs.databricks.com/en/generative-ai/retrieval-augmented-generation.html) which provides a fast developer workflow with enterprise-ready LLMops & governance
- [Mosaic AI Agent Evaluation](https://docs.databricks.com/en/generative-ai/agent-evaluation/index.html) which provides reliable, quality measurement using proprietary AI-assisted LLM judges to measure quality metrics that are powered by human feedback collected through an intuitive web-based chat UI
- A conceptual overview and deep dive into various Generative AI design patterns, such as Prompt Engineering, Agents, RAG, and Fine-Tuning.
- An overview of Evaluation-Driven Development.
- The theory of every parameter/knob that impacts quality.
- How to root cause quality issues and determine which knobs are relevant to experiment with for your use case.
- Best practices for how to experiment with each knob.

![Alt text](rag_app_sample_code/dbxquality.png)
## TL;DR:

This repository is a monorepo - each directory contains a standalone "recipe".

Choose the recipe that best matches your needs:
Collaborator:
can we emphasize the 10 minute demo?

Author:
wdym by emphasize?


- **For RAG Applications:**
  - [RAG Getting Started](./rag_app_sample_code/README.md)
  - Start with `agent_app_sample_code/A_POC_app` to build a proof of concept
  - Then explore `agent_app_sample_code/B_quality_iteration` to improve quality
  - Uses Databricks Agent Framework + Evaluation for enterprise features
  - Ingest, process and automatically index documents with Spark + Databricks Vector Search
  - Experiment and version RAG models with MLflow and Unity Catalog
  - Autoscaling model deployment with Model Serving
  - Iterate on quality with Agent Evals and SME Review UI
  - Monitor real-time performance

- **For an agent that uses a retriever tool:**
  - Check out `agent_app_sample_code`

- **For Agent Application in pure Python + OpenAI SDK:**
  - [OpenAI SDK Getting Started](./openai_sdk_agent_app_sample_code/README.md)
  - Navigate to `openai_sdk_agent_app_sample_code`
  - Examples of building agents using OpenAI SDK with MLflow PyFunc Models

## How to use this repository

The provided code is intended for use with the Databricks platform. Specifically:
- [Mosaic AI Agent Framework](https://docs.databricks.com/generative-ai/agent-framework/build-genai-apps.html), which provides a fast developer workflow with enterprise-ready LLMOps and governance
- [Mosaic AI Agent Evaluation](https://docs.databricks.com/generative-ai/agent-evaluation/index.html), which provides reliable quality measurement using proprietary AI-assisted LLM judges, with quality metrics powered by human feedback collected through an intuitive web-based chat UI

![Alt text](rag_app_sample_code/dbxquality.png)

Specific instructions for each recipe are provided in the README.md file of each subdirectory.

### Prerequisites

- Your Databricks workspace must have [Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/index.html) enabled.
- Your Databricks workspace must have [Model Serving](https://docs.databricks.com/machine-learning/model-serving/index.html#enable-model-serving-for-your-workspace) enabled.

### Option 1: Running in a Databricks Workspace (Recommended)

1. **Clone the Repository into your Databricks Workspace:**
   - In your Databricks workspace, go to Repos
   - Click "Add Repo"
   - Enter the Git repository URL: `https://github.com/databricks/genai-cookbook.git`
   - After completing the steps above, use sparse checkout mode to clone the subdirectory of your choice.

1a. **Optional: Download the repository as a zip file:**
   In cases where you cannot use Git folders, you can download the repository as a zip file.
   - Click the "Download ZIP" button on the repository page
   - Unzip the file and upload it to your Databricks workspace

2. **Set Up Your Databricks Environment:**
   - Use Serverless Notebooks or create a new cluster with Databricks Runtime 14.0 or higher

3. **Run the Sample Code:**
   - Navigate to `agent_app_sample_code/A_POC_app` to start with a proof of concept
   - Follow the numbered notebooks in sequence
   - Each notebook contains detailed instructions and explanations

### Option 2: Running Locally
Collaborator:
is this fully supported? we dont recommend this in our docs anywhere, so just want to make sure.

jiayi-wu-3150 (Dec 17, 2024):
Or it means for running via VS Code but using Databricks' runtime? I feel a bit concerned as we are cloud platform and customers may feel confusing if we suggest run stuff locally.

Author:
That's fair - I think we need to document it, but it might not work consistently across all "recipes". We should probably let each one define their local development workflows if applicable.

Author:

> Or it means for running via VS Code but using Databricks' runtime? I feel a bit concerned as we are cloud platform and customers may feel confusing if we suggest run stuff locally.

Yes, this means running in VSCode with databricks connect.


If you prefer to edit code locally, you can use Databricks Connect, optionally with an IDE extension such as the [Databricks extension for Visual Studio Code](https://docs.databricks.com/en/dev-tools/vscode-ext/index.html).

It is strongly recommended to first read the [Databricks Connect documentation](https://docs.databricks.com/en/dev-tools/databricks-connect/index.html) to understand how to connect your local machine to your Databricks workspace.

1. **Install Prerequisites**
- Python 3.10 or higher
- [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)
- Git
- Optional: [VSCode with Databricks Extension](https://docs.databricks.com/en/dev-tools/vscode-ext/index.html)

2. **Set Up Databricks Connect**
- Read the [Databricks Connect documentation](https://docs.databricks.com/en/dev-tools/databricks-connect/index.html)
- Follow the setup instructions to connect your local machine to your workspace

3. **Configure MLflow**
Set the following environment variables:
```bash
export MLFLOW_TRACKING_URI=databricks
export DATABRICKS_HOST=<your-workspace-url>
export DATABRICKS_TOKEN=<your-access-token>
```

4. **Clone and Set Up the Repository**
```bash
git clone https://github.com/databricks/genai-cookbook.git
cd genai-cookbook
python -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
pip install -r dev/dev_requirements.txt
```

5. **Configure Databricks CLI**
```bash
databricks configure --token
```
When prompted, enter your:
- Workspace URL
- Access token
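
Before running any of the sample code locally, it can help to confirm that the variables from step 3 are actually set in the current shell. The following is a minimal sketch and not part of the repository: `check_databricks_env` is a hypothetical helper, and it only checks that each variable is present and non-empty, not that the host or token is valid.

```python
import os
from typing import Mapping, Optional

# Variable names taken from step 3 above; the helper itself is hypothetical.
REQUIRED_VARS = ("MLFLOW_TRACKING_URI", "DATABRICKS_HOST", "DATABRICKS_TOKEN")


def check_databricks_env(env: Optional[Mapping[str, str]] = None) -> list:
    """Return the names of required variables that are missing or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


missing = check_databricks_env()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to exercise in a unit test.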

## Contributing

We welcome contributions to improve the cookbook! Here's how you can help:

### Development Setup

1. **Fork and Clone:**
- Fork the repository
- Clone the forked repository
```bash
git clone https://github.com/YOUR_USERNAME/genai-cookbook.git
cd genai-cookbook
```

### Making Changes

1. **Create a Feature Branch:**
```bash
git checkout -b feature/your-feature-name
```

2. **Update Documentation:**
- If you're adding a new cookbook directory, add a README.md file to the directory describing the new cookbook.

3. **Code Style:**
- Follow PEP 8 guidelines
- Include docstrings for new functions
- Add type hints where possible
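
As an illustration of the code style points above, here is a small function written with PEP 8 naming, a docstring, and type hints. The function `normalize_whitespace` is a hypothetical example and does not exist in this repository.

```python
def normalize_whitespace(text: str, max_blank_lines: int = 1) -> str:
    """Collapse runs of whitespace and limit consecutive blank lines.

    Args:
        text: Raw text, for example the parsed contents of a document.
        max_blank_lines: Maximum number of consecutive blank lines to keep.

    Returns:
        The cleaned text, with each line stripped and internal runs of
        whitespace collapsed to single spaces.
    """
    cleaned: list[str] = []
    blank_run = 0
    for line in text.splitlines():
        line = " ".join(line.split())  # strip and collapse inner whitespace
        if line:
            blank_run = 0
            cleaned.append(line)
        else:
            blank_run += 1
            if blank_run <= max_blank_lines:
                cleaned.append("")
    return "\n".join(cleaned).strip()
```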

### Submitting Changes

1. **Commit Your Changes:**
```bash
git add .
git commit -m "Description of your changes"
```

2. **Push to Your Fork:**
```bash
git push origin feature/your-feature-name
```

3. **Create a Pull Request:**
   - Go to the [Pull Requests](https://github.com/databricks/genai-cookbook/pulls) page
   - Click "New Pull Request"
   - Select your fork and branch
   - Fill out the PR template with:
     - Description of changes
     - Related issues
     - Testing performed
     - Screenshots/videos of manual testing (required)

### Getting Help

- For bugs, feature requests, or questions, [create an issue](https://github.com/databricks/genai-cookbook/issues)
112 changes: 112 additions & 0 deletions agent_app_sample_code/README.md
@@ -0,0 +1,112 @@
# Agent Application Sample Code

This directory contains sample code for building agent applications using client-side tools. The code demonstrates how to build, evaluate, and improve the quality of your agent applications.

## Directory Structure

```
├── agents/ # Agent implementation code
│ ├── agent_config.py # Configuration classes for the agent
│ ├── function_calling_agent_w_retriever_tool.py # Agent implementation with retriever tool
│ └── generated_configs/ # Generated agent configuration files
├── tests/ # Unit tests
├── utils/ # Utility functions and helpers
│ ├── build_retriever_index.py # Vector search index creation
│ ├── chunk_docs.py # Document chunking utilities
│ ├── eval_set_utilities.py # Evaluation set creation helpers
│ ├── file_loading.py # File loading utilities
│ └── typed_dicts_to_spark_schema.py # Schema conversion utilities
├── validators/ # Configuration validators
└── README.md # This file
```

## Getting Started

### Prerequisites

- Databricks Runtime 14.0 or higher, or Serverless
- A Databricks workspace with access to:
  - [Mosaic AI Agent Framework](https://docs.databricks.com/en/generative-ai/agent-framework/build-genai-apps.html)
  - [Mosaic AI Agent Evaluation](https://docs.databricks.com/en/generative-ai/agent-evaluation/index.html)
  - [Vector Search](https://docs.databricks.com/en/generative-ai/create-query-vector-search.html)
  - [Model Serving](https://docs.databricks.com/en/machine-learning/model-serving/index.html)

### Setup Steps

1. **Configure Global Settings** (00_global_config.py):
   - Set up Unity Catalog locations for your agent
   - Configure MLflow experiment tracking
   - Define evaluation settings

2. **Build Data Pipeline** (02_data_pipeline.py):
   - Load and parse your documents
   - Create chunks for vector search
   - Build the vector search index

3. **Create Agent** (03_agent_proof_of_concept.py):
   - Configure the agent with LLM and retriever settings
   - Deploy the agent to collect feedback

4. **Evaluate and Improve** (04_create_evaluation_set.py, 05_evaluate_poc_quality.py):
   - Create evaluation sets from feedback
   - Measure quality metrics
   - Identify and fix quality issues

## Key Components

### Agent Configuration

The agent is configured using the `AgentConfig` class in `agents/agent_config.py`. Key configuration includes:

- Retriever tool settings (vector search, chunk formatting)
- LLM configuration (model endpoint, system prompts)
- Input examples for testing
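
To make the shape of such a configuration concrete, here is a self-contained sketch using standard-library dataclasses. Every class and field name below (`SketchRetrieverToolConfig`, `vector_search_index`, `llm_endpoint_name`, and so on) is an illustrative assumption; the real classes and their fields are defined in `agents/agent_config.py` and may differ.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class SketchRetrieverToolConfig:
    # Hypothetical fields: index name, result count, chunk formatting template.
    vector_search_index: str
    num_results: int = 5
    chunk_template: str = "Passage: {chunk_text}\n"


@dataclass
class SketchLLMConfig:
    # Hypothetical fields: serving endpoint name and system prompt.
    llm_endpoint_name: str
    system_prompt: str = "You are a helpful assistant."


@dataclass
class SketchAgentConfig:
    retriever_tool: SketchRetrieverToolConfig
    llm_config: SketchLLMConfig
    input_example: dict[str, Any] = field(default_factory=dict)

    def validate(self) -> None:
        """Raise ValueError if the settings are obviously inconsistent."""
        if self.retriever_tool.num_results < 1:
            raise ValueError("num_results must be at least 1")
        if "{chunk_text}" not in self.retriever_tool.chunk_template:
            raise ValueError("chunk_template must contain {chunk_text}")


config = SketchAgentConfig(
    retriever_tool=SketchRetrieverToolConfig(
        vector_search_index="catalog.schema.docs_index"  # placeholder name
    ),
    llm_config=SketchLLMConfig(llm_endpoint_name="my-llm-endpoint"),
    input_example={"messages": [{"role": "user", "content": "What is RAG?"}]},
)
config.validate()
config_dict = asdict(config)  # plain dict, convenient for logging or serialization
```

Grouping retriever and LLM settings into separate objects keeps each concern independently testable, which is one plausible reason for the three-part structure listed above.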

### Data Pipeline

The data pipeline handles:

- Document loading and parsing
- Text chunking with configurable strategies
- Vector index creation for retrieval
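
As one example of a chunking strategy, the sketch below splits text into fixed-size character windows with overlap. It is a simplified illustration under stated assumptions: `chunk_text`, `chunk_size`, and `overlap` are hypothetical names, and the utilities in `utils/chunk_docs.py` may use different (for example token-based) strategies.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into character windows of chunk_size that overlap by overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks
```

Overlap trades index size for context: neighbouring chunks share some characters, so a sentence that straddles a boundary is still retrievable from at least one chunk.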

### Quality Evaluation

The evaluation framework provides:

- Feedback collection through the Review App
- Quality metrics computation
- Root cause analysis of issues
- Iterative quality improvements
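
To give a feel for what a quality metric computation can look like, here is a hand-rolled retrieval recall calculation over a tiny evaluation set. This is only an illustrative sketch: it is not how Mosaic AI Agent Evaluation computes its metrics (which rely on LLM judges), and the row keys `expected_doc_ids` and `retrieved_doc_ids` are assumed names.

```python
def retrieval_recall(expected_ids: set[str], retrieved_ids: list[str]) -> float:
    """Fraction of expected documents that appear among the retrieved ones."""
    if not expected_ids:
        raise ValueError("expected_ids must not be empty")
    hits = expected_ids.intersection(retrieved_ids)
    return len(hits) / len(expected_ids)


def mean_retrieval_recall(rows: list[dict]) -> float:
    """Average retrieval recall across all evaluation rows."""
    if not rows:
        raise ValueError("rows must not be empty")
    scores = [
        retrieval_recall(set(row["expected_doc_ids"]), row["retrieved_doc_ids"])
        for row in rows
    ]
    return sum(scores) / len(scores)


# Hypothetical evaluation rows: which documents should have been retrieved,
# and which ones the retriever actually returned.
eval_rows = [
    {"expected_doc_ids": ["a", "b"], "retrieved_doc_ids": ["a", "c"]},
    {"expected_doc_ids": ["d"], "retrieved_doc_ids": ["d", "e"]},
]
average_recall = mean_retrieval_recall(eval_rows)
```

A low score on a specific row points at the retriever rather than the LLM, which is the kind of signal root cause analysis looks for.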

## Usage Example

```python
# 1. Configure your agent
from agents.agent_config import AgentConfig, RetrieverToolConfig, LLMConfig

agent_config = AgentConfig(
    retriever_tool=RetrieverToolConfig(...),
    llm_config=LLMConfig(...),
    input_example={...}
)

# 2. Initialize and test the agent
from agents.function_calling_agent_w_retriever_tool import AgentWithRetriever

agent = AgentWithRetriever()
response = agent.predict(model_input={"messages": [{"role": "user", "content": "What is RAG?"}]})
```

Collaborator (on the `from agents.agent_config import ...` line):

not related to this PR, but i feel importing from agents... for cookbook util code, and then using from databricks_agents... for the actual SDK is confusing.

Author:

💯
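
The `model_input` passed to `predict` in the usage example is a dictionary with a `messages` list of role/content pairs. A small helper can keep that shape consistent across test calls. The sketch below is hypothetical: `build_model_input` is not part of the repository, and whether the agent accepts earlier turns in `messages` is an assumption.

```python
from typing import Optional


def build_model_input(
    question: str, history: Optional[list[dict[str, str]]] = None
) -> dict[str, list[dict[str, str]]]:
    """Build a chat-style payload with the new user question appended last."""
    messages = list(history or [])  # copy so the caller's list is not mutated
    messages.append({"role": "user", "content": question})
    return {"messages": messages}


payload = build_model_input("What is RAG?")
```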

## Contributing

1. Follow the [development setup](../dev/README.md) instructions
2. Create a feature branch
3. Add tests for new functionality
4. Submit a pull request

## Additional Resources

- [Databricks Generative AI Cookbook](https://ai-cookbook.io/)
- [Mosaic AI Documentation](https://docs.databricks.com/en/generative-ai/index.html)
- [Vector Search Documentation](https://docs.databricks.com/en/generative-ai/create-query-vector-search.html)