readme for agent app sample and main readme updates #56

Open
wants to merge 2 commits into
base: main
173 changes: 161 additions & 12 deletions README.md
@@ -1,16 +1,165 @@
# Retrieval Augmented Generation
# Databricks Generative AI Cookbook

Please visit http://ai-cookbook.io for the accompanying documentation for this repo.
Please visit [ai-cookbook.io](http://ai-cookbook.io) for the accompanying documentation for this repository.

This repo provides [learning materials](https://ai-cookbook.io/) and [production-ready code](https://github.com/databricks/genai-cookbook/tree/v0.2.0/agent_app_sample_code) to build a **high-quality RAG application** using Databricks. The [Mosaic Generative AI Cookbook](https://ai-cookbook.io/) provides:
- A conceptual overview and deep dive into various Generative AI design patterns, such as Prompt Engineering, Agents, RAG, and Fine Tuning
- An overview of Evaluation-Driven development
- The theory of every parameter/knob that impacts quality
- How to root cause quality issues and detemermine which knobs are relevant to experiment with for your use case
- Best practices for how to experiment with each knob
This repository provides [learning materials](https://ai-cookbook.io/) and code examples to build a **high-quality Generative AI application** using Databricks. The Cookbook provides:

The [provided code](https://github.com/databricks/genai-cookbook/tree/v0.2.0/agent_app_sample_code) is intended for use with the Databricks platform. Specifically:
- [Mosaic AI Agent Framework](https://docs.databricks.com/en/generative-ai/retrieval-augmented-generation.html) which provides a fast developer workflow with enterprise-ready LLMops & governance
- [Mosaic AI Agent Evaluation](https://docs.databricks.com/en/generative-ai/agent-evaluation/index.html) which provides reliable, quality measurement using proprietary AI-assisted LLM judges to measure quality metrics that are powered by human feedback collected through an intuitive web-based chat UI
- A conceptual overview and deep dive into various Generative AI design patterns, such as Prompt Engineering, Agents, RAG, and Fine-Tuning.
- An overview of Evaluation-Driven Development.
- The theory of every parameter/knob that impacts quality.
- How to root cause quality issues and determine which knobs are relevant to experiment with for your use case.
- Best practices for how to experiment with each knob.

![Alt text](rag_app_sample_code/dbxquality.png)
## TL;DR:

Choose the recipe that best matches your needs:
Collaborator

can we emphasize the 10 minute demo?

Author

wdym by emphasize?


- **For RAG Applications:**
- [RAG Getting Started](./rag_app_sample_code/README.md)
- Start with `rag_app_sample_code/A_POC_app` to build a proof of concept
- Then explore `rag_app_sample_code/B_quality_iteration` to improve quality
- Uses Databricks' Mosaic AI Agent Framework for enterprise features
Collaborator

what does "enterprise features" mean here? can we be more explicit and say something like "agent serving" and "agent eval"


- **For an agent that uses a retriever tool:**
- Check out `agent_app_sample_code`

- **For OpenAI SDK Integration:**
Collaborator

Suggested change
- **For OpenAI SDK Integration:**
- **For Agent Application in pure Python + OpenAI:**

- [OpenAI SDK Getting Started](./openai_sdk_agent_app_sample_code/README.md)
- Navigate to `openai_sdk_agent_app_sample_code`
- Examples of building agents using OpenAI SDK with MLflow PyFunc Models

## Repository Structure

```
├── agent_app_sample_code/              # Sample code for agent applications
│   ├── agents/                         # Agent code
│   ├── 03_agent_proof_of_concept.py    # Example of a proof of concept agent
│   └── ...                             # Additional directories and files
├── openai_sdk_agent_app_sample_code/   # Sample code using OpenAI SDK
│   └── ...                             # Directories and files
├── rag_app_sample_code/                # Sample code for RAG applications
│   ├── A_POC_app/                      # Proof-of-Concept applications
│   ├── pdf_uc_volume/                  # Example of a RAG application using PDFs
│   ├── B_quality_iteration/            # Code for quality iteration
│   └── ...                             # Additional directories and files
├── genai_cookbook/                     # Documentation and learning materials
├── data/                               # Sample data for testing and development
├── dev/                                # Development tools and scripts
└── README.md                           # This README file
Collaborator

can we drop this part? seems hard to maintain as we iterate on this repo

```

The `agent_app_sample_code` directory contains sample code for building agent applications using the Databricks platform.

The `openai_sdk_agent_app_sample_code` directory contains sample code that uses the OpenAI SDK and MLflow PyFunc Models for building agents.

The `rag_app_sample_code` directory contains sample code for Retrieval-Augmented Generation (RAG) applications.

The `genai_cookbook` directory contains a 10-minute getting-started guide.
@jiayi-wu-3150 Dec 17, 2024

would dev and quick_start_demo got a word to say here, or they are good to go?

I wonder which part this 10 minute demo has been simplified compared to agent_app_sample_code?

Author

dev is for publishing this as a jupyter book, IMO we need to remove it from this repo


The provided code is intended for use with the Databricks platform. Specifically:
- [Mosaic AI Agent Framework](https://docs.databricks.com/generative-ai/agent-framework/build-genai-apps.html), which provides a fast developer workflow with enterprise-ready LLMOps and governance
- [Mosaic AI Agent Evaluation](https://docs.databricks.com/generative-ai/agent-evaluation/index.html), which provides reliable quality measurement using proprietary AI-assisted LLM judges, with quality metrics powered by human feedback collected through an intuitive web-based chat UI

![Alt text](rag_app_sample_code/dbxquality.png)

## Getting Started
Collaborator

maybe worth mentioning prereqs like UC


### Option 1: Running in a Databricks Workspace (Recommended)

1. **Clone the Repository into your Databricks Workspace:**
- In your Databricks workspace, go to Repos
- Click "Add Repo"
- Enter the Git repository URL: `https://github.com/databricks/genai-cookbook.git`

2. **Set Up Your Databricks Environment:**
- Use Serverless Notebooks or create a new cluster with Databricks Runtime 14.0 or higher

3. **Run the Sample Code:**
- Navigate to `rag_app_sample_code/A_POC_app` to start with a proof of concept
- Follow the numbered notebooks in sequence
- Each notebook contains detailed instructions and explanations

### Option 2: Running Locally
Collaborator

is this fully supported? we dont recommend this in our docs anywhere, so just want to make sure.

@jiayi-wu-3150 Dec 17, 2024

Or it means for running via VS Code but using Databricks' runtime? I feel a bit concerned as we are cloud platform and customers may feel confusing if we suggest run stuff locally.

Author

That's fair - I think we need to document it, but it might not work consistently across all "recipes". We should probably let each one define their local development workflows if applicable.

Author

> Or it means for running via VS Code but using Databricks' runtime? I feel a bit concerned as we are cloud platform and customers may feel confusing if we suggest run stuff locally.

Yes, this means running in VSCode with databricks connect.


1. **Prerequisites:**
- Python 3.10 or higher
- [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html) installed and configured
- Git installed on your local machine

2. **Clone and Set Up:**
```bash
# Clone the repository
git clone https://github.com/databricks/genai-cookbook.git
cd genai-cookbook

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate

# Install dependencies
pip install -r dev/dev_requirements.txt
```

3. **Configure Databricks Connection:**
- Set up your Databricks CLI credentials:
```bash
databricks configure --token
```
- Follow the prompts to enter your Databricks workspace URL and access token
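
As a quick sanity check after this step, the sketch below reads the CLI profile file and confirms that it contains both values. It assumes the token-based configure command stores `host` and `token` keys in an INI-style file at `~/.databrickscfg`; the helper name `read_databricks_profile` is hypothetical and is not part of the Databricks CLI or this repository.

```python
import configparser
from pathlib import Path


def read_databricks_profile(cfg_path: Path, profile: str = "DEFAULT") -> dict[str, str]:
    """Return the host and token stored for one profile in a CLI config file.

    Assumption: the file is INI-formatted with `host` and `token` keys.
    """
    # Disable interpolation so a literal "%" in a value is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(cfg_path):
        raise FileNotFoundError(f"No config file found at {cfg_path}")
    if profile != parser.default_section and not parser.has_section(profile):
        raise KeyError(f"Profile {profile!r} not found in {cfg_path}")

    section = parser[profile]
    missing = [key for key in ("host", "token") if not section.get(key)]
    if missing:
        raise ValueError(f"Profile {profile!r} is missing: {', '.join(missing)}")
    return {"host": section["host"], "token": section["token"]}
```

For example, `read_databricks_profile(Path.home() / ".databrickscfg")` would return the default profile, provided the file exists at that assumed location.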

## Contributing

We welcome contributions to improve the cookbook! Here's how you can help:

### Development Setup

1. **Fork and Clone:**
- Fork the repository
- Clone the forked repository
```bash
git clone https://github.com/YOUR_USERNAME/genai-cookbook.git
cd genai-cookbook
```

### Making Changes

1. **Create a Feature Branch:**
```bash
git checkout -b feature/your-feature-name
```

2. **Update Documentation:**
- If you're adding a new cookbook directory, add a README.md file to the directory describing the new cookbook.

3. **Code Style:**
- Follow PEP 8 guidelines
- Include docstrings for new functions
- Add type hints where possible
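
To illustrate the three style points above, here is a small function written with PEP 8 naming, a docstring, and type hints. The function itself (`truncate_text`) is a made-up example and does not exist in this repository.

```python
def truncate_text(text: str, max_chars: int = 80, suffix: str = "...") -> str:
    """Shorten text so the result is at most max_chars characters long.

    Args:
        text: The string to shorten.
        max_chars: Maximum length of the returned string, suffix included.
        suffix: Marker appended when the text is cut.

    Returns:
        The original text if it already fits, otherwise a cut copy ending in suffix.

    Raises:
        ValueError: If max_chars is too small to hold the suffix.
    """
    if max_chars < len(suffix):
        raise ValueError("max_chars must be at least as long as the suffix")
    if len(text) <= max_chars:
        return text
    return text[: max_chars - len(suffix)] + suffix
```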

### Submitting Changes

1. **Commit Your Changes:**
```bash
git add .
git commit -m "Description of your changes"
```

2. **Push to Your Fork:**
```bash
git push origin feature/your-feature-name
```

3. **Create a Pull Request:**
- Go to the [Pull Requests](https://github.com/databricks/genai-cookbook/pulls) page
- Click "New Pull Request"
- Select your fork and branch
- Fill out the PR template with:
- Description of changes
- Related issues
- Testing performed
- Screenshots/videos of manual testing (required)

### Getting Help

- For bugs or feature requests, [create an issue](https://github.com/databricks/genai-cookbook/issues)
- For questions, start a [GitHub Discussion](https://github.com/databricks/genai-cookbook/discussions)
@jiayi-wu-3150 Dec 17, 2024

nit: the link doesn't work. maybe remove or update the link?

112 changes: 112 additions & 0 deletions agent_app_sample_code/README.md
@@ -0,0 +1,112 @@
# Agent Application Sample Code

This directory contains sample code for building agent applications using client-side tools. The code demonstrates how to build, evaluate, and improve the quality of your agent applications.

## Directory Structure

```
├── agents/                                          # Agent implementation code
│   ├── agent_config.py                              # Configuration classes for the agent
│   ├── function_calling_agent_w_retriever_tool.py   # Agent implementation with retriever tool
│   └── generated_configs/                           # Generated agent configuration files
├── tests/                                           # Unit tests
├── utils/                                           # Utility functions and helpers
│   ├── build_retriever_index.py                     # Vector search index creation
│   ├── chunk_docs.py                                # Document chunking utilities
│   ├── eval_set_utilities.py                        # Evaluation set creation helpers
│   ├── file_loading.py                              # File loading utilities
│   └── typed_dicts_to_spark_schema.py               # Schema conversion utilities
├── validators/                                      # Configuration validators
└── README.md                                        # This file
```

## Getting Started

### Prerequisites

- Databricks Runtime 14.0 or higher
Collaborator

Suggested change
- Databricks Runtime 14.0 or higher
- Databricks Runtime 14.0 or higher or serverless

I am not sure if that could be done by purely Serverless. In the AI cookbook documentation, it mentions both https://ai-cookbook.io/nbs/6-implement-overview.html#requirements.

My understanding is the AI Cookbook includes many components and integrates various products, each with its own requirements. Some products, like Mosaic AI vector search, use Serverless runtime, while others rely on the DBR environment. This is why both DBR 14.3 (Non-ML) and Serverless runtime are mentioned. Not sure if that's correct.

Author

Yeah we should specify 'serverless notebook'

- A Databricks workspace with access to:
- [Mosaic AI Agent Framework](https://docs.databricks.com/en/generative-ai/agent-framework/build-genai-apps.html)
- [Mosaic AI Agent Evaluation](https://docs.databricks.com/en/generative-ai/agent-evaluation/index.html)
- [Vector Search](https://docs.databricks.com/en/generative-ai/create-query-vector-search.html)
- [Model Serving](https://docs.databricks.com/en/machine-learning/model-serving/index.html)

### Setup Steps

1. **Configure Global Settings** (`00_global_config.py`):
- Set up Unity Catalog locations for your agent
- Configure MLflow experiment tracking
- Define evaluation settings

2. **Build Data Pipeline** (`02_data_pipeline.py`):
- Load and parse your documents
- Create chunks for vector search
- Build the vector search index

3. **Create Agent** (`03_agent_proof_of_concept.py`):
- Configure the agent with LLM and retriever settings
- Deploy the agent to collect feedback

4. **Evaluate and Improve** (`04_create_evaluation_set.py`, `05_evaluate_poc_quality.py`):
- Create evaluation sets from feedback
- Measure quality metrics
- Identify and fix quality issues

## Key Components

### Agent Configuration

The agent is configured using the `AgentConfig` class in `agents/agent_config.py`. Key configuration includes:

- Retriever tool settings (vector search, chunk formatting)
- LLM configuration (model endpoint, system prompts)
- Input examples for testing
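
To make the shape of such a configuration concrete, the following is a rough sketch using plain dataclasses. It is not the code in `agents/agent_config.py`: the `Sketch*` class names and every field name (`vector_search_index`, `num_results`, `chunk_template`, `llm_endpoint_name`, `system_prompt`, `temperature`) and all example values are assumptions chosen only to mirror the three groups of settings listed above.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class SketchRetrieverToolConfig:
    """Hypothetical retriever settings: which index to query and how to format chunks."""

    vector_search_index: str
    num_results: int = 5
    chunk_template: str = "Passage: {chunk_text}\n"

    def __post_init__(self) -> None:
        # Fail early on settings that could never produce usable context.
        if self.num_results < 1:
            raise ValueError("num_results must be at least 1")
        if "{chunk_text}" not in self.chunk_template:
            raise ValueError("chunk_template must contain the {chunk_text} placeholder")

    def format_chunk(self, chunk_text: str) -> str:
        """Render one retrieved chunk with the configured template."""
        return self.chunk_template.format(chunk_text=chunk_text)


@dataclass
class SketchLLMConfig:
    """Hypothetical LLM settings: serving endpoint name and system prompt."""

    llm_endpoint_name: str
    system_prompt: str
    temperature: float = 0.0


@dataclass
class SketchAgentConfig:
    """Hypothetical top-level config grouping retriever, LLM, and an input example."""

    retriever_tool: SketchRetrieverToolConfig
    llm_config: SketchLLMConfig
    input_example: dict[str, Any] = field(default_factory=dict)


# All values below are placeholders, not real index or endpoint names.
example_config = SketchAgentConfig(
    retriever_tool=SketchRetrieverToolConfig(vector_search_index="catalog.schema.docs_index"),
    llm_config=SketchLLMConfig(
        llm_endpoint_name="my-llm-endpoint",
        system_prompt="Answer using only the retrieved passages.",
    ),
    input_example={"messages": [{"role": "user", "content": "What is RAG?"}]},
)
config_as_dict = asdict(example_config)  # plain dict, e.g. for logging alongside a model
```

Validating in `__post_init__` is one possible design choice; it surfaces a bad setting when the config is built rather than at query time.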

### Data Pipeline

The data pipeline handles:

- Document loading and parsing
- Text chunking with configurable strategies
- Vector index creation for retrieval
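
One simple chunking strategy, fixed-size character windows with overlap, could look like the following. This is an illustrative sketch and not the implementation in `utils/chunk_docs.py`; the function name and the `chunk_size` and `overlap` parameters are assumptions.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each iteration
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        # Stop once a window has reached the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means text near a boundary appears in two neighboring chunks, which can help retrieval when a relevant sentence straddles that boundary, at the cost of a larger index.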

### Quality Evaluation

The evaluation framework provides:

- Feedback collection through the Review App
- Quality metrics computation
- Root cause analysis of issues
- Iterative quality improvements
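
As a toy illustration of computing a quality metric over an evaluation set, the sketch below measures retrieval recall by hand. It does not use Mosaic AI Agent Evaluation or its LLM judges; the `EvalRecord` type, its field names, and the helper functions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    """Hypothetical evaluation row: a question plus expected and retrieved documents."""

    request: str
    expected_doc_uris: list[str]
    retrieved_doc_uris: list[str]


def retrieval_recall(record: EvalRecord) -> float:
    """Fraction of the expected documents that the retriever actually returned."""
    expected = set(record.expected_doc_uris)
    if not expected:
        return 1.0  # nothing was expected, so nothing could be missed
    found = expected & set(record.retrieved_doc_uris)
    return len(found) / len(expected)


def mean_recall(records: list[EvalRecord]) -> float:
    """Average retrieval recall across the evaluation set."""
    if not records:
        raise ValueError("records must not be empty")
    return sum(retrieval_recall(r) for r in records) / len(records)


def low_recall_requests(records: list[EvalRecord], threshold: float = 1.0) -> list[str]:
    """Requests whose recall is below threshold, as a starting point for root-cause analysis."""
    return [r.request for r in records if retrieval_recall(r) < threshold]
```

Listing the low-recall requests separately points at the retriever, rather than the LLM, as the first place to look when those answers are poor.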

## Usage Example

```python
# 1. Configure your agent
from agents.agent_config import AgentConfig, RetrieverToolConfig, LLMConfig
Collaborator

not related to this PR, but i feel importing from agents... for cookbook util code, and then using from databricks_agents... for the actual SDK is confusing.

Author

💯


agent_config = AgentConfig(
retriever_tool=RetrieverToolConfig(...),
llm_config=LLMConfig(...),
input_example={...}
)

# 2. Initialize and test the agent
from agents.function_calling_agent_w_retriever_tool import AgentWithRetriever

agent = AgentWithRetriever()
response = agent.predict(model_input={"messages": [{"role": "user", "content": "What is RAG?"}]})
```

## Contributing

1. Follow the [development setup](../dev/README.md) instructions
2. Create a feature branch
3. Add tests for new functionality
4. Submit a pull request

## Additional Resources

- [Databricks Generative AI Cookbook](https://ai-cookbook.io/)
- [Mosaic AI Documentation](https://docs.databricks.com/en/generative-ai/index.html)
- [Vector Search Documentation](https://docs.databricks.com/en/generative-ai/create-query-vector-search.html)