diff --git a/docs/_get_started/backend/Backend_Setup.md b/docs/_get_started/backend/Backend_Setup.md index f202cf6dc..910d3f7fa 100644 --- a/docs/_get_started/backend/Backend_Setup.md +++ b/docs/_get_started/backend/Backend_Setup.md @@ -5,164 +5,561 @@ parent: Backend nav_order: 1 --- -# Backend Setup πŸš€ +# Omi Backend Setup Guide πŸš€ -Welcome to the Omi backend setup guide! Omi is an innovative, multimodal AI assistant that combines cutting-edge technologies to provide a seamless user experience. This guide will help you set up the -backend infrastructure that powers Omi's intelligent capabilities. +Welcome to the in-depth Omi backend setup guide! This document provides a comprehensive walkthrough for setting up and running the Omi backend, which powers the intelligent capabilities of our multimodal AI assistant. Whether you're a seasoned developer or new to the project, this guide will help you get the backend up and running smoothly. + +## Table of Contents + +1. [Prerequisites](#prerequisites-) +2. [Setting Up Google Cloud & Firebase](#i-setting-up-google-cloud--firebase-) +3. [Backend Setup](#ii-backend-setup-) +4. [Running the Backend Locally](#iii-running-the-backend-locally-) +5. [Environment Variables](#environment-variables-) +6. [Modal Serverless Deployment](#modal-serverless-deployment-) +7. [Comprehensive Troubleshooting Guide](#comprehensive-troubleshooting-guide-) +8. [Performance Optimization](#performance-optimization-) +9. [Security Considerations](#security-considerations-) +10. [Contributing](#contributing-) +11. [Support](#support-) ## Prerequisites πŸ“‹ -Before you start, make sure you have the following: +Before you begin, ensure you have the following: + +- **Google Cloud Project:** With Firebase enabled. If you've set up Firebase for the Omi app, you already have this. + +- **API Keys:** + + - **Required API Keys:** + - **OpenAI:** [platform.openai.com](https://platform.openai.com/) - For language models and embeddings + - **Deepgram:** [deepgram.com](https://deepgram.com/) - For real-time speech-to-text + - **Redis:** Upstash recommended [upstash.com](https://upstash.com/) - For caching and temporary data storage + - **Pinecone:** Use "text-embedding-ada-002" model [pinecone.io](https://www.pinecone.io/) - For vector database operations + - **Hugging Face:** [huggingface.co](https://huggingface.co/) - For voice activity detection models -- **Google Cloud Project:** You need a Google Cloud project with Firebase enabled. If you've already set up Firebase for the Omi app, you're good to go. 
-- **API Keys: πŸ”‘** Obtain API keys for: - - **OpenAI:** For AI language models ([platform.openai.com](https://platform.openai.com/)) - - **Deepgram:** For speech-to-text ([deepgram.com](https://deepgram.com/)) - - **Redis:** Upstash is recommended ([upstash.com](https://upstash.com/)) - - **Pinecone:** For vector database; use "text-embedding-3-large" model ([pinecone.io](https://www.pinecone.io/)) - - **Modal: [optional]** For serverless deployment ([modal.com](https://modal.com/)) - - **Hugging Face:** For voice activity detection ([huggingface.co](https://huggingface.co/)) - - **GitHub:[optional]** For firmware updates ([github.com](https://github.com/)) -- **Google Maps API Key:** πŸ—ΊοΈ (Optional) For location features + - **Optional API Keys:** + - **Modal:** [modal.com](https://modal.com/) - For serverless deployment + - **GitHub:** [github.com](https://github.com/) - For firmware updates + - **Hume AI:** [hume.ai](https://hume.ai/) - For emotional analysis (optional) + - **Google Maps API Key:** πŸ—ΊοΈ For location features + +- **Development Environment:** + - **Python 3.9 or higher** (Python 3.11 recommended) + - **pip** (latest version) + - **git** + - **ffmpeg** (for audio processing) + - **Ngrok** (for tunneling localhost) + - **A code editor** (e.g., VSCode, PyCharm) + +- **Installation Guides:** + - [Python Installation Guide](https://www.python.org/downloads/) + - [ffmpeg Installation Guide](https://ffmpeg.org/download.html) + - [git Installation Guide](https://git-scm.com/downloads) + - [Ngrok Installation Guide](https://ngrok.com/download) ## I. Setting Up Google Cloud & Firebase ☁️ 1. **Install Google Cloud SDK:** - - **Mac (using brew):** `brew install google-cloud-sdk` - - **Nix Envdir:** The SDK is usually pre-installed - -2. **Enable Necessary APIs: πŸ”§** - - Go to the [Google Cloud Console](https://console.cloud.google.com/) - - Select your project - - Navigate to APIs & Services -> Library - - Enable the following APIs: - - Cloud Resource Manager API - - Firebase Management API - -3. **Authenticate with Google Cloud: πŸ”** - - Open your terminal - - Run the following commands one by one, replacing `` with your Google Cloud project ID: - ```bash - gcloud auth login - gcloud config set project - gcloud auth application-default login --project - ``` - - This process generates an `application_default_credentials.json` file in the `~/.config/gcloud` directory. This file is used for automatic authentication with Google Cloud services in Python. + + - **macOS (using Homebrew):** + + ```bash + brew install google-cloud-sdk + ``` + + - **Ubuntu/Debian:** + + ```bash + sudo apt-get install apt-transport-https ca-certificates gnupg + echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] \ + https://packages.cloud.google.com/apt cloud-sdk main" | \ + sudo tee /etc/apt/sources.list.d/google-cloud-sdk.list + sudo apt-get update && sudo apt-get install google-cloud-sdk + ``` + + - **Windows:** + - Download and install from the [official Google Cloud SDK installation guide](https://cloud.google.com/sdk/docs/install#windows). + +2. **Enable Necessary APIs:** + + - Navigate to the [Google Cloud Console](https://console.cloud.google.com/). + - Select your project. + - Go to **APIs & Services** -> **Library**. + - Search for and enable these APIs: + - **Cloud Resource Manager API** + - **Firebase Management API** + - **Cloud Storage API** + - **Cloud Firestore API** + +3. 
**Authenticate with Google Cloud:** + + ```bash + gcloud auth login + gcloud config set project + gcloud auth application-default login + ``` + + - Replace `` with your actual Google Cloud project ID. + - This generates `application_default_credentials.json` in `~/.config/gcloud` (macOS/Linux) or `%APPDATA%\gcloud` (Windows). + + **Note:** If you encounter any permission issues, ensure your Google account has the necessary roles (e.g., **Project Owner**, **Firebase Admin**) in the Google Cloud Console. + + - To assign roles: + - Go to **IAM & Admin** -> **IAM**. + - Locate your account and ensure it has the required permissions. ## II. Backend Setup πŸ› οΈ -1. **Install Python & Dependencies: 🐍** - - **Mac (using brew):** `brew install python` - - **Nix Envdir:** Python is pre-installed - - **Install pip (if not present):** - - **Mac:** Use `easy_install pip` - - **Other Systems:** Follow instructions on [https://pip.pypa.io/en/stable/installation/](https://pip.pypa.io/en/stable/installation/) - - **Install Git and FFmpeg:** - - **Mac (using brew):** `brew install git ffmpeg` - - **Nix Envdir:** Git and FFmpeg are pre-installed - -2. **Clone the Backend Repository: πŸ“‚** - - Open your terminal and navigate to your desired directory - - Clone the Omi backend repository: - ```bash - git clone https://github.com/BasedHardware/Omi.git - cd Omi - cd backend - ``` - -3. **Set up the Environment File: πŸ“** - - Create a copy of the `.env.template` file and rename it to `.env`: - ```bash - cp .env.template .env - ``` - - Open the `.env` file and fill in the following: - - **OpenAI API Key:** Obtained from your OpenAI account - - **Deepgram API Key:** Obtained from your Deepgram account - - **Redis Credentials:** Host, port, username, and password for your Redis instance - - **Modal API Key:** Obtained from your Modal account - - **`ADMIN_KEY`:** Set to a temporary value (e.g., `123`) for local development - - **Other API Keys:** Fill in any other API keys required by your integrations (e.g., Google Maps API key) - -4. **Install Python Dependencies: πŸ“š** - - In your terminal (inside the backend directory), run: - ```bash - pip install -r requirements.txt - ``` +1. **Install Python & Dependencies:** + + - **macOS (using Homebrew):** + + ```bash + brew install python@3.11 git ffmpeg + ``` + + - **Ubuntu/Debian:** + + ```bash + sudo apt-get update + sudo apt-get install python3.11 python3-pip git ffmpeg + ``` + + - **Windows:** + + - Download and install: + - [Python Installer](https://www.python.org/downloads/windows/) + - Ensure you check **Add Python to PATH** during installation. + - [Git for Windows](https://gitforwindows.org/) + - [FFmpeg Builds](https://www.gyan.dev/ffmpeg/builds/) + + **Verify Installations:** + + ```bash + python --version + git --version + ffmpeg -version + ``` + +2. **Clone the Backend Repository:** + + ```bash + git clone https://github.com/BasedHardware/Omi.git + cd Omi/backend + ``` + +3. **Set up a Virtual Environment (Recommended):** + + ```bash + # Create a virtual environment + python3 -m venv omi_env + + # Activate the virtual environment + # On macOS/Linux: + source omi_env/bin/activate + # On Windows: + omi_env\Scripts\activate + ``` + +4. **Set up the Environment File:** + + ```bash + # Copy the template and edit the .env file + cp .env.template .env + # Use a text editor to fill in your API keys and settings + nano .env # Or your preferred editor + ``` + + **Important:** Never commit your `.env` file to version control. 
It's added to `.gitignore` by default. + + - **Security Reminder:** + - Keep your API keys and secrets secure. + - Consider using tools like `dotenv` to manage environment variables. + +5. **Install Python Dependencies:** + + ```bash + pip install --upgrade pip + pip install -r requirements.txt + ``` + + - If you encounter issues, try: + + ```bash + pip install -r requirements.txt --no-cache-dir + ``` + + - **Troubleshooting:** + - Install dependencies individually to identify any problematic packages. ## III. Running the Backend Locally πŸƒβ€β™‚οΈ -1. **Set up Ngrok for Tunneling: πŸš‡** - - Sign up for a free account on [https://ngrok.com/](https://ngrok.com/) and install Ngrok - - Follow their instructions to authenticate Ngrok with your account - - During the onboarding, Ngrok will provide you with a command to create a tunnel to your localhost. Modify the port in the command to `8000` (the default port for the backend). For example: - ```bash - ngrok http --domain=example.ngrok-free.app 8000 - ``` - - Run this command in your terminal. Ngrok will provide you with a public URL (like `https://example.ngrok-free.app`) that points to your local backend - -2. **Start the Backend Server: πŸ–₯️** - - In your terminal, run: - ```bash - uvicorn main:app --reload --env-file .env - ``` - - `--reload` automatically restarts the server when code changes are saved, making development easier - - `--env-file .env` loads environment variables from your `.env` file - - `--host 0.0.0.0` listens to every interface on your computer so you don't have to set up `ngrok` when developing in your network - - `--port 8000` port for backend to listen - -3. **Troubleshooting SSL Errors: πŸ”’** - - **SSL Errors:** If you encounter SSL certificate errors during model downloads, add this to `utils/stt/vad.py`: - ```python - import ssl - ssl._create_default_https_context = ssl._create_unverified_context - ``` - - **API Key Issues:** Double-check all API keys in your `.env` file. Ensure there are no trailing spaces - - **Ngrok Connection:** Ensure your Ngrok tunnel is active and the URL is correctly set in the Omi app - - **Dependencies:** If you encounter any module not found errors, try reinstalling dependencies: - ```bash - pip install -r requirements.txt --upgrade --force-reinstall - ``` - -4. **Connect the App to the Backend: πŸ”—** - - In your Omi app's environment variables, set the `API_BASE_URL` to the public URL provided by Ngrok (e.g., `https://example.ngrok-free.app`) - -Now, your Omi app should be successfully connected to the locally running backend. +1. **Set up Ngrok for Tunneling:** + + - Sign up at [ngrok.com](https://ngrok.com/) and install Ngrok. + - Authenticate Ngrok with your account: + + ```bash + ngrok authtoken + ``` + + - Start an Ngrok tunnel to your localhost: + + ```bash + ngrok http 8000 + ``` + + **Note:** For custom domains using the `--domain` flag, a paid Ngrok plan is required. + +2. **Start the Backend Server:** + + ```bash + uvicorn main:app --reload --env-file .env --host 0.0.0.0 --port 8000 + ``` + + - `--reload`: Automatically restarts the server when code changes are detected. + - `--env-file .env`: Loads environment variables from the `.env` file. + - `--host 0.0.0.0`: Allows external access to the server. + - `--port 8000`: Specifies the port to run the server on. + +3. **Verify the Server:** + + - Open a web browser and navigate to `http://localhost:8000/docs`. + - Alternatively, use the Ngrok URL provided (e.g., `https://.ngrok.io/docs`). 
+ - You should see the Swagger UI documentation for the API. + +4. **Connect the App to the Backend:** + + - In your Omi app's configuration, set `API_BASE_URL` to the Ngrok URL: + + ```env + API_BASE_URL=https://.ngrok.io + ``` + + - Replace `` with the forwarding address displayed by Ngrok. ## Environment Variables πŸ” -Here's a detailed explanation of each environment variable you need to define in your `.env` file: - -- **`HUGGINGFACE_TOKEN`:** Your Hugging Face Hub API token, used to download models for speech processing (like voice activity detection) -- **`BUCKET_SPEECH_PROFILES`:** The name of the Google Cloud Storage bucket where user speech profiles are stored -- **`BUCKET_BACKUPS`:** The name of the Google Cloud Storage bucket used for backups (if applicable) -- **`GOOGLE_APPLICATION_CREDENTIALS`:** The path to your Google Cloud service account credentials file (`google-credentials.json`). This file is generated in step 3 of **I. Setting Up Google Cloud & - Firebase** -- **`PINECONE_API_KEY`:** Your Pinecone API key, used for vector database operations. Storing Memory Embeddings: Each memory is converted into a numerical representation (embedding). Pinecone - efficiently stores these embeddings and allows Omi to quickly find the most relevant memories related to a user's query -- **`PINECONE_INDEX_NAME`:** The name of your Pinecone index where memory embeddings are stored -- **`REDIS_DB_HOST`:** The host address of your Redis instance -- **`REDIS_DB_PORT`:** The port number of your Redis instance -- **`REDIS_DB_PASSWORD`:** The password for your Redis instance -- **`DEEPGRAM_API_KEY`:** Your Deepgram API key, used for real-time and pre-recorded audio transcription -- **`ADMIN_KEY`:** A temporary key used for authentication during local development (replace with a more secure method in production) -- **`OPENAI_API_KEY`:** Your OpenAI API key, used for accessing OpenAI's language models for chat, memory processing, and more -- **`GITHUB_TOKEN`:** Your GitHub personal access token, used to access GitHub's API for retrieving the latest firmware version -- **`WORKFLOW_API_KEY`:** Your custom API key for securing communication with external workflows or integrations - -Make sure to replace the placeholders (``, ``, etc.) with your actual values. +Detailed explanation of each variable in your `.env` file: + +- `HUGGINGFACE_TOKEN`: Your Hugging Face API token for downloading speech processing models. +- `BUCKET_SPEECH_PROFILES`: Name of the Google Cloud Storage bucket for storing user speech profiles. +- `BUCKET_BACKUPS`: Name of the Google Cloud Storage bucket for backups (if applicable). +- `GOOGLE_APPLICATION_CREDENTIALS`: Full path to your Google Cloud credentials JSON file. + + - Example paths: + - macOS/Linux: `/Users/yourname/.config/gcloud/application_default_credentials.json` + - Windows: `C:\Users\yourname\AppData\Roaming\gcloud\application_default_credentials.json` + +- `PINECONE_API_KEY`: Your Pinecone API key for vector database operations. +- `PINECONE_INDEX_NAME`: Name of your Pinecone index (create this in the Pinecone console). +- `REDIS_DB_HOST`: Hostname of your Redis instance (e.g., `redis-12345.c56.us-east-1-3.ec2.cloud.redislabs.com`). +- `REDIS_DB_PORT`: Port number for your Redis instance (usually 6379). +- `REDIS_DB_PASSWORD`: Password for your Redis instance. +- `DEEPGRAM_API_KEY`: Your Deepgram API key for real-time and pre-recorded audio transcription. +- `ADMIN_KEY`: A secure key for admin-level API access (generate a strong, random string). 
+- `OPENAI_API_KEY`: Your OpenAI API key for accessing language models and embeddings. +- `GITHUB_TOKEN`: Your GitHub personal access token (if using GitHub for firmware updates). +- `WORKFLOW_API_KEY`: Custom API key for securing communication with external workflows. +- `HUME_API_KEY`: Your Hume AI API key for emotional analysis features (if enabled). + +**Important:** Never commit your `.env` file to version control. Ensure it's listed in your `.gitignore`: + +```gitignore +# .gitignore + +# Ignore environment files +.env +*.env + +# Ignore virtual environments +venv/ +omi_env/ + +# Ignore Python cache files +__pycache__/ +*.pyc +``` + +## Modal Serverless Deployment πŸš€ + +For deploying the backend using Modal: + +1. **Install Modal:** + + ```bash + pip install modal + ``` + +2. **Set up Modal Secrets:** + + - Use Modal's CLI or dashboard to create secrets for your environment variables. + + ```bash + # Create secret for Google Cloud credentials + modal secret create gcp-credentials --from-file application_default_credentials.json + + # Create secret for environment variables + modal secret create envs --from-env-file .env + ``` + + - Ensure you securely store all necessary credentials and environment variables. + +3. **Prepare for Deployment:** + + - Update your `main.py` to include Modal configurations: + + ```python + # main.py + import modal + + stub = modal.Stub("omi-backend") + + image = modal.Image.debian_slim().pip_install_from_requirements("requirements.txt") + + @stub.function(image=image, secrets=[modal.Secret.from_name("gcp-credentials"), modal.Secret.from_name("envs")]) + @modal.asgi_app() + def fastapi_app(): + from main import app + return app + ``` + + - Ensure that `main.py` properly imports your FastAPI app and any necessary modules. + +4. **Deploy to Modal:** + + ```bash + modal deploy main.py + ``` + +5. **Verify Deployment:** + + - Modal will provide a URL for your deployed app. + - Visit `https:///docs` to ensure the API is accessible. + - Update your Omi app's `API_BASE_URL` to point to the Modal URL. + +## Comprehensive Troubleshooting Guide πŸ”§ + +### Common Issues and Solutions: + +#### 1. SSL Certificate Errors + +If you encounter SSL certificate errors when downloading models: + +- **Temporary Workaround:** + + ```python + # Add at the top of your script + import ssl + ssl._create_default_https_context = ssl._create_unverified_context + ``` + +- **Permanent Solution:** + + - Update your SSL certificates or configure your environment to trust the necessary certificates. + - Avoid disabling SSL verification in production environments due to security risks. + +#### 2. API Key Issues + +- **Steps to Resolve:** + + - Double-check all API keys in your `.env` file for accuracy. + - Ensure there are no extra spaces or hidden characters. + - Confirm that your API keys have the necessary permissions and are active. + +#### 3. Ngrok Connection Problems + +- **Troubleshooting Tips:** + + - Ensure Ngrok is running and the tunnel is active. + - Verify the Ngrok URL is correctly set in the Omi app's `API_BASE_URL`. + - Check Ngrok's console for any error messages or warnings. + - Ensure your firewall allows traffic on the required ports. + +#### 4. Dependency Installation Failures + +- **Possible Solutions:** + + - Upgrade pip: + + ```bash + pip install --upgrade pip + ``` + + - Install dependencies without cache: + + ```bash + pip install -r requirements.txt --no-cache-dir + ``` + + - Install dependencies one by one to identify the problematic package. + +#### 5. 
Database Connection Errors
+
+- **Firestore:**
+
+  - Verify that Firestore is enabled in your Google Cloud project.
+  - Ensure your service account has the **Cloud Datastore User** role.
+
+- **Redis:**
+
+  - Check your Redis connection settings.
+  - Ensure your IP is whitelisted in Redis (if using Upstash).
+
+#### 6. "Module Not Found" Errors
+
+- **Solutions:**
+
+  - Ensure you're in the correct virtual environment.
+  - Reinstall the missing module:
+
+    ```bash
+    pip install <package-name>
+    ```
+
+#### 7. Performance Issues
+
+- **Recommendations:**
+
+  - Monitor your API usage and quotas for third-party services.
+  - Implement caching strategies for frequently accessed data.
+  - Use profiling tools to identify bottlenecks in your code.
+
+#### 8. Firewall and Network Issues
+
+- **Check:**
+
+  - Ensure your firewall allows traffic on port 8000 (backend) and the Ngrok-assigned port.
+  - Verify network settings if running in a corporate or restricted environment.
+
+#### 9. Python Version Conflicts
+
+- **Resolution:**
+
+  - Ensure you're using the correct Python version (3.9 or higher).
+  - Use `python3` explicitly if multiple versions are installed.
+
+## Performance Optimization πŸš€
+
+1. **Caching Strategy:**
+
+   - Implement Redis caching for frequently accessed data.
+   - Example using `redis` (assumes the existing FastAPI `app` instance and an async `fetch_data_from_db()` helper):
+
+   ```python
+   import json
+   import redis
+
+   r = redis.Redis(host='localhost', port=6379, db=0)
+
+   @app.get("/data")
+   async def get_data():
+       cached_data = r.get("data_key")
+       if cached_data:
+           return json.loads(cached_data)
+       data = await fetch_data_from_db()
+       r.set("data_key", json.dumps(data), ex=3600)  # Cache for 1 hour
+       return data
+   ```
+
+2. **Asynchronous Operations:**
+
+   - Utilize FastAPI's async features for I/O-bound operations.
+   - Use `asyncio` for concurrent tasks.
+
+3. **Database Optimization:**
+
+   - Design efficient Firestore queries and proper indexing.
+   - Use batch operations for multiple writes.
+
+4. **API Rate Limiting:**
+
+   - Implement rate limiting to prevent abuse using middleware or extensions like `slowapi`.
+
+   ```python
+   from fastapi import Request
+   from slowapi import Limiter, _rate_limit_exceeded_handler
+   from slowapi.errors import RateLimitExceeded
+   from slowapi.util import get_remote_address
+
+   limiter = Limiter(key_func=get_remote_address)
+   app.state.limiter = limiter
+   app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
+
+   @app.get("/endpoint")
+   @limiter.limit("5/minute")
+   async def limited_endpoint(request: Request):  # slowapi requires the Request argument
+       return {"message": "This endpoint is rate limited."}
+   ```
+
+5. **Monitoring and Profiling:**
+
+   - Integrate tools like **Prometheus** and **Grafana** for monitoring.
+   - Use Python profiling tools like `cProfile` or third-party services like **New Relic**.
+
+## Security Considerations πŸ”’
+
+1. **API Security:**
+
+   - Implement authentication and authorization mechanisms (e.g., OAuth 2.0, JWT).
+   - Use HTTPS for all communications.
+   - Regularly rotate API keys and secrets.
+
+2. **Data Protection:**
+
+   - Encrypt sensitive data at rest and in transit.
+   - Apply proper access controls in Firestore and Google Cloud Storage.
+   - Regularly back up data and test restoration procedures.
+
+3. **Dependency Management:**
+
+   - Regularly update dependencies to patch security vulnerabilities.
+   - Use tools like `safety` and `bandit` to check for known vulnerabilities:
+
+   ```bash
+   pip install safety bandit
+   safety check
+   bandit -r .
+   ```
+
+4. **Environment Isolation:**
+
+   - Use separate environments for development, staging, and production.
+ - Apply appropriate access controls and configurations for each environment. + +5. **Audit Logging:** + + - Implement comprehensive logging for security-relevant events. + - Use centralized logging solutions for analysis (e.g., ELK Stack). + +6. **Data Compliance:** + + - Be aware of data protection laws (e.g., GDPR, CCPA). + - Implement user consent mechanisms and data handling policies. ## Contributing 🀝 -We welcome contributions from the open source community! Whether it's improving documentation, adding new features, or reporting bugs, your input is valuable. Check out -our [Contribution Guide](https://docs.omi.me/developer/Contribution/) for more information. +We welcome contributions! Check our [Contribution Guide](https://docs.omi.me/developer/Contribution/) for details on: + +- Setting up a development environment +- Coding standards and best practices (e.g., following [PEP 8](https://www.python.org/dev/peps/pep-0008/) for Python code) +- Writing tests to ensure code quality +- Pull request process +- Code review guidelines + +**Note:** Ensure the link to the contribution guide is correct and accessible. ## Support πŸ†˜ -If you're stuck, have questions, or just want to chat about Omi: +If you need help: + +- **GitHub Issues:** πŸ› For bug reports and feature requests +- **Community Forum:** πŸ’¬ Join our [Discord community](https://discord.gg/ZutWMTJnwA) +- **Documentation:** πŸ“š Visit our [full documentation](https://docs.omi.me/) +- **FAQ:** Check our [Frequently Asked Questions](https://docs.omi.me/faq/) section in the docs +- **Email Support:** βœ‰οΈ Contact us at [support@omi.me](mailto:support@omi.me) -- **GitHub Issues: πŸ›** For bug reports and feature requests -- **Community Forum: πŸ’¬** Join our [community forum](https://discord.gg/ZutWMTJnwA) for discussions and questions -- **Documentation: πŸ“š** Check out our [full documentation](https://docs.omi.me/) for in-depth guides +Remember, when seeking help, provide as much relevant information as possible, including error messages, logs, and steps to reproduce the issue. + +--- -Happy coding! πŸ’» If you have any questions or need further assistance, don't hesitate to reach out to our community. +Happy coding! πŸ’» Don't hesitate to reach out if you need assistance. The Omi community is here to help you succeed in building amazing AI-powered experiences. diff --git a/docs/_get_started/backend/StoringMemory.md b/docs/_get_started/backend/StoringMemory.md index 4e11b2986..3387eb11b 100644 --- a/docs/_get_started/backend/StoringMemory.md +++ b/docs/_get_started/backend/StoringMemory.md @@ -1,100 +1,238 @@ --- layout: default -title: Memeory Store +title: Memory Storage parent: Backend nav_order: 3 --- -# πŸ“š Memory Storage Process +# πŸ“š Guide to Omi's Memory Storage Process + +This document provides an in-depth look at how Omi stores and manages memory objects, a crucial component of its intelligent AI assistant capabilities. + +## πŸ”„ Overview of the Memory Storage Process + +1. Memory object creation and initial processing +2. Conversion of memory object to a structured dictionary +3. Data organization into specific fields +4. Storage in Firebase Firestore +5. Vector embedding generation and storage in Pinecone +6. Optional post-processing and updates + +![Backend Memory Storage](/images/memorystore.png) + +## 🧠 Detailed Steps in Memory Storage + +### 1. 
πŸ“₯ Memory Object Creation and Initial Processing + +The journey of a memory begins with its creation, typically triggered by a user interaction such as a conversation or an OpenGlass capture. + +```python +# In utils/memories/process_memory.py +async def process_memory(uid: str, processing_memory_id: str): + # Retrieve the processing memory + processing_memory = get_processing_memory_by_id(uid, processing_memory_id) + + # Extract structured data using OpenAI's LLM + structured_data = await extract_structured_data(processing_memory.transcript) + + # Generate initial embedding + embedding = generate_memory_embedding(processing_memory) + + # Create the memory object + memory_data = { + "id": str(uuid.uuid4()), + "created_at": processing_memory.created_at, + "transcript_segments": processing_memory.transcript_segments, + "structured": structured_data, + # ... other memory fields + } + + # Store the memory and its embedding + upsert_memory(uid, memory_data) + upsert_vector(uid, memory_data, embedding) + + # Clean up the processing memory + delete_processing_memory(uid, processing_memory_id) +``` -This document outlines the process of storing memory objects in the Friend AI system. +### 2. πŸ”„ Conversion to Structured Dictionary -## πŸ”„ Overview of the Process +The memory object is converted into a structured Python dictionary. This step is crucial as it prepares the data for storage in Firestore, which uses a JSON-like format. -1. Memory object is processed -2. Object is converted to a dictionary -3. Data is organized into specific fields -4. Memory is saved to Firestore +### 3. πŸ“Š Detailed Data Fields - ![Backend Memory Storage](/images/memorystore.png) +The memory dictionary contains the following key fields: +| Field | Description | Example | +|-------|-------------|---------| +| `id` | Unique identifier for the memory | `"550e8400-e29b-41d4-a716-446655440000"` | +| `created_at` | Timestamp of memory creation | `datetime(2023, 4, 1, 12, 0, 0)` | +| `started_at` | Timestamp when the associated event started | `datetime(2023, 4, 1, 11, 55, 0)` | +| `finished_at` | Timestamp when the associated event ended | `datetime(2023, 4, 1, 12, 5, 0)` | +| `source` | Origin of the memory | `"conversation"`, `"openglass"`, `"workflow"` | +| `language` | Language code of the conversation | `"en-US"` | +| `structured` | Dictionary of extracted structured information | (see below) | +| `transcript_segments` | List of transcript segments | (see below) | +| `geolocation` | Location data (if available) | `{"latitude": 37.7749, "longitude": -122.4194}` | +| `plugins_results` | Results from any plugins run on the memory | `[{"plugin_id": "weather", "data": {...}}]` | +| `external_data` | Additional data from external integrations | `{"source": "calendar", "event_id": "123"}` | +| `postprocessing` | Information about post-processing status | `{"status": "completed", "model": "fal_whisperx"}` | +| `discarded` | Boolean indicating if the memory is low-quality | `false` | +| `deleted` | Boolean indicating if the memory has been deleted | `false` | +| `visibility` | Visibility setting of the memory | `"private"` | +#### πŸ“‹ Structured Information -## 🧠 Detailed Steps +The `structured` field contains key information extracted from the memory: + +```python +structured = { + "title": "Team Meeting Discussion on Q2 Goals", + "overview": "Discussed Q2 goals, focusing on product launch and market expansion.", + "emoji": "πŸš€", + "category": "work", + "action_items": [ + "Finalize product features by April 15", + 
"Schedule market research presentation for next week" + ], + "events": [ + { + "title": "Q2 Goals Follow-up", + "start_time": "2023-04-08T14:00:00", + "end_time": "2023-04-08T15:00:00" + } + ] +} +``` -### 1. πŸ“₯ Memory Object Received +#### πŸ—£οΈ Transcript Segments -- The `process_memory` function in `utils/memories/process_memory.py` processes a new or updated memory -- The complete Memory object is then sent to the `upsert_memory` function in `database/memories.py` +Each segment in `transcript_segments` includes detailed information about the speech: + +```python +transcript_segments = [ + { + "speaker": "SPEAKER_00", + "start": 0.0, + "end": 5.2, + "text": "Good morning, team. Let's discuss our Q2 goals.", + "is_user": True, + "person_id": None + }, + { + "speaker": "SPEAKER_01", + "start": 5.5, + "end": 10.8, + "text": "Sounds good. I think we should focus on the product launch.", + "is_user": False, + "person_id": "colleague123" + } + # ... more segments +] +``` -### 2. πŸ”„ Convert to Dictionary +#### πŸ”„ Postprocessing Information -- The `upsert_memory` function converts the Memory object into a Python dictionary -- This conversion is necessary because Firestore stores data in a JSON-like format +The `postprocessing` field contains information about any additional processing: -### 3. πŸ“Š Data Fields +```python +postprocessing = { + "status": "completed", # Options: "not_started", "in_progress", "completed", "failed" + "model": "fal_whisperx", + "fail_reason": None # Contains error message if status is "failed" +} +``` -The dictionary contains the following key fields: +### 4. πŸ’Ύ Storage in Firebase Firestore -| Field | Description | -|-------|-------------| -| `id` | Unique ID of the memory | -| `created_at` | Timestamp of memory creation | -| `started_at` | Timestamp when the associated event started | -| `finished_at` | Timestamp when the associated event ended | -| `source` | Source of the memory (e.g., "friend", "openglass", "workflow") | -| `language` | Language code of the conversation | -| `structured` | Dictionary of structured information (see below) | -| `transcript_segments` | List of transcript segments (see below) | -| `geolocation` | Location data (if available) | -| `plugins_results` | Results from any plugins run on the memory | -| `external_data` | Additional data from external integrations | -| `postprocessing` | Information about post-processing status | -| `discarded` | Boolean indicating if the memory is low-quality | -| `deleted` | Boolean indicating if the memory has been deleted | -| `visibility` | Visibility setting of the memory | +The `upsert_memory` function in `database/memories.py` handles the storage of the memory in Firestore: -#### πŸ“‹ Structured Information +```python +def upsert_memory(uid: str, memory_data: dict): + user_ref = db.collection('users').document(uid) + memory_ref = user_ref.collection('memories').document(memory_data['id']) + memory_ref.set(memory_data, merge=True) +``` -The `structured` field contains: +#### πŸ“ Firestore Structure -- `title`: Topic of the memory -- `overview`: Summary of the memory -- `emoji`: Representing emoji -- `category`: Category (e.g., "personal", "business") -- `action_items`: List of derived action items -- `events`: List of extracted calendar events +The memories are stored in a nested structure within Firestore: -#### πŸ—£οΈ Transcript Segments +``` +Users Collection +└── User Document (uid) + └── memories Collection + β”œβ”€β”€ Memory Document 1 (memory_id) + β”œβ”€β”€ Memory Document 2 
(memory_id) + └── ... +``` + +This structure allows for efficient querying and management of user-specific memory data. -Each segment in `transcript_segments` includes: +### 5. 🧠 Vector Embedding Storage + +Along with storing the memory in Firestore, we generate and store a vector embedding of the memory in Pinecone: + +```python +def upsert_vector(uid: str, memory: Memory, vector: List[float]): + index.upsert(vectors=[{ + "id": f'{uid}-{memory.id}', + "values": vector, + 'metadata': { + 'uid': uid, + 'memory_id': memory.id, + 'created_at': memory.created_at.timestamp(), + } + }], namespace="ns1") +``` -- `speaker`: Speaker label (e.g., "SPEAKER_00") -- `start`: Start time in seconds -- `end`: End time in seconds -- `text`: Transcribed text -- `is_user`: Boolean indicating if spoken by the user -- `person_id`: ID of a person from user's profiles (if applicable) +This vector embedding allows for semantic search and similarity matching of memories. -#### πŸ”„ Postprocessing Information +### 6. πŸ”„ Optional Post-Processing and Updates -The `postprocessing` field contains: +After initial storage, memories can undergo additional processing: -- `status`: Current status (e.g., "not_started", "in_progress") -- `model`: Post-processing model used (e.g., "fal_whisperx") -- `fail_reason`: (Optional) Reason for failure +1. **Transcript Enhancement:** Using more accurate models like WhisperX for improved transcription. +2. **Emotional Analysis:** Processing with Hume AI to extract emotional context. +3. **Additional Plugin Processing:** Running newly enabled plugins on existing memories. -### 4. πŸ’Ύ Save to Firestore +After post-processing, the memory is updated in both Firestore and Pinecone to reflect the new information. -- `database/memories.py` uses the Firebase Firestore API to store the memory data dictionary -- Data is saved in the `memories` collection within the user's document +## πŸ” Querying and Retrieving Memories -#### πŸ“ Firestore Structure -Users Collection -```└── User Document -└── memories Collection -β”œβ”€β”€ Memory Document 1 -β”œβ”€β”€ Memory Document 2 -└── ... +Memories can be retrieved using various methods: + +1. **Direct Lookup:** Using the memory ID to fetch from Firestore. +2. **Semantic Search:** Using vector embeddings in Pinecone to find similar memories. +3. **Filtered Queries:** Using Firestore queries to filter memories by date, category, etc. + +Example of a semantic search: + +```python +def query_vectors(query: str, uid: str, k: int = 5) -> List[str]: + xq = embeddings.embed_query(query) + results = index.query(vector=xq, top_k=k, namespace="ns1", filter={"uid": uid}) + return [item['id'].split('-')[1] for item in results['matches']] ``` -This structure allows for efficient querying and management of user-specific memory data. + +## πŸ”’ Security and Privacy Considerations + +- All memory data is associated with a specific user ID for data isolation. +- Firestore security rules ensure that users can only access their own memories. +- Sensitive information in memories (e.g., personal identifiers) should be handled with care. + +## πŸš€ Performance Optimization + +- Use of Firestore indexes for frequently accessed query patterns. +- Caching of frequently accessed memories in Redis for faster retrieval. +- Batch operations for inserting or updating multiple memories at once. 
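+
+As a concrete illustration of the batch-write point above, here is a minimal sketch using the Firestore Python client. It assumes the same `db` client and `users/{uid}/memories` layout shown earlier; the helper name `upsert_memories_batch` is illustrative rather than an existing function in the codebase.
+
+```python
+from google.cloud import firestore
+
+db = firestore.Client()
+
+def upsert_memories_batch(uid: str, memories: list):
+    """Hypothetical helper: write or update several memory documents in one round trip."""
+    batch = db.batch()
+    user_ref = db.collection('users').document(uid)
+    for memory_data in memories:
+        memory_ref = user_ref.collection('memories').document(memory_data['id'])
+        batch.set(memory_ref, memory_data, merge=True)
+    batch.commit()  # a single Firestore batch supports up to 500 operations
+```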
+ +## πŸ”„ Continuous Improvement + +The memory storage process is continually refined to improve: + +- Accuracy of structured data extraction +- Efficiency of vector embedding generation +- Integration with new data sources and plugins + +By following this comprehensive memory storage process, Omi ensures that user interactions are accurately captured, enriched, and made available for intelligent retrieval and analysis. diff --git a/docs/_get_started/backend/backend_deepdive.md b/docs/_get_started/backend/backend_deepdive.md index 2f8dd7d2f..33fa486be 100644 --- a/docs/_get_started/backend/backend_deepdive.md +++ b/docs/_get_started/backend/backend_deepdive.md @@ -5,326 +5,379 @@ parent: Backend nav_order: 2 --- -# Omi Backend Deep Dive πŸ§ πŸŽ™οΈ +# Omi Backend Deep Dive ## Table of Contents -1. [Understanding the Omi Ecosystem](#understanding-the-omi-ecosystem-) +1. [Understanding the Omi Ecosystem](#understanding-the-omi-ecosystem) 2. [System Architecture](#system-architecture) -3. [The Flow of Information: From User Interaction to Memory](#the-flow-of-information-from-user-interaction-to-memory-) -4. [The Core Components: A Closer Look](#the-core-components-a-closer-look-) - - [database/memories.py: The Memory Guardian](#1-databasememoriespy-the-memory-guardian-) - - [database/vector_db.py: The Embedding Expert](#2-databasevector_dbpy-the-embedding-expert-) - - [utils/llm.py: The AI Maestro](#3-utilsllmpy-the-ai-maestro-) - - [utils/other/storage.py: The Cloud Storage Manager](#4-utilsotherstoragepy-the-cloud-storage-manager-) - - [database/redis_db.py: The Data Speedster](#5-databaseredis_dbpy-the-data-speedster-) - - [routers/transcribe.py: The Real-Time Transcription Engine](#6-routerstranscribepy-the-real-time-transcription-engine-) -5. [Other Important Components](#other-important-components-) -6. [Contributing](#contributing-) -7. [Support](#support-) - -Welcome to the Omi backend! This document provides a comprehensive overview of Omi's architecture and code, guiding you through its key components, functionalities, and how it all works together to -power a unique and intelligent AI assistant experience. - -## Understanding the Omi Ecosystem πŸ—ΊοΈ - -Omi is a multimodal AI assistant designed to understand and interact with users in a way that's both intelligent and human-centered. The backend plays a crucial role in this by: +3. [The Flow of Information: From User Interaction to Memory](#the-flow-of-information-from-user-interaction-to-memory) +4. [The Core Components: A Closer Look](#the-core-components-a-closer-look) + - [database/memories.py: Memory Management](#1-databasememoriespy-memory-management) + - [database/vector_db.py: Vector Database Management](#2-databasevector_dbpy-vector-database-management) + - [utils/llm.py: Language Model Utilities](#3-utilsllmpy-language-model-utilities) + - [utils/other/storage.py: Cloud Storage Manager](#4-utilsotherstoragepy-cloud-storage-manager) + - [database/redis_db.py: Redis Database Operations](#5-databaseredis_dbpy-redis-database-operations) + - [routers/transcribe.py: Real-Time Transcription Engine](#6-routerstranscribepy-real-time-transcription-engine) + - [database/processing_memories.py: Memory Processing Pipeline](#7-databaseprocessing_memoriespy-memory-processing-pipeline) +5. [Modal Serverless Deployment](#modal-serverless-deployment) +6. [Error Handling and Logging](#error-handling-and-logging) +7. [Performance Optimization](#performance-optimization) +8. [Security Considerations](#security-considerations) +9. 
[External Integrations and Workflows](#external-integrations-and-workflows) +10. [Contributing](#contributing) +11. [Support](#support) + +Welcome to the Omi Backend Deep Dive. This document provides a comprehensive overview of Omi's architecture and code, guiding you through its key components, functionalities, and how it all works together to power a unique and intelligent AI assistant experience. + +## Understanding the Omi Ecosystem + +Omi is a multimodal AI assistant designed to understand and interact with users in a way that is both intelligent and human-centered. The backend plays a crucial role in this by: - **Processing and analyzing data:** Converting audio to text, extracting meaning, and creating structured information from user interactions. - **Storing and managing memories:** Building a rich knowledge base of user experiences that Omi can draw upon to provide context and insights. - **Facilitating intelligent conversations:** Understanding user requests, retrieving relevant information, and generating personalized responses. - **Integrating with external services:** Extending Omi's capabilities and connecting it to other tools and platforms. -This deep dive will walk you through the **core elements** of Omi's backend, providing a clear roadmap for developers and enthusiasts alike to understand its inner workings. +This document will walk you through the core elements of Omi's backend, providing a clear roadmap for developers and enthusiasts to understand its inner workings. ## System Architecture ![Backend Detailed Overview](/images/backend.png) -You can click on the image to view it in full size and zoom in for more detail. +*You can click on the image to view it in full size and zoom in for more detail.* + +### Component Interactions + +Here's a detailed look at how key components interact: + +1. **Real-time Transcription Flow:** + + ```mermaid + sequenceDiagram + participant User + participant OmiApp + participant WebSocket + participant TranscriptionServices + participant MemoryProcessing + + User->>OmiApp: Start recording + OmiApp->>WebSocket: Establish connection + loop Audio Streaming + OmiApp->>WebSocket: Stream audio chunks + WebSocket->>TranscriptionServices: Forward audio + TranscriptionServices->>WebSocket: Return transcripts + WebSocket->>OmiApp: Send live transcripts + end + User->>OmiApp: Stop recording + OmiApp->>MemoryProcessing: Process transcribed memory + ``` + +2. **Memory Creation and Embedding Flow:** -## The Flow of Information: From User Interaction to Memory 🌊 + ```mermaid + sequenceDiagram + participant MemoryProcessing + participant OpenAI + participant Firestore + participant Pinecone + + MemoryProcessing->>OpenAI: Extract structured data + OpenAI->>MemoryProcessing: Return structured info + MemoryProcessing->>OpenAI: Generate embedding + OpenAI->>MemoryProcessing: Return embedding vector + MemoryProcessing->>Firestore: Store memory data + MemoryProcessing->>Pinecone: Store embedding vector + ``` + +### Modal Serverless Deployment + +Omi's backend leverages Modal for serverless deployment, allowing for efficient scaling and management of computational resources. Key components of the Modal setup include: + +- **App Configuration:** The `modal_app` is configured in `main.py` with specific secrets and environment variables. +- **Image Definition:** A custom Docker image is defined with necessary dependencies and configurations. +- **API Function:** The main FastAPI app is wrapped in a Modal function, allowing for easy deployment and scaling. 
+- **Cron Job:** A notifications cron job is set up to run every minute using Modal's scheduling capabilities. + +```python +modal_app = App( + name='backend', + secrets=[Secret.from_name("gcp-credentials"), Secret.from_name('envs')], +) +image = ( + Image.debian_slim() + .apt_install('ffmpeg', 'git', 'unzip') + .pip_install_from_requirements('requirements.txt') +) + +@modal_app.function( + image=image, + keep_warm=2, + memory=(512, 1024), + cpu=2, + allow_concurrent_inputs=10, + timeout=60 * 10, +) +@asgi_app() +def api(): + return app + +@modal_app.function(image=image, schedule=Cron('* * * * *')) +async def notifications_cronjob(): + await start_cron_job() +``` + +## The Flow of Information: From User Interaction to Memory Let's trace the journey of a typical interaction with Omi, focusing on how audio recordings are transformed into lasting memories: -### A. User Initiates a Recording 🎀 +### A. User Initiates a Recording 1. **Recording Audio:** The user starts a recording session using the Omi app, capturing a conversation or their thoughts. -### B. Real-Time Transcription with Deepgram 🎧 +### B. Real-Time Transcription with Multiple Services 2. **WebSocket Connection:** The Omi app establishes a real-time connection with the backend using WebSockets (at the `/listen` endpoint in `routers/transcribe.py`). 3. **Streaming Audio:** The app streams audio data continuously through the WebSocket to the backend. -4. **Deepgram Processing:** The backend forwards the audio data to the Deepgram API for real-time speech-to-text conversion. -5. **Transcription Results:** As Deepgram transcribes the audio, it sends results back to the backend. +4. **Multiple Transcription Services:** The backend forwards the audio data to multiple transcription services, including Deepgram, Soniox, and Speechmatics, for real-time speech-to-text conversion. +5. **Transcription Results:** As the services transcribe the audio, they send results back to the backend. 6. **Live Feedback:** The backend relays these transcription results back to the Omi app, allowing for live transcription display as the user is speaking. -### C. Creating a Lasting Memory πŸ’Ύ - -7. **API Request to `/v1/memories`:** When the conversation session ends, the Omi app sends a POST request to the `/v1/memories` endpoint in `routers/memories.py`. -8. **Data Formatting:** The request includes information about the start and end time of the recording, the language, optional geolocation data, and the transcribed text segments from Deepgram. -9. **Memory Creation (`routers/memories.py`):** The `create_memory` function in this file receives the request and performs basic validation on the data. -10. **Processing the Memory (`utils/memories/process_memory.py`):** - - The `create_memory` function delegates the core memory processing logic to the `process_memory` function. This function is where the real magic happens! - - **Structure Extraction:** OpenAI's powerful large language model (LLM) is used to analyze the transcript and extract key information, creating a structured representation of the memory. This - includes: - - `title`: A short, descriptive title. - - `overview`: A concise summary of the main points. - - `category`: A relevant category to organize memories (work, personal, etc.). - - `action_items`: Any tasks or to-dos mentioned. - - `events`: Events that might need to be added to a calendar. - - **Embedding Generation:** The LLM is also used to create a vector embedding of the memory, capturing its semantic meaning for later retrieval. 
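+To make the streaming flow in section B more tangible, here is a minimal, hypothetical client sketch using the `websockets` library. The `uid` and `language` query parameters mirror the `websocket_endpoint` signature shown later in `routers/transcribe.py`; the audio framing, pacing, and response schema are assumptions, so treat this as an illustration rather than the official client protocol.
+
+```python
+import asyncio
+import json
+import websockets  # pip install websockets
+
+async def stream_audio(uid: str, wav_path: str):
+    # Assumed local backend URL; a real client would also read transcripts while streaming.
+    url = f"ws://localhost:8000/listen?uid={uid}&language=en"
+    async with websockets.connect(url) as ws:
+        with open(wav_path, "rb") as audio:
+            while chunk := audio.read(4096):
+                await ws.send(chunk)        # stream raw audio bytes
+                await asyncio.sleep(0.05)   # roughly pace the stream
+        try:
+            while True:
+                # Print whatever transcript messages the server relays back
+                print(json.loads(await asyncio.wait_for(ws.recv(), timeout=5)))
+        except asyncio.TimeoutError:
+            pass
+
+asyncio.run(stream_audio("test-user", "sample.wav"))  # illustrative values
+```
+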
+### C. Creating a Processing Memory + +7. **API Request to `/v1/processing-memories`:** When the conversation session ends, the Omi app sends a POST request to the `/v1/processing-memories` endpoint in `routers/processing_memories.py`. +8. **Data Formatting:** The request includes information about the start and end time of the recording, the language, optional geolocation data, and the transcribed text segments from the transcription services. +9. **Processing Memory Creation:** The `create_processing_memory` function receives the request and creates a new processing memory document. + +### D. Processing the Memory + +10. **Memory Processing (`utils/memories/process_memory.py`):** + - The processing memory is analyzed and enriched with additional information. + - **Structure Extraction:** OpenAI's large language model (LLM) is used to analyze the transcript and extract key information, creating a structured representation of the memory. + - **Embedding Generation:** The LLM generates a vector embedding of the memory, capturing its semantic meaning for later retrieval. - **Plugin Execution:** If the user has enabled any plugins, relevant plugins are run to enrich the memory with additional insights, external actions, or other context-specific information. - - **Storage in Firestore:** The fully processed memory, including the transcript, structured data, plugin results, and other metadata, is stored in Firebase Firestore (a NoSQL database) for - persistence. - - **Embedding Storage in Pinecone:** The memory embedding is sent to Pinecone, a vector database, to enable fast and efficient similarity searches later. + - **Emotional Analysis:** If enabled, the audio is analyzed for emotional content using Hume AI. -### D. Enhancing the Memory (Optional) +### E. Finalizing the Memory -11. **Post-Processing:** The user can optionally trigger post-processing of the memory to improve the quality of the transcript. This involves: - - Sending the audio to a more accurate transcription service (like WhisperX through a FAL.ai function). - - Updating the memory in Firestore with the new transcript. - - Re-generating the embedding to reflect the updated content. +11. **Storage in Firestore:** The fully processed memory, including the transcript, structured data, plugin results, emotional analysis, and other metadata, is stored in Firebase Firestore for persistence. +12. **Embedding Storage in Pinecone:** The memory embedding is sent to Pinecone, a vector database, to enable fast and efficient similarity searches later. -## The Core Components: A Closer Look πŸ”Ž +### F. Post-Processing (Optional) -Now that you understand the general flow, let's dive deeper into the key modules and services that power Omi's backend. +13. **Enhanced Transcription:** The user can optionally trigger post-processing of the memory to improve the quality of the transcript using more accurate models like WhisperX through FAL.ai. +14. **Updating the Memory:** The memory in Firestore is updated with the new transcript, and the embedding is regenerated to reflect the updated content. -### 1. `database/memories.py`: The Memory Guardian πŸ›‘οΈ +## The Core Components: A Closer Look -This module is responsible for managing the interaction with Firebase Firestore, Omi's main database for storing memories and related data. +Now that you understand the general flow, let's dive deeper into the key modules and services that power Omi's backend. -**Key Functions:** +### 1. 
`database/memories.py`: Memory Management -- `upsert_memory`: Creates or updates a memory document in Firestore, ensuring efficient storage and handling of updates. -- `get_memory`: Retrieves a specific memory by its ID. -- `get_memories`: Fetches a list of memories for a user, allowing for filtering, pagination, and optional inclusion of discarded memories. -- **OpenGlass Functions:** Handles the storage and retrieval of photos associated with memories created through OpenGlass. -- **Post-Processing Functions:** Manages the storage of data related to transcript post-processing (status, model used, alternative transcription segments). +This module handles memory storage and management features: -**Firestore Structure:** +- **Post-Processing:** Functions handle the storage and retrieval of post-processing data, including status updates and alternative transcription segments. +- **OpenGlass Integration:** Functions for storing and retrieving photos associated with memories created through OpenGlass. +- **Visibility Management:** Functions manage the visibility of memories, allowing for public and private memories. -Each memory is stored as a document in Firestore with the following fields: +**Key Functions:** ```python -class Memory(BaseModel): - id: str # Unique ID - created_at: datetime # Creation timestamp - started_at: Optional[datetime] - finished_at: Optional[datetime] - - source: Optional[MemorySource] - language: Optional[str] - - structured: Structured # Contains extracted title, overview, action items, etc. - transcript_segments: List[TranscriptSegment] - geolocation: Optional[Geolocation] - photos: List[MemoryPhoto] - - plugins_results: List[PluginResult] - external_data: Optional[Dict] - postprocessing: Optional[MemoryPostProcessing] - - discarded: bool - deleted: bool -``` +def upsert_memory(uid: str, memory_data: dict): + # Creates or updates a memory document in Firestore -### 2. `database/vector_db.py`: The Embedding Expert 🌲 +def get_memory_photos(uid: str, memory_id: str): + # Retrieves photos associated with a memory (OpenGlass integration) -This module manages the interaction with Pinecone, a vector database used to store and query memory embeddings. +def set_memory_visibility(uid: str, memory_id: str, visibility: str): + # Sets the visibility status of a memory (public/private) -**Key Functions:** +def set_postprocessing_status(uid: str, memory_id: str, status: PostProcessingStatus, fail_reason: str = None, model: PostProcessingModel = PostProcessingModel.fal_whisperx): + # Updates the post-processing status of a memory -- `upsert_vector`: Adds or updates a memory embedding in Pinecone. -- `upsert_vectors`: Efficiently adds or updates multiple embeddings. -- `query_vectors`: Performs similarity search to find memories relevant to a user query. -- `delete_vector`: Removes a memory embedding. +def store_model_emotion_predictions_result(uid: str, memory_id: str, model_name: str, predictions: List[hume.HumeJobModelPredictionResponseModel]): + # Stores emotional analysis results for a memory +``` -**Pinecone's Role:** +### 2. `database/vector_db.py`: Vector Database Management -Pinecone's specialized vector search capabilities are essential for: +This module manages the vector embeddings using the Pinecone client: -- **Contextual Retrieval:** Finding memories that are semantically related to a user's request, even if they don't share exact keywords. -- **Efficient Search:** Quickly retrieving relevant memories from a large collection. 
-- **Scalability:** Handling the growing number of memory embeddings as the user creates more memories. +- **Namespace Usage:** Vectors are stored in a namespace, allowing for better organization of embeddings. +- **Metadata Filtering:** The `query_vectors` function supports advanced filtering based on metadata, including date ranges. -### 3. `utils/llm.py`: The AI Maestro 🧠 +```python +def query_vectors(query: str, uid: str, starts_at: int = None, ends_at: int = None, k: int = 5) -> List[str]: + filter_data = {'uid': uid} + if starts_at is not None: + filter_data['created_at'] = {'$gte': starts_at, '$lte': ends_at} + + xq = embeddings.embed_query(query) + xc = index.query(vector=xq, top_k=k, include_metadata=False, filter=filter_data, namespace="ns1") + return [item['id'].replace(f'{uid}-', '') for item in xc['matches']] +``` -This module is where the power of OpenAI's LLMs is harnessed for a wide range of tasks. It's the core of Omi's intelligence! +### 3. `utils/llm.py`: Language Model Utilities + +This module harnesses the power of OpenAI's LLMs for a wide range of tasks: **Key Functionalities:** -- **Memory Processing:** - - Determines if a conversation should be discarded. - - Extracts structured information from transcripts (title, overview, categories, etc.). - - Runs plugins on memory data. - - Handles post-processing of transcripts to improve accuracy. -- **OpenGlass and External Integration Processing:** - - Creates structured summaries from photos and descriptions (OpenGlass). - - Processes data from external sources (like ScreenPipe) to generate memories. -- **Chat and Retrieval:** - - Generates initial chat messages. - - Analyzes chat conversations to determine if context is needed. - - Extracts relevant topics and dates from chat history. - - Retrieves and summarizes relevant memory content for chat responses. -- **Emotional Processing:** - - Analyzes conversation transcripts for user emotions. - - Generates emotionally aware responses based on context and user facts. +- **Memory Processing:** Analyzes transcripts, extracts structured information, runs plugins on memory data, and handles post-processing of transcripts. +- **External Integration Processing:** Creates structured summaries from photos and descriptions, processes data from external sources. +- **Chat and Retrieval:** Generates initial chat messages, analyzes chat conversations, extracts relevant topics and dates, retrieves and summarizes relevant memory content. +- **Emotional Processing:** Analyzes conversation transcripts for user emotions, generates emotionally aware responses. - **Fact Extraction:** Identifies and extracts new facts about the user from conversation transcripts. -**OpenAI Integration:** +### 4. `utils/other/storage.py`: Cloud Storage Manager -- `llm.py` leverages OpenAI's `ChatOpenAI` model (specifically `gpt-4o` in the code, but you can use other models) for language understanding, generation, and reasoning. -- It uses OpenAI's `OpenAIEmbeddings` model to generate vector embeddings for memories and user queries. +This module handles interactions with Google Cloud Storage (GCS), specifically for managing user speech profiles. -**Why `llm.py` is Essential:** +**Key Functions:** -- **The Brain of Omi:** This module enables Omi's core AI capabilities, including natural language understanding, content generation, and context-aware interactions. -- **Memory Enhancement:** It enriches raw data by extracting meaning and creating structured information. 
-- **Personalized Responses:** It helps Omi provide responses that are tailored to individual users, incorporating their unique facts, memories, and even emotional states. -- **Extensibility:** The plugin system and integration with external services make Omi highly versatile. +```python +def upload_profile_audio(file_path: str, uid: str): + # Uploads a user's speech profile audio to GCS -### 4. `utils/other/storage.py`: The Cloud Storage Manager ☁️ +def get_profile_audio_if_exists(uid: str) -> str: + # Retrieves a user's speech profile from GCS if it exists +``` -This module handles interactions with Google Cloud Storage (GCS), specifically for managing user speech profiles. +### 5. `database/redis_db.py`: Redis Database Operations + +Redis is used for caching, managing user settings, and storing user speech profiles. **Key Functions:** -- **`upload_profile_audio(file_path: str, uid: str)`:** - - Uploads a user's speech profile audio recording to the GCS bucket specified by the `BUCKET_SPEECH_PROFILES` environment variable. - - Organizes audio files within the bucket using the user's ID (`uid`). - - Returns the public URL of the uploaded file. -- **`get_profile_audio_if_exists(uid: str) -> str`:** - - Checks if a speech profile already exists for a given user ID in the GCS bucket. - - Downloads the speech profile audio to a local temporary file if it exists and returns the file path. - - Returns `None` if the profile does not exist. +```python +def store_user_speech_profile(uid: str, data: List[List[int]]): + # Stores a user's speech profile in Redis -**Usage:** +def get_enabled_plugins(uid: str): + # Retrieves the list of enabled plugins for a user -- The `upload_profile_audio` function is called when a user uploads a new speech profile recording through the `/v3/upload-audio` endpoint (defined in `routers/speech_profile.py`). -- The `get_profile_audio_if_exists` function is used to retrieve a user's speech profile when needed, for example, during speaker identification in real-time transcription or post-processing. +def cache_signed_url(blob_path: str, signed_url: str, ttl: int = 60 * 60): + # Caches a signed URL for cloud storage objects -### 5. `database/redis_db.py`: The Data Speedster πŸš€ +def add_public_memory(memory_id: str): + # Marks a memory as public in Redis +``` -Redis is an in-memory data store known for its speed and efficiency. The `database/redis_db.py` module handles Omi's interactions with Redis, which is primarily used for caching, managing user -settings, and storing user speech profiles. +### 6. `routers/transcribe.py`: Real-Time Transcription Engine -**Data Stored and Retrieved from Redis:** +This module manages real-time audio transcription using multiple services. -- **User Speech Profiles:** - - **Storage:** When a user uploads a speech profile, the raw audio data, along with its duration, is stored in Redis. - - **Retrieval:** During real-time transcription or post-processing, the user's speech profile is retrieved from Redis to aid in speaker identification. -- **Enabled Plugins:** - - **Storage:** A set of plugin IDs is stored for each user, representing the plugins they have enabled. - - **Retrieval:** When processing a memory or handling a chat request, the backend checks Redis to see which plugins are enabled for the user. -- **Plugin Reviews:** - - **Storage:** Reviews for each plugin (score, review text, date) are stored in Redis, organized by plugin ID and user ID. 
- - **Retrieval:** When displaying plugin information, the backend retrieves reviews from Redis. -- **Cached User Names:** - - **Storage:** User names are cached in Redis to avoid repeated lookups from Firebase. - - **Retrieval:** The backend first checks Redis for a user's name before querying Firestore, improving performance. +**Key Features:** -**Key Functions:** +- **Multiple Transcription Services:** Integrates Deepgram, Soniox, and Speechmatics for real-time speech-to-text conversion. +- **WebSocket Communication:** Utilizes WebSockets for real-time data streaming. +- **Speaker Diarization:** Integrates user speech profile for speaker identification. + +```python +@router.websocket("/listen") +async def websocket_endpoint(websocket: WebSocket, uid: str, language: str = 'en', ...): + await websocket.accept() + + # Start multiple transcription services + transcript_socket_deepgram = await process_audio_dg(uid, websocket, language, ...) + transcript_socket_soniox = await process_audio_soniox(uid, websocket, language, ...) + transcript_socket_speechmatics = await process_audio_speechmatics(uid, websocket, language, ...) -- `store_user_speech_profile`, `get_user_speech_profile`: For storing and retrieving speech profiles. -- `store_user_speech_profile_duration`, `get_user_speech_profile_duration`: For managing speech profile durations. -- `enable_plugin`, `disable_plugin`, `get_enabled_plugins`: For handling plugin enable/disable states. -- `get_plugin_reviews`: Retrieves reviews for a plugin. -- `cache_user_name`, `get_cached_user_name`: For caching user names. - **Why Redis is Important:** - -- **Performance:** Caching data in Redis significantly improves the backend's speed, as frequently accessed data can be retrieved from memory very quickly. -- **User Data Management:** Redis provides a flexible and efficient way to manage user-specific data, such as plugin preferences and speech profiles. -- **Real-time Features:** The low-latency nature of Redis makes it ideal for supporting real-time features like live transcription and instant plugin interactions. -- **Scalability:** As the number of users grows, Redis helps maintain performance by reducing the load on primary databases. - -### 6. `routers/transcribe.py`: The Real-Time Transcription Engine πŸŽ™οΈ - -This module is the powerhouse behind Omi's real-time transcription capabilities, allowing the app to convert spoken audio into text as the user is speaking. It leverages WebSockets for bidirectional -communication with the Omi app and Deepgram's speech-to-text API for accurate and efficient transcription. - -#### 1. WebSocket Communication: The Lifeline of Real-Time Interactions πŸ”Œ - -- **`/listen` Endpoint:** The Omi app initiates a WebSocket connection with the backend at the `/listen` endpoint, which is defined in the `websocket_endpoint` function of `routers/transcribe.py`. -- **Bidirectional Communication:** WebSockets enable a two-way communication channel, allowing: - - The Omi app to stream audio data to the backend continuously. - - The backend to send back transcribed text segments as they become available from Deepgram. -- **Real-Time Feedback:** This constant back-and-forth ensures that users see their words being transcribed in real-time, creating a more interactive and engaging experience. - -#### 2. 
Deepgram Integration: Converting Speech to Text with Precision πŸŽ§βž‘οΈπŸ“ - -- **`process_audio_dg` Function:** The `process_audio_dg` function (found in `utils/stt/streaming.py`) manages the interaction with Deepgram. -- **Deepgram API:** The audio chunks streamed from the Omi app are sent to the Deepgram API for transcription. Deepgram's sophisticated speech recognition models process the audio and return text - results. -- **Options Configuration:** The `process_audio_dg` function configures various Deepgram options, including: - - `punctuate`: Automatically adds punctuation to the transcribed text. - - `no_delay`: Minimizes latency for real-time feedback. - - `language`: Sets the language for transcription. - - `interim_results`: (Set to `False` in the code) Controls whether to send interim (partial) transcription results or only final results. - - `diarize`: Enables speaker diarization (identifying different speakers in the audio). - - `encoding`, `sample_rate`: Sets audio encoding and sample rate for compatibility with Deepgram. - -#### 3. Transcription Flow: A Step-by-Step Breakdown 🌊 - -1. **App Streams Audio:** The Omi app captures audio from the user's device and continuously sends chunks of audio data through the WebSocket to the backend's `/listen` endpoint. -2. **Backend Receives and Forwards:** The backend's `websocket_endpoint` function receives the audio chunks and immediately forwards them to Deepgram using the `process_audio_dg` function. -3. **Deepgram Processes:** Deepgram's speech recognition models transcribe the audio data in real-time. -4. **Results Sent Back:** Deepgram sends the transcribed text segments back to the backend as they become available. -5. **Backend Relays to App:** The backend immediately sends these transcription results back to the Omi app over the WebSocket connection. -6. **App Displays Transcript:** The Omi app updates the user interface with the newly transcribed text, providing instant feedback. - -#### 4. Key Considerations - -- **Speaker Identification:** The code uses Deepgram's speaker diarization feature to identify different speakers in the audio. This information is included in the transcription results, allowing the - app to display who said what. -- **User Speech Profile Integration:** If a user has uploaded a speech profile, the backend can use this information (retrieved from Redis or Google Cloud Storage) to improve the accuracy of speaker - identification. -- **Latency Management:** Real-time transcription requires careful attention to latency to ensure a seamless user experience. The `no_delay` option in Deepgram and the efficient handling of data in - the backend are essential for minimizing delays. -- **Error Handling:** The code includes error handling mechanisms to gracefully handle any issues that may occur during the WebSocket connection or Deepgram transcription process. - -#### 5. Example Code Snippet (Simplified): + # ... (rest of the function) +``` + +### 7. `database/processing_memories.py`: Memory Processing Pipeline + +This module manages memories that are still in the processing stage. + +**Key Functions:** ```python -from fastapi import APIRouter, WebSocket +def upsert_processing_memory(uid: str, processing_memory_data: dict): + # Creates or updates a processing memory document -# ... other imports ... 
+def update_processing_memory_segments(uid: str, id: str, segments: List[dict]): + # Updates the transcript segments of a processing memory -router = APIRouter() +def get_last(uid: str): + # Retrieves the most recent processing memory for a user +``` +## Error Handling and Logging -@router.websocket("/listen") -async def websocket_endpoint(websocket: WebSocket, uid: str, language: str = 'en', ...): - await websocket.accept() # Accept the WebSocket connection +Omi implements a robust error handling and logging system: - # Start Deepgram transcription - transcript_socket = await process_audio_dg(uid, websocket, language, ...) +### Global Exception Handler - # Receive and process audio chunks from the app - async for data in websocket.iter_bytes(): - transcript_socket.send(data) +```python +@app.exception_handler(Exception) +async def global_exception_handler(request: Request, exc: Exception): + log_error(exc) + return JSONResponse( + status_code=500, + content={"message": "An unexpected error occurred. Our team has been notified."}, + ) +``` + +### Structured Logging + +```python +import structlog + +logger = structlog.get_logger() - # ... other logic for speaker identification, error handling, etc. +def log_error(exc: Exception, **kwargs): + logger.error("An error occurred", error=str(exc), traceback=traceback.format_exc(), **kwargs) ``` -## Other Important Components 🧩 +### Error Monitoring + +- **Integration with Monitoring Services:** Integration with error monitoring services (e.g., Sentry) for real-time alerts and error tracking. + +## Performance Optimization + +Omi employs several strategies to optimize performance: + +### Caching + +- **Redis Caching:** Redis is used for caching frequently accessed data. +- **Embeddings Caching:** Intelligent caching of embeddings and transcription results to reduce redundant computations. + +### Database Optimization + +- **Indexed Firestore Queries:** Firestore indexes are carefully designed for common query patterns. +- **Batch Operations:** Batch operations are used for bulk updates to minimize network overhead. + +### Asynchronous Processing + +- **Background Workers:** Long-running tasks are offloaded to background workers. +- **Async I/O Operations:** Utilizes Python's `asyncio` for non-blocking I/O operations. + +### Load Testing and Profiling + +- **Regular Load Testing:** Identifies bottlenecks and ensures scalability. +- **Profiling Tools:** Critical paths are profiled to optimize resource usage. + +## Security Considerations -- **`routers/transcribe.py`:** Manages real-time audio transcription using Deepgram, sending the transcribed text back to the Omi app for display. -- **`routers/workflow.py`, `routers/screenpipe.py`:** Define API endpoints for external integrations to trigger memory creation. +Omi takes security seriously to protect user data and system integrity. Key security measures include: -We hope this deep dive into the Omi backend has provided valuable insights into its architecture, codebase, and the powerful technologies that drive its intelligent and human-centered interactions. +- **Data Encryption:** All data is encrypted at rest and in transit using industry-standard encryption algorithms. +- **Access Control:** Fine-grained access control mechanisms are implemented to ensure only authorized users can access their data. +- **Authentication and Authorization:** Robust authentication and authorization mechanisms are in place to prevent unauthorized access. 
+- **Input Validation:** All user input is validated and sanitized to prevent injection attacks and other security vulnerabilities. +- **Secure Deployment:** Omi is deployed in a secure environment with strict access controls and network segmentation. -## Contributing 🀝 +## Contributing -We welcome contributions from the open source community! Whether it's improving documentation, adding new features, or reporting bugs, your input is valuable. Check out -our [Contribution Guide](https://docs.omi.me/developer/Contribution/) for more information. +We welcome contributions from the open source community. Whether it's improving documentation, adding new features, or reporting bugs, your input is valuable. Check out our [Contribution Guide](https://docs.omi.me/developer/Contribution/) for more information. -## Support πŸ†˜ +## Support If you're stuck, have questions, or just want to chat about Omi: -- **GitHub Issues:** πŸ› For bug reports and feature requests -- **Community Forum:** πŸ’¬ Join our [community forum](https://discord.gg/ZutWMTJnwA) for discussions and questions -- **Documentation:** πŸ“š Check out our [full documentation](https://docs.omi.me/) for in-depth guides +- **GitHub Issues:** For bug reports and feature requests. +- **Community Forum:** Join our [community forum](https://discord.gg/ZutWMTJnwA) for discussions and questions. +- **Documentation:** Check out our [full documentation](https://docs.omi.me/) for in-depth guides. -Happy coding! πŸ’» If you have any questions or need further assistance, don't hesitate to reach out to our community. +If you have any questions or need further assistance, please don't hesitate to reach out to our community. diff --git a/docs/_get_started/backend/memory_embeddings.md b/docs/_get_started/backend/memory_embeddings.md index 673f7c64e..4e6e745cc 100644 --- a/docs/_get_started/backend/memory_embeddings.md +++ b/docs/_get_started/backend/memory_embeddings.md @@ -4,96 +4,380 @@ title: Memory Embeddings parent: Backend nav_order: 5 --- -# 🧠 Memory Embedding Process in Omi -This document outlines how Omi creates and stores embeddings for memories. +# 🧠 Guide to Memory Embedding Process in Omi + +This document provides an in-depth look at how Omi creates, stores, and utilizes embeddings for memories, a crucial component of its intelligent retrieval and analysis capabilities. ## πŸ”„ Process Overview 1. Memory processing triggers embedding creation -2. Structured data is extracted from the memory -3. OpenAI API generates the embedding +2. Structured data is extracted and prepared from the memory +3. OpenAI API generates the embedding vector 4. Metadata is created for the embedding -5. Embedding and metadata are stored in Pinecone +5. Embedding vector and metadata are stored in Pinecone +6. Embeddings are used for semantic search and memory retrieval ![Embeddings](/images/embeddings.png) - -## πŸ“Š Detailed Steps +## πŸ“Š Detailed Steps in Memory Embedding ### 1. 
Memory Processing Triggers Embedding Creation -- Initiated in `utils/memories/process_memory.py` when: - - A new memory is created - - An existing memory is reprocessed -- `process_memory` function calls `upsert_vector` in `database/vector_db.py` +The embedding process is initiated in `utils/memories/process_memory.py` under two main scenarios: + +a) When a new memory is created +b) When an existing memory is reprocessed (e.g., after post-processing) + +```python +# In utils/memories/process_memory.py +async def process_memory(uid: str, processing_memory_id: str): + # Retrieve the processing memory + processing_memory = get_processing_memory_by_id(uid, processing_memory_id) + + # Extract structured data + structured_data = await extract_structured_data(processing_memory.transcript) + + # Generate embedding + embedding = generate_memory_embedding(processing_memory, structured_data) + + # Create the memory object + memory_data = create_memory_object(processing_memory, structured_data) + + # Store the memory and its embedding + upsert_memory(uid, memory_data) + upsert_vector(uid, memory_data, embedding) +``` + +### 2. Extract and Prepare Structured Data -### 2. Extract Structured Data +The `extract_structured_data` function in `utils/llm.py` uses OpenAI's language model to extract key information from the memory transcript: -- `database/vector_db.py` passes the Memory object to `utils/llm.py` -- `utils/llm.py` extracts the `structured` field from the Memory object +```python +# In utils/llm.py +async def extract_structured_data(transcript: str) -> dict: + prompt = f"Extract key information from this transcript:\n\n{transcript}\n\nProvide a structured output with title, overview, emoji, category, action items, and events." + response = await openai.ChatCompletion.create( + model="gpt-4", + messages=[ + {"role": "system", "content": "You are a helpful assistant that extracts structured information from text."}, + {"role": "user", "content": prompt} + ] + ) + return json.loads(response.choices[0].message.content) +``` -#### Structured Field Contents +#### Structured Data Fields -| Field | Description | -|-------|-------------| -| `title` | Memory title | -| `overview` | Brief summary | -| `emoji` | Representative emoji | -| `category` | Memory category | -| `action_items` | List of action items | -| `events` | List of related events | +| Field | Description | Example | +|-------|-------------|---------| +| `title` | Concise title of the memory | "Team Meeting on Q2 Goals" | +| `overview` | Brief summary of the memory content | "Discussed product launch and market expansion strategies for Q2" | +| `emoji` | Representative emoji for the memory | "πŸš€" | +| `category` | General category of the memory | "work" | +| `action_items` | List of tasks or follow-ups | ["Finalize product features", "Schedule market research"] | +| `events` | List of calendar events mentioned | [{"title": "Q2 Review", "start_time": "2023-07-01T14:00:00"}] | ### 3. 
Generate Embedding with OpenAI API
 
-- `generate_embedding` function in `utils/llm.py`:
-  - Calls OpenAI's Embeddings API
-  - Passes extracted structured data as text
-  - OpenAI model processes text and returns numerical vector representation
+The `generate_memory_embedding` function in `utils/llm.py` creates the embedding:
+
+```python
+# In utils/llm.py
+def generate_memory_embedding(memory: Memory, structured_data: dict) -> List[float]:
+    # Combine relevant memory data for embedding
+    embedding_text = f"{memory.transcript}\n\nTitle: {structured_data['title']}\nOverview: {structured_data['overview']}\nCategory: {structured_data['category']}"
+
+    # Generate embedding using OpenAI's API
+    response = openai.Embedding.create(
+        input=embedding_text,
+        model="text-embedding-3-large"
+    )
+    return response['data'][0]['embedding']
+```
+
+**Note:** We use the `text-embedding-3-large` model for its superior performance in capturing semantic meaning.
 
### 4. Create Metadata
 
-- `database/vector_db.py` creates a metadata dictionary:
+Metadata is crucial for efficient filtering and retrieval of embeddings. The `upsert_vector` function in `database/vector_db.py` prepares this metadata:
 
-| Field | Description |
-|-------|-------------|
-| `memory_id` | Unique ID of the memory |
-| `uid` | User ID associated with the memory |
-| `created_at` | Timestamp of embedding creation |
+```python
+# In database/vector_db.py
+def upsert_vector(uid: str, memory: Memory, vector: List[float]):
+    metadata = {
+        'uid': uid,
+        'memory_id': memory.id,
+        'created_at': memory.created_at.timestamp(),
+        'category': memory.structured['category'],
+        'source': memory.source
+    }
+    # ... (continue with Pinecone upsert)
+```
 
### 5. Store in Pinecone
 
-- `database/vector_db.py`:
-  - Combines embedding vector, metadata, and unique ID
-  - Sends data point to Pinecone API using upsert operation
-  - Pinecone stores embedding and metadata in specified index
+The embedding vector and metadata are stored in Pinecone, a vector database optimized for similarity search:
+
+```python
+# In database/vector_db.py
+def upsert_vector(uid: str, memory: Memory, vector: List[float]):
+    # ... (metadata creation)
+    index.upsert(vectors=[{
+        "id": f'{uid}-{memory.id}',
+        "values": vector,
+        'metadata': metadata
+    }], namespace="ns1")
+```
+
+**Important:** We use a namespace (`"ns1"`) in Pinecone to logically separate vectors, allowing for more efficient querying and management.
+
+## πŸ” Utilizing Embeddings for Memory Retrieval
+
+### Semantic Search
+
+The `query_vectors` function in `database/vector_db.py` performs semantic search using the embeddings:
+
+```python
+# In database/vector_db.py
+def query_vectors(query: str, uid: str, starts_at: int = None, ends_at: int = None, k: int = 5) -> List[str]:
+    # Generate embedding for the query
+    query_embedding = embeddings.embed_query(query)
+
+    # Prepare filter (optional date range)
+    filter_data = {'uid': uid}
+    if starts_at is not None:
+        filter_data['created_at'] = {'$gte': starts_at, '$lte': ends_at}
+
+    # Perform the query
+    results = index.query(
+        vector=query_embedding,
+        top_k=k,
+        namespace="ns1",
+        filter=filter_data
+    )
+
+    # Strip the uid prefix to recover memory IDs (memory IDs are UUIDs and contain dashes themselves)
+    return [item['id'].replace(f'{uid}-', '') for item in results['matches']]
+```
+
+This function allows for:
+- Semantic similarity search based on the query
+- Filtering by user ID to ensure data isolation
+- Optional date range filtering
+- Returning the top-k most similar memories
+
+## πŸ” Practical Examples of Embedding Usage
+
+### 1. 
Semantic Search for Relevant Memories + +```python +def find_relevant_memories(user_query: str, uid: str, k: int = 5): + query_embedding = generate_memory_embedding(user_query) + relevant_memory_ids = query_vectors(query_embedding, uid, k=k) + return get_memories_by_id(uid, relevant_memory_ids) +``` + +This function allows the retrieval of memories semantically similar to a user's query, enhancing the contextual understanding in conversations. + +### 2. Clustering Similar Memories + +```python +from sklearn.cluster import KMeans + +def cluster_user_memories(uid: str, n_clusters: int = 5): + memories = get_all_user_memories(uid) + embeddings = [memory.embedding for memory in memories] + kmeans = KMeans(n_clusters=n_clusters) + clusters = kmeans.fit_predict(embeddings) + return list(zip(memories, clusters)) +``` + +This function groups similar memories together, which can be used for generating summaries or identifying trends in user experiences. -## 🎯 Why This Matters +### 3. Memory Deduplication -1. **Semantic Search**: Enables Omi to find semantically similar memories when answering user questions -2. **Metadata for Filtering**: Allows efficient filtering of memories by user or time range during retrieval +```python +def find_duplicate_memories(uid: str, similarity_threshold: float = 0.95): + memories = get_all_user_memories(uid) + duplicates = [] + for i, mem1 in enumerate(memories): + for j, mem2 in enumerate(memories[i+1:]): + similarity = cosine_similarity(mem1.embedding, mem2.embedding) + if similarity > similarity_threshold: + duplicates.append((mem1, mem2)) + return duplicates +``` + +This function identifies potentially duplicate memories, helping to maintain a clean and non-redundant memory store. + +## πŸš€ Performance Optimization for Embeddings + +### 1. Batch Processing + +For improved efficiency when dealing with multiple memories: + +```python +def batch_upsert_vectors(uid: str, memories: List[Memory], vectors: List[List[float]]): + batch_size = 100 # Adjust based on Pinecone's recommendations + for i in range(0, len(memories), batch_size): + batch_memories = memories[i:i+batch_size] + batch_vectors = vectors[i:i+batch_size] + + index.upsert( + vectors=[{ + "id": f'{uid}-{memory.id}', + "values": vector, + 'metadata': create_metadata(uid, memory) + } for memory, vector in zip(batch_memories, batch_vectors)], + namespace="ns1" + ) +``` + +### 2. Caching Frequently Accessed Embeddings + +Implement a caching layer for frequently accessed embeddings: + +```python +import redis +from functools import lru_cache + +redis_client = redis.Redis(host='localhost', port=6379, db=0) + +@lru_cache(maxsize=1000) +def get_cached_embedding(memory_id: str): + cached = redis_client.get(f"embedding:{memory_id}") + if cached: + return json.loads(cached) + + embedding = fetch_embedding_from_pinecone(memory_id) + redis_client.setex(f"embedding:{memory_id}", 3600, json.dumps(embedding)) # Cache for 1 hour + return embedding +``` + +### 3. Asynchronous Embedding Generation + +For non-blocking embedding generation: + +```python +import asyncio +from concurrent.futures import ThreadPoolExecutor + +async def generate_embeddings_async(texts: List[str]): + loop = asyncio.get_event_loop() + with ThreadPoolExecutor() as executor: + embeddings = await asyncio.gather( + *[loop.run_in_executor(executor, generate_memory_embedding, text) for text in texts] + ) + return embeddings +``` + +## πŸ’» Complex Operations with Embeddings -## πŸ” Additional Considerations +### 1. 
Time-Weighted Memory Retrieval
 
-- **Embedding Model**: Uses OpenAI's `text-embedding-3-large` model
-- **Index Configuration**: Ensure Pinecone index is configured for the chosen embedding model
-- **Retrieval**: `query_vectors` function in `database/vector_db.py` retrieves memory IDs based on query embedding and filter criteria
+This function retrieves relevant memories with a bias towards more recent ones:
+
+```python
+from datetime import datetime
+
+import numpy as np
+
+def time_weighted_memory_retrieval(query: str, uid: str, k: int = 5, time_decay: float = 0.1):
+    query_embedding = embeddings.embed_query(query)
+    results = index.query(
+        vector=query_embedding,
+        top_k=k*2,  # Retrieve more results for post-processing
+        namespace="ns1",
+        filter={"uid": uid}
+    )
+
+    weighted_results = []
+    for item in results['matches']:
+        memory_id = item['id'].replace(f'{uid}-', '')
+        memory = get_memory(uid, memory_id)
+        time_diff = (datetime.now() - memory.created_at).days
+        time_weight = np.exp(-time_decay * time_diff)
+        weighted_score = item['score'] * time_weight
+        weighted_results.append((memory, weighted_score))
+
+    return sorted(weighted_results, key=lambda x: x[1], reverse=True)[:k]
+```
+
+### 2. Multi-Query Embedding Search
+
+This function allows searching with multiple query embeddings and aggregates the results:
+
+```python
+from collections import defaultdict
+
+def multi_query_search(queries: List[str], uid: str, k: int = 5):
+    query_embeddings = [embeddings.embed_query(query) for query in queries]
+
+    all_results = defaultdict(float)
+    for embedding in query_embeddings:
+        results = index.query(
+            vector=embedding,
+            top_k=k,
+            namespace="ns1",
+            filter={"uid": uid}
+        )
+        for item in results['matches']:
+            memory_id = item['id'].replace(f'{uid}-', '')
+            all_results[memory_id] += item['score']
+
+    top_memories = sorted(all_results.items(), key=lambda x: x[1], reverse=True)[:k]
+    return [get_memory(uid, memory_id) for memory_id, _ in top_memories]
+```
+
+These advanced techniques and optimizations showcase the power and flexibility of using embeddings for memory retrieval and analysis in the Omi backend.
+
+## 🎯 Why Embeddings Matter
+
+1. **Semantic Understanding:** Embeddings capture the meaning of memories, not just keywords.
+2. **Efficient Retrieval:** Vector similarity search is fast and scalable.
+3. **Cross-Lingual Capabilities:** Embeddings can bridge language barriers in memory retrieval.
+4. **Contextual Responses:** Omi can provide more relevant and context-aware responses in conversations.
+
+## πŸ”’ Security and Privacy Considerations
+
+- Embeddings are stored with user IDs to ensure data isolation.
+- Access to Pinecone is restricted and authenticated to prevent unauthorized access.
+- Consider implementing encryption-at-rest for the vector database for additional security.
+
+## πŸš€ Performance Optimization
+
+1. **Batch Processing:** Use Pinecone's batch upsert for multiple embeddings.
+2. **Caching:** Implement a caching layer (e.g., Redis) for frequently accessed embeddings.
+3. **Index Optimization:** Regularly monitor and optimize the Pinecone index for query performance.
+
+## πŸ”„ Continuous Improvement
+
+- Regularly evaluate and update the embedding model to benefit from advancements in NLP.
+- Implement A/B testing to compare different embedding strategies and their impact on retrieval quality.
+- Collect user feedback on search results to fine-tune the embedding and retrieval process.
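+
+A note on the `cosine_similarity` helper used in the memory deduplication sketch above: it is not defined anywhere in this document, so the following is an illustrative assumption rather than the repository's actual implementation. A minimal NumPy version could look like this:
+
+```python
+import numpy as np
+
+def cosine_similarity(a, b) -> float:
+    # Cosine similarity between two embedding vectors, in the range [-1, 1]
+    a = np.asarray(a, dtype=float)
+    b = np.asarray(b, dtype=float)
+    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
+```
+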
## πŸ’» Key Code Components ```python # In utils/memories/process_memory.py -def process_memory(uid, language_code, memory, force_process=False): - # ... (other processing) - vector = generate_embedding(str(structured)) - upsert_vector(uid, memory, vector) - # ... +async def process_memory(uid: str, processing_memory_id: str): + # ... (previous code) + embedding = generate_memory_embedding(memory, structured_data) + upsert_vector(uid, memory, embedding) # In utils/llm.py -def generate_embedding(content: str) -> List[float]: - return embeddings.embed_documents([content])[0] +def generate_memory_embedding(memory: Memory, structured_data: dict) -> List[float]: + embedding_text = prepare_embedding_text(memory, structured_data) + return openai.Embedding.create(input=embedding_text, model="text-embedding-3-large")['data'][0]['embedding'] # In database/vector_db.py def upsert_vector(uid: str, memory: Memory, vector: List[float]): - # ... (create metadata and upsert to Pinecone) -``` \ No newline at end of file + metadata = prepare_metadata(uid, memory) + index.upsert(vectors=[{"id": f'{uid}-{memory.id}', "values": vector, 'metadata': metadata}], namespace="ns1") + +def query_vectors(query: str, uid: str, k: int = 5) -> List[str]: + query_embedding = embeddings.embed_query(query) + results = index.query(vector=query_embedding, top_k=k, namespace="ns1", filter={"uid": uid}) + return [item['id'].split('-')[1] for item in results['matches']] +``` + +By implementing this comprehensive embedding process, Omi ensures that memories are not just stored, but are made intelligently accessible, enabling rich, context-aware interactions and insights. diff --git a/docs/_get_started/backend/postprocessing.md b/docs/_get_started/backend/postprocessing.md index 4bcede7e8..64e5ccc46 100644 --- a/docs/_get_started/backend/postprocessing.md +++ b/docs/_get_started/backend/postprocessing.md @@ -4,109 +4,373 @@ title: Memory Post-Processing parent: Backend nav_order: 6 --- + # πŸŽ›οΈ Omi Memory Post-Processing Workflow -This document outlines the post-processing workflow for memories in the Omi application. +This document provides a comprehensive overview of the post-processing workflow for memories in the Omi application. It covers the entire process from initial transcription to final storage, including all intermediate steps and key code components. ## πŸ“Š Process Overview -1. Post-processing request initiated -2. Request handled by `routers/postprocessing.py` -3. Audio pre-processed and stored -4. FAL.ai WhisperX transcription performed -5. Transcript post-processed -6. Speech profile matching for speaker identification -7. Memory updated and reprocessed -8. Optional emotional analysis +1. Initial real-time transcription with multiple services +2. Processing memory creation +3. Initial memory processing and storage +4. Post-processing request initiation +5. Request handling and audio preparation +6. High-accuracy transcription with FAL.ai WhisperX +7. Transcript post-processing and segmentation +8. Speech profile matching for speaker identification +9. Memory update and reprocessing +10. Optional emotional analysis +11. Final storage and vector embedding update +12. Enhancing User Experience with Post-Processing Results - ![Post Processing](/images/postprocessing.png) +![Post Processing](/images/postprocessing.png) ## πŸ” Detailed Steps -### 1. Post-Processing Request +### 1. 
Initial Real-Time Transcription
 
-- Omi App sends POST request to `/v1/memories/{memory_id}/post-processing`
-- Request includes:
-  - Audio recording for post-processing
-  - Flag for emotional analysis
+- Multiple transcription services (Deepgram, Soniox, Speechmatics) process audio in real-time
+- Handled by `routers/transcribe.py` through WebSocket connections
+- Each service provides its own transcription results
 
-### 2. Request Handling
+```python
+@router.websocket("/listen")
+async def websocket_endpoint(websocket: WebSocket, uid: str, language: str = 'en', ...):
+    await websocket.accept()
+    transcript_socket_deepgram = await process_audio_dg(uid, websocket, language, ...)
+    transcript_socket_soniox = await process_audio_soniox(uid, websocket, language, ...)
+    transcript_socket_speechmatics = await process_audio_speechmatics(uid, websocket, language, ...)
+    # Process incoming audio and send transcripts back to the client
+```
 
-- `postprocess_memory` function in `routers/postprocessing.py` processes the request
-- Retrieves existing memory data from Firebase Firestore using `database/memories.py`
+### 2. Processing Memory Creation
+
+- `create_processing_memory` function in `database/processing_memories.py` creates a new processing memory
+- Stores initial transcription results and metadata
+
+```python
+def create_processing_memory(uid: str, data: dict):
+    processing_memory_id = str(uuid.uuid4())
+    processing_memory_data = {
+        "id": processing_memory_id,
+        "created_at": datetime.now(timezone.utc),
+        "transcript_segments": data["transcript_segments"],
+        "language": data["language"],
+        # ... other relevant data
+    }
+    upsert_processing_memory(uid, processing_memory_data)
+    return processing_memory_id
+```
+
+### 3. Initial Memory Processing and Storage
+
+- `process_memory` function in `utils/memories/process_memory.py` processes the memory
+- Uses OpenAI's LLM to extract structured data (title, overview, etc.)
+- Generates initial vector embedding
+- Stores processed memory in Firebase Firestore
+- Stores embedding in Pinecone vector database
+
+```python
+async def process_memory(uid: str, processing_memory_id: str):
+    processing_memory = get_processing_memory_by_id(uid, processing_memory_id)
+    structured_data = await extract_structured_data(processing_memory.transcript)
+    embedding = generate_memory_embedding(processing_memory, structured_data)
+
+    memory_data = {
+        "id": str(uuid.uuid4()),
+        "created_at": processing_memory.created_at,
+        "transcript_segments": processing_memory.transcript_segments,
+        "structured": structured_data,
+        # ... other memory fields
+    }
+
+    upsert_memory(uid, memory_data)
+    upsert_vector(uid, memory_data, embedding)
+    delete_processing_memory(uid, processing_memory_id)
+```
+
+### 4. Post-Processing Request Initiation
+
+- Omi App sends POST request to `/v1/memories/{memory_id}/post-processing`
+- Request includes audio recording and optional emotional analysis flag
+
+```python
+@router.post("/v1/memories/{memory_id}/post-processing", response_model=Memory)
+async def postprocess_memory(
+    memory_id: str,
+    file: UploadFile,
+    background_tasks: BackgroundTasks,
+    emotional_feedback: bool = False,
+    uid: str = Depends(get_current_user_id)
+):
+    # ... (request handling code)
+```
 
-### 3. Pre-Processing and Storage
+### 5. 
Request Handling and Audio Preparation -#### User Permission Check -- Checks if user allows audio storage (`database/users.py`) -- If permitted, audio uploaded to `memories_recordings_bucket` in Google Cloud Storage +- `postprocess_memory` function in `routers/postprocessing.py` processes the request +- Retrieves existing memory data from Firebase Firestore +- Uploads audio for processing to Google Cloud Storage +- Initiates background task for post-processing -#### Audio Upload for Processing -- Audio uploaded to `postprocessing_audio_bucket` in Google Cloud Storage -- Handled by `utils/other/storage.py` +```python +async def postprocess_memory(...): + memory = get_memory(uid, memory_id) + if not memory: + raise HTTPException(status_code=404, detail="Memory not found") -#### Cleanup -- Background thread started to delete uploaded audio after set time (e.g., 5 minutes) + audio_url = await upload_audio_for_processing(file, uid, memory_id) + background_tasks.add_task(run_postprocessing, uid, memory_id, audio_url, emotional_feedback) + return memory +``` -### 4. FAL.ai WhisperX Transcription +### 6. High-Accuracy Transcription with FAL.ai WhisperX - `fal_whisperx` function in `utils/stt/pre_recorded.py` sends audio to FAL.ai - WhisperX model performs high-quality transcription and speaker diarization -- Returns list of transcribed words with speaker labels +- Returns list of transcribed words with speaker labels and timestamps + +```python +async def fal_whisperx(audio_url: str): + result = fal.apps.submit_and_wait( + "110602490-whispercpp", + { + "audio": audio_url, + "language": "en", + "task": "transcribe", + "vad_filter": True, + "word_timestamps": True, + }, + ) + return process_fal_result(result) +``` + +### 7. Transcript Post-Processing and Segmentation -### 5. Transcript Post-Processing +- `fal_postprocessing` function cleans and segments the transcript data +- Groups words into `TranscriptSegment` objects based on speaker and timing -`fal_postprocessing` function in `utils/stt/pre_recorded.py`: -- Cleans transcript data -- Groups words into segments based on speaker and timing -- Converts segments to `TranscriptSegment` objects +```python +def fal_postprocessing(words: List[dict]) -> List[TranscriptSegment]: + segments = [] + current_segment = None + for word in words: + if not current_segment or word['speaker'] != current_segment.speaker: + if current_segment: + segments.append(current_segment) + current_segment = TranscriptSegment( + start=word['start'], + end=word['end'], + text=word['word'], + speaker=word['speaker'] + ) + else: + current_segment.end = word['end'] + current_segment.text += f" {word['word']}" + if current_segment: + segments.append(current_segment) + return segments +``` -### 6. Speech Profile Matching +### 8. Speech Profile Matching for Speaker Identification -`get_speech_profile_matching_predictions` in `utils/stt/speech_profile.py`: -- Downloads user's speech profile and known people profiles -- Uses Speechbrain model to compare speaker embeddings +- `get_speech_profile_matching_predictions` in `utils/stt/speech_profile.py` performs speaker identification +- Compares segment audio with user's speech profile and known people profiles - Updates segments with `is_user` and `person_id` flags -### 7. 
Memory Update and Reprocessing +```python +def get_speech_profile_matching_predictions(uid: str, segments: List[TranscriptSegment]): + user_profile = get_user_speech_profile(uid) + people_profiles = get_people_with_speech_samples(uid) + + for segment in segments: + scores = { + 'user': sample_same_speaker_as_segment(user_profile, segment.audio), + **{person['id']: sample_same_speaker_as_segment(person['profile'], segment.audio) for person in people_profiles} + } + best_match = max(scores, key=scores.get) + segment.is_user = best_match == 'user' + segment.person_id = None if segment.is_user else best_match + + return segments +``` -- Memory object updated with improved transcript and speaker identification -- Updated data saved to Firebase Firestore -- If FAL.ai transcription successful: - - `process_memory` in `utils/memories/process_memory.py` re-processes memory - - Re-extracts structured data (title, overview, etc.) - - Re-generates embeddings - - Updates memory in vector database +### 9. Memory Update and Reprocessing -### 8. Emotional Analysis (Optional) +- Updates memory object with improved transcript and speaker identification +- Re-extracts structured data and regenerates embeddings +- Updates memory in Firestore and vector embedding in Pinecone + +```python +async def reprocess_memory(uid: str, memory_id: str, new_segments: List[TranscriptSegment]): + memory = get_memory(uid, memory_id) + memory.transcript_segments = new_segments + + structured_data = await extract_structured_data(memory.transcript) + memory.structured = structured_data + + embedding = generate_memory_embedding(memory) + + upsert_memory(uid, memory.dict()) + upsert_vector(uid, memory, embedding) + + return memory +``` -If requested: -- `process_user_emotion` function called asynchronously -- Uses Hume API to analyze user's emotions in the recording -- Can trigger notifications based on detected emotions +### 10. Emotional Analysis (Optional) -## πŸ’» Key Code Components +- `process_user_emotion` function analyzes audio for emotional content using Hume AI +- Emotional data stored alongside memory ```python -# In routers/postprocessing.py -@router.post("/v1/memories/{memory_id}/post-processing", response_model=Memory) -def postprocess_memory(memory_id: str, file: UploadFile, emotional_feedback: bool = False): - # ... (request handling and pre-processing) - words = fal_whisperx(audio_url) - segments = fal_postprocessing(words) - segments = get_speech_profile_matching_predictions(uid, segments) - # ... (memory update and reprocessing) - if emotional_feedback: - asyncio.create_task(process_user_emotion(uid, file_path)) - -# In utils/stt/pre_recorded.py -def fal_whisperx(audio_url: str): - # ... (FAL.ai API call and processing) +async def process_user_emotion(uid: str, file_path: str): + client = HumeStreamClient(os.getenv("HUME_API_KEY")) + config = LanguageConfig(granularity="sentence") + async with client.connect([config]) as socket: + result = await socket.send_file(file_path) + + emotions = extract_emotions_from_result(result) + store_model_emotion_predictions_result(uid, memory_id, "hume", emotions) +``` -def fal_postprocessing(words: List[dict]) -> List[TranscriptSegment]: - # ... (clean and format transcript data) +### 11. Final Storage and Vector Embedding Update -# In utils/stt/speech_profile.py -def get_speech_profile_matching_predictions(uid: str, segments: List[TranscriptSegment]): - # ... 
(speaker identification logic) -``` \ No newline at end of file +- Final memory data saved to Firebase Firestore +- Updated vector embedding stored in Pinecone for efficient retrieval + +```python +def upsert_memory(uid: str, memory_data: dict): + user_ref = db.collection('users').document(uid) + memory_ref = user_ref.collection('memories').document(memory_data['id']) + memory_ref.set(memory_data) + +def upsert_vector(uid: str, memory: Memory, vector: List[float]): + index.upsert(vectors=[{ + "id": f'{uid}-{memory.id}', + "values": vector, + 'metadata': { + 'uid': uid, + 'memory_id': memory.id, + 'created_at': memory.created_at.timestamp(), + } + }], namespace="ns1") +``` + +### 12. Enhancing User Experience with Post-Processing Results + +1. **Improved Transcript Accuracy:** + - Higher quality transcripts lead to more accurate memory retrieval and analysis. + - Users can rely on the transcripts for important information without manual corrections. + +2. **Enhanced Speaker Identification:** + - Accurate speaker labeling improves conversation context and personalization. + - Enables features like speaker-specific insights and personalized recommendations. + +3. **Emotional Context Awareness:** + - Emotional analysis allows for more empathetic and context-aware responses from Omi. + - Enables tracking of emotional patterns over time for personal growth insights. + +4. **Better Memory Summarization:** + - Improved transcripts and emotional data lead to more accurate and insightful memory summaries. + - Enhances the quality of daily or weekly recap features. + +5. **More Relevant Memory Retrieval:** + - Higher quality transcripts and embeddings improve the accuracy of semantic search. + - Users receive more relevant memories when asking questions or seeking past information. + +## πŸ”„ Error Handling and Retry Mechanisms + +Robust error handling and retry mechanisms are crucial for ensuring reliable post-processing: + +```python +from tenacity import retry, stop_after_attempt, wait_exponential + +@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10)) +async def run_postprocessing(uid: str, memory_id: str, audio_url: str): + try: + # Post-processing logic + pass + except TransientError as e: + logger.warning(f"Transient error during post-processing: {str(e)}") + raise # This will trigger a retry + except PermanentError as e: + logger.error(f"Permanent error during post-processing: {str(e)}") + update_memory_postprocessing_status(uid, memory_id, status="failed", error=str(e)) + # Notify user or support team +``` + +This implementation uses the `tenacity` library for advanced retry logic: +- Retries up to 3 times for transient errors. +- Uses exponential backoff to avoid overwhelming services. +- Distinguishes between transient and permanent errors for appropriate handling. + +## πŸ“Š Performance Metrics and Monitoring + +Implementing comprehensive monitoring ensures optimal performance and quick issue resolution: + +1. **Prometheus Metrics:** + ```python + from prometheus_client import Counter, Histogram + + POSTPROCESSING_DURATION = Histogram('memory_postprocessing_duration_seconds', 'Duration of memory post-processing') + POSTPROCESSING_ERRORS = Counter('memory_postprocessing_errors_total', 'Total post-processing errors') + + async def run_postprocessing(uid: str, memory_id: str, audio_url: str): + with POSTPROCESSING_DURATION.time(): + try: + # Post-processing logic + pass + except Exception as e: + POSTPROCESSING_ERRORS.inc() + raise + ``` + +2. 
**Logging Key Events:** + ```python + import structlog + + logger = structlog.get_logger() + + async def run_postprocessing(uid: str, memory_id: str, audio_url: str): + logger.info("Starting post-processing", uid=uid, memory_id=memory_id) + # ... processing logic ... + logger.info("Post-processing completed", uid=uid, memory_id=memory_id, duration=duration) + ``` + +3. **Alerting on Critical Issues:** + - Set up alerts for high error rates or prolonged processing times. + - Integrate with incident management systems like PagerDuty for immediate notification. + +4. **Dashboard for Visualization:** + - Create a Grafana dashboard to visualize: + - Post-processing success rates + - Average processing times + - Error rates by type + - Resource utilization during processing + +## πŸ”„ Continuous Improvement Strategies + +To ensure the post-processing system evolves and improves over time: + +1. **A/B Testing Framework:** + - Implement a system to test new post-processing algorithms or configurations. + - Compare results against existing methods for accuracy and performance. + +2. **User Feedback Loop:** + - Collect user feedback on post-processing results and incorporate it into the improvement process. + - Implement a system for users to report issues or provide suggestions for improvement. + +3. **Automated Testing:** + - Develop a suite of automated tests to validate the accuracy and performance of the post-processing system. + - Regularly run these tests to ensure the system remains reliable and up-to-date. + +4. **Continuous Learning:** + - Implement a system to continuously learn and improve the post-processing algorithms based on new data and techniques. + - Regularly update the system to leverage the latest advancements in speech recognition and natural language processing. + +5. **Scalability and Resilience:** + - Design the post-processing system to handle increasing volumes of data and traffic. + - Implement fault-tolerant mechanisms to ensure the system remains available and reliable even under heavy loads. + +By following this detailed post-processing workflow, Omi ensures that each memory is accurately transcribed, enriched with valuable metadata, and optimized for efficient retrieval and use in AI-powered interactions. diff --git a/docs/_get_started/backend/security_measures.md b/docs/_get_started/backend/security_measures.md new file mode 100644 index 000000000..aec065ed9 --- /dev/null +++ b/docs/_get_started/backend/security_measures.md @@ -0,0 +1,143 @@ +--- +layout: default +title: Security Measures +parent: Backend +nav_order: 7 +--- + +# πŸ”’ Current Security Measures in Omi Backend + +This document outlines the security measures currently implemented in the Omi backend to protect user data and maintain system integrity. + +## Table of Contents + +1. [Data Encryption](#data-encryption) +2. [Authentication and Authorization](#authentication-and-authorization) +3. [Database Security](#database-security) +4. [API Security](#api-security) +5. [Secrets Management](#secrets-management) +6. [Logging and Monitoring](#logging-and-monitoring) +7. [Secure Development Practices](#secure-development-practices) + +## Data Encryption + +### In-Transit Encryption +- HTTPS is enforced for all API communications using FastAPI's built-in HTTPS support. +- WebSocket connections for real-time audio streaming are secured with WSS (WebSocket Secure). + +### At-Rest Encryption +- Firestore data is encrypted at rest using Google Cloud's default encryption mechanisms. 
+- Google Cloud Storage objects (e.g., audio files) are encrypted using Google-managed encryption keys. +- Pinecone vector database is configured to use encryption at rest. + +## Authentication and Authorization + +### User Authentication +- Firebase Authentication is used for secure user sign-up and login. +- JWT tokens are used for API request authentication. + +### Role-Based Access Control (RBAC) +- Firebase Security Rules are implemented to enforce access control in Firestore and Cloud Storage. +- Backend middleware checks user roles before processing requests. + +## Database Security + +### Firestore Security Rules +- Rules ensure users can only read and write their own data. +- Example rule implementation: + + ```javascript + service cloud.firestore { + match /databases/{database}/documents { + match /users/{userId} { + allow read, write: if request.auth.uid == userId; + } + } + } + ``` + +### Pinecone Access Control +- API key authentication is used for Pinecone vector database access. +- Namespace isolation is implemented to separate user data within the vector database. + +## API Security + +### Rate Limiting +- Rate limiting is implemented using the `slowapi` library to prevent abuse. + + ```python + from slowapi import Limiter + from slowapi.util import get_remote_address + + limiter = Limiter(key_func=get_remote_address) + app.state.limiter = limiter + ``` + +### Input Validation +- Pydantic models are used for request data validation. + + ```python + from pydantic import BaseModel, validator + + class UserInput(BaseModel): + username: str + age: int + + @validator('username') + def username_no_spaces(cls, v): + if ' ' in v: + raise ValueError('Username must not contain spaces') + return v + ``` + +### CORS Configuration +- CORS is configured to restrict API access to trusted domains. + + ```python + from fastapi.middleware.cors import CORSMiddleware + + app.add_middleware( + CORSMiddleware, + allow_origins=["https://yourdomain.com"], + allow_credentials=True, + allow_methods=["*"], + allow_headers=["*"], + ) + ``` + +## Secrets Management + +### Environment Variables +- Sensitive configuration (API keys, database credentials) is stored as environment variables. +- For production deployments, Google Cloud Secret Manager is used. + + ```python + from google.cloud import secretmanager + + client = secretmanager.SecretManagerServiceClient() + name = f"projects/{project_id}/secrets/{secret_id}/versions/latest" + response = client.access_secret_version(name=name) + secret = response.payload.data.decode('UTF-8') + ``` + +## Logging and Monitoring + +### Centralized Logging +- Google Cloud Logging is used for centralized log collection and analysis. + +### Performance Monitoring +- Google Cloud Monitoring is used to track system performance and detect anomalies. + +## Secure Development Practices + +### Dependency Management +- Regular updates of dependencies to patch known vulnerabilities. +- Use of `pip` for package management with a `requirements.txt` file. + +### Secure Deployment +- Deployment to Google Cloud Run using Docker containers. +- CI/CD pipeline implemented with GitHub Actions for automated, secure deployments. + +--- + +This security measures document reflects the current implementation in the Omi backend. It's important to regularly review and update these measures as the system evolves and new security best practices emerge. 
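+
+As a supplementary illustration of the token-based request authentication described above, the following sketch shows how a FastAPI dependency could verify a Firebase ID token. It is an assumption for illustration only, not the repository's actual code; it reuses the `get_current_user_id` name referenced elsewhere in these docs and assumes `firebase_admin` has already been initialized.
+
+```python
+from fastapi import Header, HTTPException
+from firebase_admin import auth
+
+def get_current_user_id(authorization: str = Header(...)) -> str:
+    # Expects an "Authorization: Bearer <Firebase ID token>" header
+    token = authorization.replace("Bearer ", "", 1)
+    try:
+        decoded_token = auth.verify_id_token(token)
+    except Exception:
+        raise HTTPException(status_code=401, detail="Invalid or expired token")
+    return decoded_token["uid"]
+```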
diff --git a/docs/_get_started/backend/transcription.md b/docs/_get_started/backend/transcription.md index 9d6810276..7b54da727 100644 --- a/docs/_get_started/backend/transcription.md +++ b/docs/_get_started/backend/transcription.md @@ -7,9 +7,40 @@ nav_order: 4 # πŸŽ™οΈ Real-Time Transcription Process -This document outlines the real-time audio transcription process in the Omi application. - -![Post Processing](../../images/transcription-process.png) +This document provides a comprehensive overview of the real-time audio transcription process in the Omi application. + +## πŸ“‘ Detailed Transcription Flow + +Here's a detailed look at how the real-time transcription process works: + +```mermaid +sequenceDiagram + participant User + participant OmiApp + participant WebSocket + participant TranscriptionServices + participant VAD + participant SpeakerIdentification + participant MemoryProcessing + + User->>OmiApp: Start recording + OmiApp->>WebSocket: Establish connection (/listen endpoint) + WebSocket->>TranscriptionServices: Initialize Deepgram, Soniox, Speechmatics + WebSocket->>VAD: Initialize Voice Activity Detection + loop Audio Streaming + OmiApp->>WebSocket: Stream audio chunks + WebSocket->>VAD: Perform Voice Activity Detection + VAD->>TranscriptionServices: Forward audio with speech + TranscriptionServices->>WebSocket: Return real-time transcripts + WebSocket->>OmiApp: Send live transcripts + end + User->>OmiApp: Stop recording + OmiApp->>WebSocket: Close connection + WebSocket->>SpeakerIdentification: Process final transcript + SpeakerIdentification->>MemoryProcessing: Provide diarized transcript + MemoryProcessing->>MemoryProcessing: Create processing memory + MemoryProcessing->>OmiApp: Confirm memory creation +``` ## πŸ“‘ Audio Streaming @@ -25,6 +56,13 @@ This document outlines the real-time audio transcription process in the Omi appl - `websocket_endpoint` function sets up the connection - Calls `_websocket_util` function to manage the connection +```python +@router.websocket("/listen") +async def websocket_endpoint(websocket: WebSocket, uid: str, language: str = 'en', ...): + await websocket.accept() + await _websocket_util(websocket, uid, language, ...) 
+``` + ### `_websocket_util` Function - Accepts the WebSocket connection @@ -35,39 +73,71 @@ This document outlines the real-time audio transcription process in the Omi appl - `receive_audio`: Receives audio chunks and sends to Deepgram - `send_heartbeat`: Sends periodic messages to keep connection alive -## πŸ”Š Deepgram Integration +```python +async def _websocket_util(websocket: WebSocket, uid: str, language: str, ...): + profile_audio = await get_profile_audio_if_exists(uid) + + receive_task = asyncio.create_task(receive_audio(websocket, uid, language, profile_audio, ...)) + heartbeat_task = asyncio.create_task(send_heartbeat(websocket)) + + await asyncio.gather(receive_task, heartbeat_task) +``` + +## πŸ”Š Voice Activity Detection (VAD) + +Before sending audio to transcription services, Omi uses Voice Activity Detection to identify speech segments: + +- Utilizes pyannote.audio's pre-trained VAD model +- Runs on GPU if available, otherwise falls back to CPU +- Filters out non-speech audio to improve transcription accuracy and reduce processing load + +```python +vad = Pipeline.from_pretrained( + "pyannote/voice-activity-detection", + use_auth_token=os.getenv('HUGGINGFACE_TOKEN') +).to(device) + +def process_audio_with_vad(audio_chunk): + vad_result = vad(audio_chunk) + speech_segments = vad_result.get_timeline().support() + return [segment for segment in speech_segments if segment.duration > 0.5] +``` -### `process_audio_dg` Function +## πŸ”Š Multiple Transcription Services Integration -- Located in `utils/stt/streaming.py` +### Deepgram Integration + +- `process_audio_dg` function in `utils/stt/streaming.py` handles Deepgram transcription - Initializes Deepgram client using `DEEPGRAM_API_KEY` - Defines `on_message` callback for handling transcripts - Starts live transcription stream with Deepgram -### Deepgram Configuration - -| Option | Value | Description | -|--------|-------|-------------| -| `language` | Variable | Audio language | -| `sample_rate` | 8000 or 16000 Hz | Audio sample rate | -| `codec` | Opus or Linear16 | Audio codec | -| `channels` | Variable | Number of audio channels | -| `punctuate` | True | Automatic punctuation | -| `no_delay` | True | Low-latency transcription | -| `endpointing` | 100 | Sentence boundary detection | -| `interim_results` | False | Only final transcripts sent | -| `smart_format` | True | Enhanced transcript formatting | -| `profanity_filter` | False | No profanity filtering | -| `diarize` | True | Speaker identification | -| `filler_words` | False | Remove filler words | -| `multichannel` | channels > 1 | Enable if multiple channels | -| `model` | 'nova-2-general' | Deepgram model selection | +```python +async def process_audio_dg(uid: str, websocket: WebSocket, language: str, ...): + client = deepgram.Deepgram(DEEPGRAM_API_KEY) + + async def on_message(result): + # Process and send transcript back to client + await websocket.send_json(process_deepgram_result(result)) + + await client.transcription.live({'language': language, ...}, on_message) +``` + +### Soniox Integration + +- Similar to Deepgram, but uses Soniox API for transcription +- Implemented in `process_audio_soniox` function + +### Speechmatics Integration + +- Uses Speechmatics API for another transcription stream +- Implemented in `process_audio_speechmatics` function ## πŸ”„ Transcript Processing -1. Deepgram processes audio and triggers `on_message` callback -2. `on_message` receives raw transcript data -3. Callback formats transcript data: +1. 
Each transcription service processes audio and triggers its respective callback +2. Callbacks receive raw transcript data +3. Service-specific processing functions format the transcript data: - Groups words into segments - Creates list of segment dictionaries 4. Formatted segments sent back to Omi App via WebSocket @@ -83,12 +153,78 @@ This document outlines the real-time audio transcription process in the Omi appl | `is_user` | Boolean indicating if segment is from the user | | `person_id` | ID of matched person from user profiles (if applicable) | +## 🎭 Speaker Identification + +After the real-time transcription is complete, speaker identification is performed using SpeechBrain's ECAPA-TDNN model: + +1. Audio is processed using SpeechBrain's pre-trained model +2. Speaker embeddings are generated for each segment +3. Embeddings are compared against user's speech profile and known speakers +4. Each segment is labeled with speaker information + +```python +model = SpeakerRecognition.from_hparams( + source="speechbrain/spkrec-ecapa-voxceleb", + savedir="pretrained_models/spkrec-ecapa-voxceleb", +) + +def get_speech_profile_matching_predictions(uid: str, segments: List[TranscriptSegment]): + user_profile = get_user_speech_profile(uid) + people_profiles = get_people_with_speech_samples(uid) + + for segment in segments: + segment_embedding = model.encode_batch(segment.audio) + scores = { + 'user': cosine_similarity(segment_embedding, user_profile), + **{person['id']: cosine_similarity(segment_embedding, person['profile']) for person in people_profiles} + } + best_match = max(scores, key=scores.get) + segment.is_user = best_match == 'user' + segment.person_id = None if segment.is_user else best_match + + return segments +``` + +## πŸ’Ύ Memory Creation + +After transcription and speaker identification: + +1. A processing memory is created with the transcribed and diarized content +2. The memory is stored in Firestore +3. An embedding is generated and stored in Pinecone for future retrieval + +```python +async def create_processing_memory(uid: str, transcript_segments: List[dict], ...): + memory_id = str(uuid.uuid4()) + memory_data = { + "id": memory_id, + "created_at": datetime.now(timezone.utc), + "transcript_segments": transcript_segments, + # ... other relevant data + } + upsert_processing_memory(uid, memory_data) + return memory_id +``` + +## πŸ” Security and Authentication + +- Firebase Authentication is used for user management +- WebSocket connections are authenticated using user tokens +- All data transmission uses TLS encryption + +## πŸš€ Deployment + +- The transcription service is deployed as part of the main backend on Google Cloud Run +- Scaling is handled automatically based on incoming WebSocket connections + ## πŸ”‘ Key Considerations -- Real-time, low-latency transcription -- Speaker diarization accuracy may vary -- Audio encoding choice (Opus vs. 
Linear16) may affect performance
-- Deepgram model selection based on specific needs
-- Implement proper error handling in `on_message`
+- Real-time, low-latency transcription is crucial for user experience
+- Multiple transcription services are used for improved accuracy and redundancy
+- Voice Activity Detection (VAD) improves transcription efficiency and accuracy
+- Speaker diarization accuracy may vary depending on audio quality and number of speakers
+- SpeechBrain's ECAPA-TDNN model provides robust speaker identification
+- Proper error handling and connection management are essential for system stability
+- Caching strategies (e.g., Redis) are used to improve performance for frequently accessed data

-This overview provides a comprehensive understanding of Omi's real-time transcription process, which can be adapted when integrating alternative audio transcription services.
+This overview covers Omi's real-time transcription pipeline end to end: audio streaming, voice activity detection, the multiple transcription services, speaker identification, and memory creation. The architecture targets high-quality, low-latency transcription while remaining flexible enough to accommodate future improvements or changes of transcription service.
diff --git a/docs/_get_started/backend/troubleshooting_guide.md b/docs/_get_started/backend/troubleshooting_guide.md
new file mode 100644
index 000000000..edf0765cd
--- /dev/null
+++ b/docs/_get_started/backend/troubleshooting_guide.md
@@ -0,0 +1,277 @@
+---
+layout: default
+title: Troubleshooting Guide
+parent: Backend
+nav_order: 8
+---
+
+# πŸ”§ Troubleshooting Guide for Omi Backend
+
+This guide provides solutions to common issues, answers to frequently asked questions, and strategies for diagnosing and resolving problems in the Omi backend system.
+
+## Table of Contents
+
+1. [Installation and Setup Issues](#installation-and-setup-issues)
+2. [Authentication and Authorization Problems](#authentication-and-authorization-problems)
+3. [Database Connection Issues](#database-connection-issues)
+4. [API and WebSocket Errors](#api-and-websocket-errors)
+5. [Transcription Service Problems](#transcription-service-problems)
+6. [Memory Processing and Storage Issues](#memory-processing-and-storage-issues)
+7. [Performance and Scaling Challenges](#performance-and-scaling-challenges)
+8. [Deployment and Environment-Specific Problems](#deployment-and-environment-specific-problems)
+9. [Security Concerns](#security-concerns)
+10. [Debugging Strategies](#debugging-strategies)
+11. [Frequently Asked Questions (FAQs)](#frequently-asked-questions-faqs)
+
+## Installation and Setup Issues
+
+### Q: I'm getting "Module not found" errors when running the backend.
+
+A: This is usually due to missing dependencies. Try the following:
+
+1. Ensure you're in the correct virtual environment.
+2. Update pip: `pip install --upgrade pip`
+3. Reinstall requirements: `pip install -r requirements.txt --no-cache-dir`
+4. If a specific module is causing issues, try installing it separately: `pip install <module-name>`
+
+### Q: The backend fails to start with a "Port already in use" error.
+
+A: Another process might be using the required port. Try:
+
+1. Identify the process using the port: `lsof -i :<port>`
+2. Kill the process: `kill -9 <PID>`
+3. If it persists, try changing the port in your configuration.
+
+### Q: I'm having issues setting up Google Cloud credentials.
+
+A: Ensure you've followed these steps:
+
+1. Verify you have the correct `google-credentials.json` file.
+2. 
Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable:
+   ```
+   export GOOGLE_APPLICATION_CREDENTIALS="/path/to/google-credentials.json"
+   ```
+3. If using a service account, ensure it has the necessary permissions in Google Cloud Console.
+
+## Authentication and Authorization Problems
+
+### Q: Users are getting "Unauthorized" errors when trying to access the API.
+
+A: This could be due to several reasons:
+
+1. Check if the Firebase project is correctly set up and linked.
+2. Verify that the user's token is being correctly sent in the Authorization header.
+3. Ensure the token hasn't expired.
+4. Check Firebase security rules to ensure they're not overly restrictive.
+
+### Q: How can I debug Firebase Authentication issues?
+
+A: Try the following:
+
+1. Verify a problematic ID token directly and inspect the raised exception, which explains why verification failed:
+   ```python
+   from firebase_admin import auth
+   decoded_token = auth.verify_id_token(id_token)  # raises a descriptive error for invalid or expired tokens
+   ```
+2. Check Firebase Console for any error messages or invalid login attempts.
+3. Verify that the Firebase configuration in your `.env` file is correct.
+
+## Database Connection Issues
+
+### Q: I'm getting "Connection refused" errors with Firestore.
+
+A: This could be due to network issues or incorrect configuration:
+
+1. Check your internet connection.
+2. Verify that your Firestore database exists and is properly set up in Google Cloud Console.
+3. Ensure your service account has the necessary permissions to access Firestore.
+4. Check if there are any outages reported in the Google Cloud Status Dashboard.
+
+### Q: Pinecone operations are failing with authentication errors.
+
+A: Verify your Pinecone setup:
+
+1. Check if the `PINECONE_API_KEY` in your `.env` file is correct.
+2. Ensure you're using the correct Pinecone environment and index name.
+3. Verify that your Pinecone plan supports the operations you're trying to perform.
+
+## API and WebSocket Errors
+
+### Q: WebSocket connections are frequently disconnecting.
+
+A: This could be due to various reasons:
+
+1. Check if the client is sending regular heartbeat messages to keep the connection alive.
+2. Increase the WebSocket timeout settings on both client and server sides.
+3. Verify that there are no network issues or firewalls blocking WebSocket traffic.
+4. Check server logs for any errors that might be causing the disconnections.
+
+### Q: API requests are timing out.
+
+A: This could be due to performance issues or network problems:
+
+1. Check the server logs for any long-running operations that might be causing delays.
+2. Monitor server resource usage (CPU, memory) to ensure it's not overloaded.
+3. Verify that all external service calls (e.g., to OpenAI, Deepgram) are properly timing out and not blocking the API.
+4. Consider implementing caching for frequently accessed data to improve response times.
+
+## Transcription Service Problems
+
+### Q: Transcription accuracy is poor or inconsistent.
+
+A: Try the following:
+
+1. Check the audio quality being sent to the transcription services.
+2. Verify that the correct language model is being used for the input audio.
+3. Ensure that the VAD (Voice Activity Detection) is properly filtering out non-speech audio.
+4. Consider adjusting the confidence threshold for accepting transcription results.
+
+### Q: One of the transcription services (Deepgram, Soniox, Speechmatics) is consistently failing.
+
+A: Troubleshoot the specific service:
+
+1. Check if the API key for the service is correct and not expired.
+2. 
Verify that the service is operational by checking its status page. +3. Try sending a test request directly to the service API to isolate the issue. +4. Review the service's documentation for any recent changes or known issues. + +## Memory Processing and Storage Issues + +### Q: Memory embeddings are not being generated correctly. + +A: This could be due to issues with the OpenAI API or the embedding process: + +1. Check if the OpenAI API key is correct and has sufficient quota. +2. Verify that the embedding model (e.g., "text-embedding-3-large") is available and properly specified. +3. Ensure that the input text for embedding generation is properly formatted and not too long. +4. Check for any errors in the embedding generation process in the logs. + +### Q: Stored memories are missing data or have incorrect information. + +A: This could be due to issues in the memory processing pipeline: + +1. Review the memory processing logic in `utils/memories/process_memory.py` for any bugs. +2. Check if all required fields are being properly extracted and stored. +3. Verify that the structured data extraction from OpenAI is working correctly. +4. Ensure that the Firestore write operations are successful and not being interrupted. + +## Performance and Scaling Challenges + +### Q: The backend is slow to respond during high traffic periods. + +A: Consider the following optimizations: + +1. Implement caching for frequently accessed data using Redis. +2. Optimize database queries and indexes in Firestore. +3. Use asynchronous processing for time-consuming tasks. +4. Consider scaling up your Google Cloud Run instances or implementing auto-scaling. + +### Q: Memory retrieval is becoming slow as the number of memories increases. + +A: Optimize your vector search process: + +1. Ensure you're using efficient filtering in Pinecone queries. +2. Implement pagination for large result sets. +3. Consider using approximate nearest neighbor search instead of exact search for larger datasets. +4. Optimize your embedding model or quantize embeddings to reduce dimensionality. + +## Deployment and Environment-Specific Problems + +### Q: The backend works locally but fails when deployed to Google Cloud Run. + +A: This could be due to environment differences: + +1. Ensure all environment variables are correctly set in Google Cloud Run. +2. Check if all required services (Firestore, Pinecone, etc.) are accessible from Google Cloud Run. +3. Review the Cloud Run logs for any specific error messages. +4. Verify that the Dockerfile is correctly configured and all dependencies are included. + +### Q: How can I debug issues in the production environment? + +A: Use the following strategies: + +1. Enable detailed logging in your production environment. +2. Use Google Cloud's Error Reporting and Logging services to monitor issues. +3. Implement feature flags to easily enable/disable certain functionalities for debugging. +4. Consider setting up a staging environment that mirrors production for testing. + +## Security Concerns + +### Q: How can I ensure that user data is properly isolated and secured? + +A: Implement the following security measures: + +1. Use Firebase Security Rules to restrict data access based on user authentication. +2. Implement proper input validation and sanitization for all API endpoints. +3. Use encryption for sensitive data both in transit and at rest. +4. Regularly audit and rotate API keys and other secrets. + +### Q: I'm concerned about potential vulnerabilities in dependencies. 
+ +A: Address dependency security: + +1. Regularly update dependencies to their latest secure versions. +2. Use tools like `safety` to check for known vulnerabilities in Python packages. +3. Implement a process for reviewing and approving dependency updates. +4. Consider using a dependency scanning tool in your CI/CD pipeline. + +## Debugging Strategies + +### General Debugging Tips + +1. **Enable Verbose Logging**: Temporarily increase log levels to get more detailed information. +2. **Use Debuggers**: Utilize pdb or IDE debuggers to step through code execution. +3. **Isolate the Problem**: Try to reproduce the issue in a minimal, isolated environment. +4. **Check Recent Changes**: Review recent code changes that might have introduced the issue. + +### Debugging Specific Components + +1. **WebSocket Issues**: Use browser developer tools to inspect WebSocket traffic. +2. **Database Problems**: Use database admin consoles to directly query and verify data. +3. **API Errors**: Use tools like Postman to test API endpoints independently. + +## Frequently Asked Questions (FAQs) + +### Q: How can I optimize the performance of the transcription process? + +A: Consider the following: +1. Use efficient audio encoding (e.g., Opus) to reduce bandwidth usage. +2. Implement client-side VAD to reduce the amount of audio sent for transcription. +3. Fine-tune the balance between real-time responsiveness and transcription accuracy. + +### Q: What should I do if I suspect a memory leak in the backend? + +A: Follow these steps: +1. Use memory profiling tools like `memory_profiler` to identify the source of the leak. +2. Check for any resources (e.g., database connections, file handles) that aren't being properly closed. +3. Review your code for any large objects that are being unnecessarily retained in memory. + +### Q: How can I troubleshoot issues with the Modal serverless deployment? + +A: Try the following: +1. Use Modal's built-in logging and monitoring tools to identify issues. +2. Ensure all required environment variables and secrets are properly set in Modal. +3. Test your functions locally using Modal's local development features before deployment. + +### Q: What steps should I take if I suspect a security breach? + +A: Follow this protocol: +1. Immediately revoke and rotate all potentially compromised API keys and secrets. +2. Review access logs and audit trails to identify the extent of the breach. +3. Temporarily disable affected services or endpoints if necessary. +4. Conduct a thorough security audit and implement any necessary additional security measures. + +### Q: How can I improve the accuracy of speaker identification? + +A: Consider these approaches: +1. Collect more diverse speech samples for each known speaker. +2. Experiment with different speaker recognition models or fine-tune the existing model. +3. Implement a confidence threshold for speaker identification to reduce false positives. + +### Q: What should I do if the emotional analysis results seem inaccurate? + +A: Try the following: +1. Verify that the audio quality is sufficient for accurate emotional analysis. +2. Check if the Hume AI API is being used correctly and with appropriate parameters. +3. Consider collecting user feedback on emotional analysis results to improve the system over time. + +Remember, troubleshooting is often an iterative process. Start with the most likely causes and work your way through more complex possibilities. 
Don't hesitate to reach out to the Omi community or support channels for assistance with particularly challenging issues.
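+
+Many of the issues covered above ultimately trace back to missing environment variables or an unreadable Google credentials file. Before digging deeper (or reaching out), a quick configuration check can rule those out. The following is a minimal sketch, not part of the backend itself; the variable names are assumptions based on the keys referenced in this guide and the setup guide, so adjust them to match your actual `.env`:
+
+```python
+import os
+
+# Hypothetical checklist of configuration keys mentioned in these docs;
+# edit it to mirror the variables your deployment actually uses.
+REQUIRED_VARS = [
+    "OPENAI_API_KEY",
+    "DEEPGRAM_API_KEY",
+    "PINECONE_API_KEY",
+    "HUGGINGFACE_TOKEN",
+    "GOOGLE_APPLICATION_CREDENTIALS",
+]
+
+def check_environment() -> bool:
+    ok = True
+    for name in REQUIRED_VARS:
+        if not os.getenv(name):
+            print(f"[missing] {name} is not set")
+            ok = False
+    # GOOGLE_APPLICATION_CREDENTIALS must also point at a readable file
+    creds_path = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
+    if creds_path and not os.path.isfile(creds_path):
+        print(f"[invalid] GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {creds_path}")
+        ok = False
+    return ok
+
+if __name__ == "__main__":
+    print("Environment looks sane." if check_environment() else "Fix the items above and retry.")
+```
+
+Run it with `python check_env.py` from the backend directory (the file name is arbitrary); if everything prints clean and the problem persists, move on to the service-specific sections above.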