This repo reads a Teams transcript file (currently .docx), chunks it into smaller parts, prompts an Open AI model to process each chunk, and then generates a final summary of the entire document using Azure Open AI.
The diagram below shows the current flow:
- Python 3.12
- Azure Open AI subscription
- pip (Python package installer)
- Clone the repository
git clone https://github.com/hmcts/microsoft-teams-transcript-summariser-python-openai.git
cd microsoft-teams-transcript-summariser-python-openai
- Create a virtual environment
python3 -m venv venv
source vnv/bin/activate
python -m venv venv
venv\Scripts\activate
- Install the required packages
pip install -r requirements.txt
- Create a .env file in the root directory and add the following environment variables
FILE_PATH="<file_path>.docx"
AZURE_OPENAI_API_KEY="azure_openai_api_key"
AZURE_OPENAI_ENDPOINT="azure_openai_endpoint"
AZURE_OPENAI_DEPLOYMENT_NAME="azure_openai_deployment_name"
# prompts from prompt.json
INITIAL_PROMPT_SUMMARY="initial_prompt_summary_per_user_questions_answered"
FINAL_PROMPT_SUMMARY="final_prompt_summary_per_user_questions_answered"
Run the following command to generate a summary of the Teams transcript
python summary.py
- Read and Chunk Document: The script reads a word doc and splits it into smaller chunks.
- Prompt AI Model: Each chunk is processed by an AI model.
- Save Responses: The responses from the AI model are saved to
responses.txt
- Generate Final Summary: The script reads the responses and prompts the AI model to generate a final summary, which is saved to
final_summary.txt
The script uses prompts to guide the AI model on what to generate. The prompts are stored in prompt.json
and are read by the script.
Currently, it has three sets of prompts that reference 5 topical areas, which are: Overview, Concerns, Current Model, Questions/Thoughts to Address, To-Do Lists, and Outstanding Tasks
- Overall Summary:
- Used to generate a general summary of the Teams transcript
- Prompts:
initial_prompt_summary
andfinal_prompt_summary
- Overall Summary Per User:
- Generates a summary that includes references to each user and their respective timestamps within the Teams transcript.
- Prompts:
initial_prompt_summary_per_user
andfinal_prompt_summary_per_user
- Overall Summary Per User with Questions Answered:
- Creates a summary that includes references to each user, their timestamps, and the questions answered during the meeting.
- Prompts:
initial_prompt_summary_per_user_questions_answered
andfinal_prompt_summary_per_user_questions_answered
Example output of each summary: (Formatted raw .txt files for markdown)
The script uses Python's built-in logging module to log information and errors. Logs are printed to the console.
2024-08-26 13:26:05,015 - INFO - Starting to chunk the document kt11.docx
2024-08-26 13:26:19,715 - INFO - HTTP Request: POST https://tamopsuks.openai.azure.com/openai/deployments/gpt4/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"
2024-08-26 13:27:06,973 - INFO - HTTP Request: POST https://tamopsuks.openai.azure.com/openai/deployments/gpt4/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"
2024-08-26 13:27:19,156 - INFO - HTTP Request: POST https://tamopsuks.openai.azure.com/openai/deployments/gpt4/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"
2024-08-26 13:27:35,754 - INFO - HTTP Request: POST https://tamopsuks.openai.azure.com/openai/deployments/gpt4/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"
2024-08-26 13:27:35,755 - INFO - Responses written to responses.txt
2024-08-26 13:27:35,755 - INFO - Finished chunking the document kt11.docx
2024-08-26 13:28:33,572 - INFO - HTTP Request: POST https://tamopsuks.openai.azure.com/openai/deployments/gpt4/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"
2024-08-26 13:28:33,573 - INFO - Final Summary has been created
2024-08-26 13:28:33,574 - INFO - Final summary written to final_summary.txt
2024-08-26 13:28:33,575 - INFO - Script execution time: 148.56 seconds
The script includes error handling for:
- Reading and writing files
- Prompting the AI model