From 015cef36143aabaf285624f9dc1bfb0195094fb5 Mon Sep 17 00:00:00 2001
From: Guillaume Marchand
Date: Mon, 25 Nov 2024 09:26:12 +0100
Subject: [PATCH] version 25-11-2024

---
 CHANGELOG.md |   9 +++
 README.md    | 206 +++++++++++++++++++++++++++++----------------------
 Taskfile.yml |   2 +-
 3 files changed, 126 insertions(+), 91 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 38a9177..2b6f082 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,15 @@ All notable changes to this project will be documented in this file.
 
+## version v1.0.0
+
+### Changed
+
+- CDK and application code refactored
+  1. Modular architecture with clear separation of concerns
+  2. Reusable components and shared utilities
+  3. Flexible configuration management
+
 ## version v0.0.8
 
 ### Added
 
diff --git a/README.md b/README.md
index e021ddf..a5bb467 100644
--- a/README.md
+++ b/README.md
@@ -2,9 +2,11 @@
 
 _Blog post : _
 
+## Table of Contents
+
 - [Create a managed FFmpeg workflow for your media jobs using AWS Batch](#create-a-managed-ffmpeg-workflow-for-your-media-jobs-using-aws-batch)
+  - [Table of Contents](#table-of-contents)
   - [Introduction](#introduction)
   - [Disclaimer And Data Privacy Notice](#disclaimer-and-data-privacy-notice)
   - [Architecture](#architecture)
@@ -19,6 +21,7 @@ _Blog post : 
@@ -40,31 +43,29 @@ AWS proposes several general usage instance families, optimised compute instance
 - EC2 instances powered by **AMD**: M6a instances are powered by 3rd generation AMD EPYC processors (code named Milan).
 - Serverless compute with **Fargate**: Fargate allows to have a completely serverless architecture for your batch jobs. With Fargate, every job receives the exact amount of CPU and memory that it requests.
 
-We are going to create a managed file-based encoding pipeline with [AWS Batch](https://aws.amazon.com/batch) and FFmpeg in containers.
 
 ## Disclaimer And Data Privacy Notice
 
 When you deploy this solution, scripts will download different packages with different licenses from various sources. These sources are not controlled by the developer of this script. Additionally, this script can create a non-free and un-redistributable binary. By deploying and using this solution, you are fully aware of this.
 
 ## Architecture
 
-The architecture includes 5 main components :
+The architecture includes the following main components:
 
-1. Containers images are stored in a Amazon ECR (Elastic Container Registry) registry. Each container includes FFmpeg library with a Python wrapper. Container images are specialized per CPU architecture : ARM64, x86-64, NVIDIA, and Xilinx.
-1. AWS Batch is configured with a queue and compute environment per CPU architecture. AWS Batch schedules job queues using Spot Instance compute environments only, to optimize cost.
-1. Customers submit jobs through AWS SDKs with the `SubmitJob` operation or use the Amazon API Gateway REST API to easily submit a job with any HTTP library.
-1. All media assets ingested and produced are stored on an Amazon S3 bucket.
-1. [Amazon FSx for Lustre](https://aws.amazon.com/fr/fsx/lustre/) seamlessly integrates with Amazon S3, enabling transparent access to S3 objects as files. Amazon FSx for Lustre is ideally suited for temporary storage and short-term data processing due to its configuration as a Scratch file system. This eliminates the need to move large media assets to local storage.
-1. Observability is managed by Amazon Cloudwatch and AWS X-Ray. All XRay traces are exported on Amazon S3 to benchmark which compute architecture is better for a specific FFmpeg command.
-1. [Amazon Step Functions](https://aws.amazon.com/step-functions/) reliably processes huge volumes of media assets with FFmpeg on AWS Batch. it handles job failures, and AWS service limits.
+1. Container images are stored in Amazon ECR (Elastic Container Registry). Each container includes the FFmpeg library with a Python wrapper. Container images are specialized per CPU architecture: ARM64, x86-64, NVIDIA, and Xilinx.
+2. AWS Batch is configured with a queue and compute environment per CPU architecture. AWS Batch schedules job queues using Spot Instance compute environments only, to optimize cost.
+3. Customers submit jobs through AWS SDKs with the `SubmitJob` operation or use the Amazon API Gateway REST API to easily submit a job with any HTTP library.
+4. All media assets ingested and produced are stored in an Amazon S3 bucket.
+5. [Amazon FSx for Lustre](https://aws.amazon.com/fr/fsx/lustre/) seamlessly integrates with Amazon S3, enabling transparent access to S3 objects as files. Amazon FSx for Lustre is ideally suited for temporary storage and short-term data processing due to its configuration as a Scratch file system. This eliminates the need to move large media assets to local storage.
+6. Observability is managed by Amazon CloudWatch and AWS X-Ray. All X-Ray traces are exported to Amazon S3 to benchmark which compute architecture is better for a specific FFmpeg command.
+7. [AWS Step Functions](https://aws.amazon.com/step-functions/) reliably processes huge volumes of media assets with FFmpeg on AWS Batch. It handles job failures and AWS service limits.
 
 ### Architecture Decision Records
 
 1. [Implement Athena Views](doc/architecture/0001-implement-athena-views.md)
-1. [Implement automatic list of instance types per AWS Region](doc/architecture/0002-implement-automatic-list-of-instance-types-per-aws-region.md)
-1. [Rollback automatic list of instance types per AWS Region](doc/architecture/0003-rollback-automatic-list-of-instance-types-per-aws-region.md)
-1. [Implement Step Functions Dynamic Map](doc/architecture/0004-implement-step-functions-dynamic-map.md)
-1. [Implement FSx Lustre Scratch cluster](doc/architecture/0005-implement-fsx-lustre-scratch-cluster.md)
+2. [Implement automatic list of instance types per AWS Region](doc/architecture/0002-implement-automatic-list-of-instance-types-per-aws-region.md)
+3. [Rollback automatic list of instance types per AWS Region](doc/architecture/0003-rollback-automatic-list-of-instance-types-per-aws-region.md)
+4. [Implement Step Functions Dynamic Map](doc/architecture/0004-implement-step-functions-dynamic-map.md)
+5. [Implement FSx Lustre Scratch cluster](doc/architecture/0005-implement-fsx-lustre-scratch-cluster.md)
 
 ### Diagram
 
@@ -74,65 +75,66 @@ The architecture includes 5 main components :
 
 ### Prerequisites
 
-You need the following prerequisites to set up the solution :
+You need the following prerequisites to set up the solution:
 
 - An AWS account
-- Latest version of [AWS Cloud Development Kit (CDK)](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html) with a [bootstraping](https://docs.aws.amazon.com/cdk/v2/guide/bootstrapping.html) already done.
+- Latest version of [AWS Cloud Development Kit (CDK)](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html) with [bootstrapping](https://docs.aws.amazon.com/cdk/v2/guide/bootstrapping.html) already done
 - Latest version of [Task](https://taskfile.dev/#/installation)
 - Latest version of [Docker](https://docs.docker.com/get-docker/)
-- Last version of [Python 3](https://www.python.org/downloads/)
+- Latest version of [Python 3](https://www.python.org/downloads/)
 
 ## Deploy the solution with AWS CDK
 
 To deploy the solution on your account, complete the following steps:
 
 1. Clone the github repository
-1. execute this list of command :
+2. Execute the following commands:
 
 ```bash
-task venv
+task setup
 source .venv/bin/activate
 task cdk:deploy
 task env
-task app:docker-amd64
-task app:docker-arm64
-task app:docker-nvidia
-task:app:docker-xilinx
+task app:docker:login
+task app:docker:build:amd64
+task app:docker:build:arm64
+task app:docker:build:nvidia
+task app:docker:build:xilinx
 ```
 
 CDK will output the new Amazon S3 bucket and the Amazon API Gateway REST endpoint.
 
 ## Use the solution
 
-I can execute FFmpeg commands with the **AWS SDKs**, AWS CLI or HTTP REST API. The solution respects the typical syntax of the FFmpeg command described in the [official documentation](https://FFmpeg.org/FFmpeg.html):
+The solution supports FFmpeg commands through the AWS SDKs, the AWS CLI, or an HTTP REST API. It follows the typical FFmpeg command syntax from the [official documentation](https://ffmpeg.org/ffmpeg.html):
 
 ```bash
 ffmpeg [global_options] {[input_file_options] -i input_url} ... {[output_file_options] output_url} ...
 ```
 
-So, parameters of the solution are
+Parameters:
 
-- `global_options`: FFmpeg global options described in the official documentation.
-- `input_file_options`: FFmpeg input file options described in the official documentation.
-- `ìnput_url`: AWS S3 url synced to the local storage and tranformed to local path by the solution.
-- `output_file_options`: FFmpeg output file options described in the official documentation.
-- `output_url`: AWS S3 url synced from the local storage to AWS S3 storage.
-- `compute`: Instances family used to compute the media asset : `intel`, `arm`, `amd`, `nvidia`, `fargate`, `fargate-arm`, `xilinx`
-- `name`: metadata of this job for observability.
+- `global_options`: FFmpeg global options described in the official documentation
+- `input_file_options`: FFmpeg input file options described in the official documentation
+- `input_url`: Amazon S3 URL of the input, synced to local storage and transformed to a local path by the solution
+- `output_file_options`: FFmpeg output file options described in the official documentation
+- `output_url`: Amazon S3 URL of the output, synced from local storage back to Amazon S3
+- `compute`: Instance family used to process the media asset: `intel`, `arm`, `amd`, `nvidia`, `fargate`, `fargate-arm`, `xilinx`
+- `name`: Metadata of this job for observability
 
-The solution has different FFmpeg versions per AWS EC2 instance families.
+Available FFmpeg versions per compute environment:
 
 | **Compute** | **FFmpeg version per default** | **FFmpeg version(s) available** |
 |-------------|--------------------------------|---------------------------------|
-| intel | 7.0.1 | 6.0, 5.1 |
-| arm | 7.0.1 | 6.0, 5.1 |
-| amd | 7.0.1 | 6.0, 5.1 |
-| nvidia | 7.0 (snapshot) | 6.0, 5.1 |
-| fargate | 7.0.1 | 6.0, 5.1 |
-| fargate-arm | 7.0.1 | 6.0, 5.1 |
-| xilinx | 4.4 | 4.4 |
+| intel       | 7.0.1                          | 6.0, 5.1                        |
+| arm         | 7.0.1                          | 6.0, 5.1                        |
+| amd         | 7.0.1                          | 6.0, 5.1                        |
+| nvidia      | 7.0 (snapshot)                 | 6.0, 5.1                        |
+| fargate     | 7.0.1                          | 6.0, 5.1                        |
+| fargate-arm | 7.0.1                          | 6.0, 5.1                        |
+| xilinx      | 4.4                            | 4.4                             |
 
-In this example we use the AWS SDK "Boto3" (Python) and I want to cut a specific part of a video. First of all, I uploaded a video in the Amazon S3 bucket created by the solution, and complete the parameters below :
+Example using the AWS SDK for Python (Boto3) to clip a section from a video already uploaded to the Amazon S3 bucket created by the solution:
 
 ```python
 import boto3
@@ -140,29 +142,24 @@ import requests
 from urllib.parse import urlparse
 from aws_requests_auth.boto_utils import BotoAWSRequestsAuth
 
-# Cloudformation output of the Amazon S3 bucket created by the solution : s3://batch-FFmpeg-stack-bucketxxxx/
+# Cloudformation output of the Amazon S3 bucket created by the solution: s3://batch-FFmpeg-stack-bucketxxxx/
 s3_bucket_url = ""
-# Amazon S3 key of the media Asset uploaded on S3 bucket, to compute by FFmpeg command : test/myvideo.mp4
+# Amazon S3 key of the input media asset: test/myvideo.mp4
 s3_key_input = ""
-# Amazon S3 key of the result of FFmpeg Command : test/output.mp4
+# Amazon S3 key for the output: test/output.mp4
 s3_key_output = ""
-# EC2 instance family : `intel`, `arm`, `amd`, `nvidia`, `fargate`, `xilinx`
+# EC2 instance family: `intel`, `arm`, `amd`, `nvidia`, `fargate`, `xilinx`
 compute = "intel"
 job_name = "clip-video"
 
-command={
+command = {
     "name": job_name,
-    #"global_options": "",
-    "input_url" : s3_bucket_url + s3_key_input,
-    #"input_file_options" : "",
-    "output_url" : s3_bucket_url + s3_key_output,
+    "input_url": s3_bucket_url + s3_key_input,
+    "output_url": s3_bucket_url + s3_key_output,
     "output_file_options": "-ss 00:00:10 -t 00:00:15 -c:v copy -c:a copy"
 }
-```
-
-I submit the FFmpeg command with the AWS SDK Boto3 (Python) :
-```python
+# Submit job using AWS SDK
 batch = boto3.client("batch")
 result = batch.submit_job(
     jobName=job_name,
@@ -172,24 +169,24 @@ result = batch.submit_job(
 )
 
-I can also submit the same FFmpeg command with the REST API through a HTTP POST method ([API Documentation](doc/api.md)). I control access to this Amazon API Gateway REST API with [IAM permissions](https://docs.aws.amazon.com/apigateway/latest/developerguide/permissions.html) :
+You can also submit the same command through the REST API with an HTTP POST, secured with [IAM permissions](https://docs.aws.amazon.com/apigateway/latest/developerguide/permissions.html) ([API Documentation](doc/api.md)):
 
 ```python
-# AWS Signature Version 4 Signing process with Python Requests
+# AWS Signature Version 4 Signing process
 def apig_iam_auth(rest_api_url):
     domain = urlparse(rest_api_url).netloc
     auth = BotoAWSRequestsAuth(
         aws_host=domain, aws_region="", aws_service="execute-api"
     )
     return auth
 
-# Cloudformation output of the Amazon API Gateway REST API created by the solution : https://xxxx.execute-api.xx-west-1.amazonaws.com/prod/
+# Cloudformation output of the Amazon API Gateway REST API created by the solution: https://xxxx.execute-api.xx-west-1.amazonaws.com/prod/
 api_endpoint = ""
 auth = apig_iam_auth(api_endpoint)
-url= api_endpoint + 'batch/execute/' + compute
+url = api_endpoint + 'batch/execute/' + compute
 response = requests.post(url=url, json=command, auth=auth, timeout=2)
 ```
 
-Per default, AWS Batch chooses by itself an EC2 instance type available. If I want to override it, I can add the `nodeOverride` property when I submit a job with the SDK:
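+The same REST call can also be made from the command line. Below is a minimal sketch using curl's built-in SigV4 signing; it assumes long-lived credentials exported as environment variables (temporary credentials would also need an `x-amz-security-token` header), and `<region>` and `<api_endpoint>` are placeholders for your own values:
+
+```bash
+# Hypothetical helper file holding the same payload as the Python `command` dict above
+cat > command.json <<'EOF'
+{
+  "name": "clip-video",
+  "input_url": "s3://batch-FFmpeg-stack-bucketxxxx/test/myvideo.mp4",
+  "output_url": "s3://batch-FFmpeg-stack-bucketxxxx/test/output.mp4",
+  "output_file_options": "-ss 00:00:10 -t 00:00:15 -c:v copy -c:a copy"
+}
+EOF
+
+# POST the job to the API Gateway endpoint, signed with SigV4 for the execute-api service
+curl --request POST \
+  --aws-sigv4 "aws:amz:<region>:execute-api" \
+  --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
+  --header "Content-Type: application/json" \
+  --data @command.json \
+  "<api_endpoint>batch/execute/intel"
+```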
+To override the EC2 instance type that AWS Batch selects, add the `nodeOverrides` property when you submit a job with the SDK:
 
 ```python
 instance_type = 'c5.large'
@@ -199,19 +196,19 @@ result = batch.submit_job(
     jobDefinition="batch-ffmpeg-job-definition-" + compute,
     parameters=command,
     nodeOverrides={
-            "nodePropertyOverrides": [
-                {
-                    "targetNodes": "0,n",
-                    "containerOverrides": {
-                        "instanceType": instance_type,
-                    },
-                },
-            ]
-        },
-    )
+        "nodePropertyOverrides": [
+            {
+                "targetNodes": "0,n",
+                "containerOverrides": {
+                    "instanceType": instance_type,
+                },
+            },
+        ]
+    },
+)
 ```
 
-I can have the status of the AWS Batch job execution with the AWS API [Batch::DescribeJobs](https://docs.aws.amazon.com/batch/latest/APIReference/API_DescribeJobs.html) and with the HTTP REST API ([API Documentation](doc/api.md)):
+To get the status of the AWS Batch job execution, use the AWS API [Batch::DescribeJobs](https://docs.aws.amazon.com/batch/latest/APIReference/API_DescribeJobs.html) or the HTTP REST API ([API Documentation](doc/api.md)):
 
 ```python
 command['instance_type'] = instance_type
@@ -221,11 +218,11 @@ response = requests.post(url=url, json=command, auth=auth, timeout=2)
 
 ### Use the solution at scale with AWS Step Functions
 
-I can process a full library (100000's) of media assets on AWS S3, thanks to AWS Step Functions. I can execute FFmpeg commands at scale with the AWS SDKs, the AWS Command Line Interface (AWS CLI) and the Amazon API Gateway REST API ([API Documentation](doc/api.md)).
+Process large volumes of media assets (hundreds of thousands of files) with AWS Step Functions, through the AWS SDKs, the AWS CLI, or the Amazon API Gateway REST API ([API Documentation](doc/api.md)).
 
 ![Step Functions](doc/step_functions.png)
 
-In this example, we use the AWS CLI. A Step Functions execution receives a JSON text as input and passes that input to the first state in the workflow. Here is the JSON input.json designed for the solution:
+Example using the AWS CLI. A Step Functions execution receives a JSON document as input and passes it to the first state in the workflow; the solution expects an `input.json` like this:
 
 ```json
 {
@@ -259,25 +256,28 @@ Parameters of this `input.json are:
 - `$.output.file_options`: FFmpeg output file options described in the official documentation.
 - `$.global.options`: FFmpeg global options described in the official documentation.
 
-And, I submit this FFmpeg command described in JSON input file with the AWS CLI :
+Submit the FFmpeg command described in the JSON input file with the AWS CLI:
 
 ```bash
-aws stepfunctions start-execution --state-machine-arn arn:aws:states:::stateMachine:batch-ffmpeg-state-machine --name batch-ffmpeg-execution --input "$(jq -R . input.json --raw-output)"“
+aws stepfunctions start-execution \
+  --state-machine-arn arn:aws:states:::stateMachine:batch-ffmpeg-state-machine \
+  --name batch-ffmpeg-execution \
+  --input "$(jq -R . input.json --raw-output)"
 ```
 
 The Amazon S3 url of the processed media is: `s3://{$.output.s3_bucket}{$.output.s3_suffix}{Input S3 object key}{$.output.s3_suffix}`
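+`start-execution` returns an `executionArn`. A quick way to follow the workflow is to poll it with the AWS CLI (a sketch; replace the ARN with the value returned by the previous command):
+
+```bash
+# Status is RUNNING, SUCCEEDED, FAILED, TIMED_OUT or ABORTED
+aws stepfunctions describe-execution \
+  --execution-arn "<execution_arn>" \
+  --query 'status'
+```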
 ### Use the solution with Amazon FSx for Lustre cluster
 
-To efficiently process large media files, avoid spending compute time uploading and downloading media to temporary storage. Instead, use an Amazon FSx for Lustre file system in Scratch mode with Amazon S3 object storage. This provides a cost-effective, durable, and flexible solution.
-Before deploying the CDK stack, I enable the deployment of this feature and configure the storage capacity of the cluster in the `/cdk.json` file.
+For efficient processing of large media files, the solution supports Amazon FSx for Lustre integration. Enable this feature and configure the storage capacity of the cluster in `/cdk.json` before deploying the CDK stack:
 
 ```json
+{
   "batch-ffmpeg:lustre-fs": {
     "enable": true,
     "storage_capacity_gi_b": 1200
   }
+}
 ```
 
 The FFmpeg wrapper transparently converts S3 URLs to lustre filesystem requests when enabled. The integration requires no code changes.
 
@@ -290,36 +290,62 @@ Lustre filesystem file manipulation (preload and release) occurs through the Ama
 
 The solution deployed an AWS System Manager Document `batch-ffmpeg-lustre-preload` which preloads a media asset in the Lustre filesystem. This SSM Document is available through the Amazon API Gateway Rest API ([API Documentation](doc/api.md)).
 
-To release files on the FSx for Lustre filesystem, I use the AWS API [Amazon FSx::CreateDataRepositoryTask](https://docs.aws.amazon.com/fsx/latest/APIReference/API_CreateDataRepositoryTask.html) with the type of data repository task `RELEASE_DATA_FROM_FILESYSTEM` or the Amazon API Gateway Rest API ([API Documentation](doc/api.md)).
+To release files on the FSx for Lustre filesystem, use the AWS API [Amazon FSx::CreateDataRepositoryTask](https://docs.aws.amazon.com/fsx/latest/APIReference/API_CreateDataRepositoryTask.html) with the data repository task type `RELEASE_DATA_FROM_FILESYSTEM`, or the Amazon API Gateway REST API ([API Documentation](doc/api.md)).
 
 ### Extend the solution
 
-I can customize and extend the solution as I want. For example I can customize the FFmpeg docker image adding libraries or upgrading the FFmpeg version, all docker files are located in [`application/docker-images/`](https://github.com/aws-samples/aws-batch-with-FFmpeg/tree/main/application/docker-images/).
+The solution is highly customizable:
-The FFmpeg wrapper is a Python script `/application/FFmpeg_wrapper.py` which syncs the source media assets from the Amazon S3 bucket, launches the FFmpeg command and syncs the result to the Amazon S3 bucket.
-
-The CDK stack is described in the directory `/cdk`.
+
+- Customize FFmpeg docker images in [`src/docker-images/`](https://github.com/aws-samples/aws-batch-with-FFmpeg/tree/main/src/docker-images/)
+- Modify the FFmpeg wrapper in `/src/wrapper/wrapper.py`
+- Extend the CDK infrastructure in `/infrastructure`
 
 ## Performance and quality metrics
 
-AWS Customers also wants to use this solution to benchmark the video encoding performance and quality of Amazon EC2 instance families. I analyze performance and video quality metrics thanks to AWS X-Ray service. I define 3 segments : Amazon S3 download, FFmpeg Execution and Amazon S3 upload.
+The solution provides performance monitoring through AWS X-Ray with three key segments:
+
+- Amazon S3 download
+- FFmpeg Execution
+- Amazon S3 upload
 
-If I switch the AWS SSM (Systems Manager) Parameter `/batch-ffmpeg/ffqm` to `TRUE`, quality metrics PSNR, SSIM, VMAF are calculated and exported as an AWS X-RAY metadata and as a JSON file in the Amazon S3 bucket with the key prefix `/metrics/ffqm`. Those metrics are available through AWS Athena views `batch_FFmpeg_ffqm_psnr`, `batch_FFmpeg_ffqm_ssim`, `batch_FFmpeg_ffqm_vmaf`.
+Quality metrics (PSNR, SSIM, VMAF) can be enabled by setting the AWS SSM Parameter `/batch-ffmpeg/ffqm` to `TRUE`. Metrics are:
 
-All AWS X-Ray traces are exported to Amazon s3. An Amazon Glue Crawler provides an Amazon Athena table `batch_FFmpeg_xray` and Amazon Athena view `batch_FFmpeg_xray_subsegment`.
+- Exported as AWS X-Ray metadata
+- Saved as JSON files in the S3 bucket under `/metrics/ffqm`
+- Available through AWS Athena views:
+  - `batch_ffmpeg_ffqm_psnr`
+  - `batch_ffmpeg_ffqm_ssim`
+  - `batch_ffmpeg_ffqm_vmaf`
+  - `batch_ffmpeg_xray_subsegment`
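+For example, the views can be queried with Amazon Athena from the AWS CLI. The sketch below only relies on the view name documented above; the workgroup and the Glue database name are assumptions, and the workgroup must have a query result location configured:
+
+```bash
+# Run a sample query against the VMAF view
+QUERY_ID=$(aws athena start-query-execution \
+  --work-group primary \
+  --query-execution-context Database=<glue_database> \
+  --query-string "SELECT * FROM batch_ffmpeg_ffqm_vmaf LIMIT 10" \
+  --query 'QueryExecutionId' --output text)
+
+# Wait a few seconds for the query to finish, then fetch the results
+aws athena get-query-results --query-execution-id "$QUERY_ID"
+```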
-You could then create dashboards with Amazon Quicksight like this one :
+Create custom dashboards using Amazon QuickSight:
 
 ![Quicksight](doc/metrics_analysis.jpg)
 
 ## Cost
 
-AWS Batch optimizes compute costs by paying only for used resources. Using Spot instances leverages unused EC2 capacity for significant savings over On-Demand instances. Benchmark different instance types and sizes to find the optimal workload configuration. Test options like GPU versus CPU to balance performance and cost.
+AWS Batch optimizes costs in several ways:
+
+- Pay-per-use model: you only pay for resources while jobs are running
+- Spot Instance support for up to 90% cost savings over On-Demand
+- Automatic instance selection and scaling
+- Support for various instance types to optimize price/performance
+
+## Development
+
+For development and testing:
+
+1. Install development dependencies:
+
+```bash
+task setup
+source .venv/bin/activate
+```
 
 ## Clean up
 
-To prevent unwanted charges after evaluating this solution, delete created resources by:
+To avoid unwanted charges:
 
-1. Delete all objects in the Amazon S3 bucket used for testing. I can remove these objects from the S3 console by selecting all objects and clicking "Delete."
-2. Destroy the AWS CDK stack that was deployed for testing. To do this, I open a terminal in the Git repository and run: `task cdk:destroy`
-3. Verify that all resources have been removed by checking the AWS console. This ensures no resources are accidentally left running, which would lead to unexpected charges.
+1. Delete all objects in the S3 bucket used for testing
+2. Destroy the AWS CDK stack: `task cdk:destroy`
+3. Verify all resources have been removed through the AWS console
 
diff --git a/Taskfile.yml b/Taskfile.yml
index fdc6c98..303fa9b 100644
--- a/Taskfile.yml
+++ b/Taskfile.yml
@@ -52,7 +52,7 @@ tasks:
       - .venv/bin/python3 -m pip install --upgrade --quiet pip
       - .venv/bin/pip install --quiet -r requirements.txt
       - .venv/bin/pip install --quiet -r src/requirements.txt
-      - .venv/bin/pip install --quiet -r tests/requirements.txt
+      - .venv/bin/pip install --quiet -r tests/requirements.txt || true
 
   setup:update:
     desc: upgrade python packages in python virtual env