This blueprint deploys a Kinesis Data Analytics (KDA) application that reads from MSK Serverless using IAM auth and writes to S3 using the Python Table API:
- Flink version: 1.15.2
- Python version: 3.8
- Uses the `KafkaSource` connector, new in Flink 1.13 (the legacy `FlinkKafkaConsumer` is slated to be deprecated), and the `FileSink` connector (`StreamingFileSink` is slated to be deprecated)
- Package and deploy the Python app to S3
- Deploy the associated infrastructure (MSK Serverless and KDA) using a CDK script
  - If using existing resources, you can simply update the app properties in KDA.
- Perform data generation
- Maven
- AWS SDK v2
- AWS CDK v2 - for deploying associated infra (MSK and KDA app)
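The KafkaSource/FileSink pairing above can be sketched in Table API DDL. This is a minimal sketch, not the blueprint's actual code: the topic name, bootstrap endpoint, schema, and output path are all placeholders, and the IAM auth properties assume the `aws-msk-iam-auth` library is on the classpath:

```sql
-- Hypothetical source table over MSK Serverless with IAM auth
CREATE TABLE orders_source (
  order_id STRING,
  price DOUBLE,
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = '<<msk-serverless-bootstrap>>:9098',
  'properties.security.protocol' = 'SASL_SSL',
  'properties.sasl.mechanism' = 'AWS_MSK_IAM',
  'properties.sasl.jaas.config' = 'software.amazon.msk.auth.iam.IAMLoginModule required;',
  'properties.sasl.client.callback.handler.class' = 'software.amazon.msk.auth.iam.IAMClientCallbackHandler',
  'format' = 'json',
  'scan.startup.mode' = 'earliest-offset'
);

-- Hypothetical sink table writing files to S3
CREATE TABLE orders_sink (
  order_id STRING,
  price DOUBLE,
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'filesystem',
  'path' = 's3a://<<your-s3-bucket-name>>/output/',
  'format' = 'json'
);

INSERT INTO orders_sink SELECT * FROM orders_source;
```

Registering both tables and running the `INSERT INTO` is what moves records from the Kafka topic into S3 files; KDA checkpointing controls when in-progress files roll over.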
- First, let's set up some environment variables to make the deployment easier. Replace these values with your own S3 bucket, app name, etc.

  ```
  export AWS_PROFILE=<<profile-name>>
  export APP_NAME=<<name-of-your-app>>
  export S3_BUCKET=<<your-s3-bucket-name>>
  export S3_FILE_KEY=<<your-jar-name-on-s3>>
  ```
- Package the Python app folder into a zip package. Assuming you want to name your zip file `kdaApp.zip`, run:

  ```
  zip kdaApp.zip * lib/*
  ```
  For more information on packaging PyFlink applications, follow the instructions detailed here. For additional information on developing PyFlink applications locally and then deploying them to Kinesis Data Analytics, see Getting Started with PyFlink.
- Copy the zip package to S3 so it can be referenced in the CDK deployment:

  ```
  aws s3 cp <<your app zip>> s3://${S3_BUCKET}/${S3_FILE_KEY}
  ```
- Follow the instructions in the `cdk-infra` folder to deploy the infrastructure associated with this app, such as MSK Serverless and the Kinesis Data Analytics application.
- Follow the instructions in `orders-datagen` to create the topic and generate data into MSK.
- Start your Kinesis Data Analytics application from the AWS console.
- Run a Flink query or an S3 Select query against S3 to view the data written to S3.
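If you take the S3 Select route (e.g. via `aws s3api select-object-content`), the query expression might look like the following sketch. The column names are assumptions about your output schema; `s3object` is the fixed alias S3 Select requires:

```sql
-- Hypothetical: preview ten records from a JSON-formatted output object
SELECT s.order_id, s.price FROM s3object s LIMIT 10
```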