Sample application that reads events from Amazon Kinesis Data Streams and batches records to Amazon Timestream, visualizing results via Grafana.
The overall serverless architecture operates on events streamed to Amazon Kinesis Data Streams, which are pulled by an Apache Flink analytics application hosted on Amazon Kinesis Data Analytics and batch-inserted into an Amazon Timestream database. Finally, the data is visualized directly from the Timestream database by a Grafana dashboard using the Timestream datasource plugin.
The sample setup assumes that events are streamed via the Amazon Kinesis service. However, this is not a precondition: any other streaming service, such as Kafka provisioned on EC2 or Amazon Managed Streaming for Apache Kafka (Amazon MSK), can be used in a similar setup. To simulate devices streaming data to the Kinesis data stream, an AWS Lambda function produces events and pushes them to the stream; the streamed events contain information similar to the sample below.
| DeviceID | Timestamp | temperature | humidity | voltage | watt |
| --- | --- | --- | --- | --- | --- |
| b974f43a-2f04-11eb-adc1-0242ac120002 | 2020-09-13 11:49:42.5352820 | 15.5 | 70.3 | 39.7 | 301.44 |
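The actual producer code is deployed by the `sample-kinesis-stream-producer` stack; purely as an illustration, a producer along these lines could push such events to the stream. This is a minimal sketch, and the stream name, value ranges, and function name below are assumptions, not the repository's code:

```python
import json
import random
import time
import uuid

import boto3

kinesis = boto3.client("kinesis")

# Assumed stream name; use the stream created by the amazon-kinesis-stream-stack.
STREAM_NAME = "sample-kinesis-stream"


def push_sample_event() -> None:
    """Build one synthetic device reading and push it to the Kinesis data stream."""
    event = {
        "DeviceID": str(uuid.uuid4()),
        "Timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "temperature": round(random.uniform(10, 30), 1),
        "humidity": round(random.uniform(40, 90), 1),
        "voltage": round(random.uniform(30, 50), 1),
        "watt": round(random.uniform(100, 400), 2),
    }
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=event["DeviceID"],  # spreads records across shards by device
    )


if __name__ == "__main__":
    push_sample_event()
```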
In Timestream, the data will be modeled as follows:
| DeviceID (Dimension) | measure_value::double | measure_name | measure_type | time |
| --- | --- | --- | --- | --- |
| b974f43a-2f04-11eb-adc1-0242ac120002 | 15.5 | temperature | DOUBLE | 2020-09-13 11:49:42.5352820 |
| b974f43a-2f04-11eb-adc1-0242ac120002 | 70.3 | humidity | DOUBLE | 2020-09-13 11:49:42.5352820 |
| b974f43a-2f04-11eb-adc1-0242ac120002 | 39.7 | voltage | DOUBLE | 2020-09-13 11:49:42.5352820 |
| b974f43a-2f04-11eb-adc1-0242ac120002 | 301.4 | watt | DOUBLE | 2020-09-13 11:49:42.5352820 |
The Device ID is mapped as a `Dimension`, each measured property field is mapped as a `measure_name`, and the value of the measure is mapped to `measure_value`. The data type is set to `DOUBLE` in this case. Check the best practices on mapping and data modeling for better insight into modeling your own data.
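In this project the Flink application performs the batched writes, but purely to illustrate the mapping above, here is a sketch using the Timestream Write API directly via boto3. The database and table names and the event values are placeholders, not the names created by the stacks:

```python
import boto3

timestream = boto3.client("timestream-write")

# One incoming event fans out into one Timestream record per measured property.
event = {
    "DeviceID": "b974f43a-2f04-11eb-adc1-0242ac120002",
    "Timestamp": "1600001382535",  # epoch time in milliseconds
    "temperature": 15.5,
    "humidity": 70.3,
    "voltage": 39.7,
    "watt": 301.44,
}

records = [
    {
        "Dimensions": [{"Name": "DeviceID", "Value": event["DeviceID"]}],
        "MeasureName": name,                # e.g. "temperature"
        "MeasureValue": str(event[name]),
        "MeasureValueType": "DOUBLE",
        "Time": event["Timestamp"],
        "TimeUnit": "MILLISECONDS",
    }
    for name in ("temperature", "humidity", "voltage", "watt")
]

# Placeholder database/table names; use the ones created by the amazon-timestream stack.
timestream.write_records(DatabaseName="sampleDB", TableName="sampleTable", Records=records)
```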
This project provides two sample applications built with different toolsets. You can use either one as the application to be deployed.
- To build an application using Java and Apache Maven, refer to instructions here
- To build an application using Kotlin and Gradle, refer to instructions here
Infrastructure deployment automatically uses the packaged application JAR and uploads it to an Amazon S3 bucket. The infrastructure consists of multiple stacks built with an AWS CDK project written in Python 3. For more information on working with the CDK and Python, check the following guide. To deploy all stacks, use the `--all` option when invoking `cdk deploy`.
- Navigate to the `cdk` folder
- Follow the instructions here to create a virtual environment and build the stacks
- Make sure the CDK environment is bootstrapped in the account and region you're deploying the stacks to, as the stacks use assets to deploy the Kinesis Data Analytics Flink application:

  ```
  $ cdk bootstrap
  ```
- Deploy the infrastructure using the packaged applications
  - To deploy the infrastructure using the Java application as the basis for the Kinesis Data Analytics application, you can deploy the CDK stacks directly:

    ```
    $ cdk deploy --all
    ```

  - To deploy the infrastructure using the Kotlin application as the basis for the Kinesis Data Analytics application, you can customize the stacks using context variables (see the sketch after this list for how the stacks can consume this value):

    ```
    $ cdk deploy --context kda_path=../analytics-kotlin/build/libs/analytics-timestream-kotlin-sample-all.jar --all
    ```
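Conceptually, the stacks read the `kda_path` context value and stage the referenced JAR as a CDK asset. The snippet below is only a sketch of that idea, assuming CDK v2 for Python and an assumed default path; it is not the repository's actual stack code:

```python
from aws_cdk import Stack
from aws_cdk import aws_s3_assets as s3_assets
from constructs import Construct


class FlinkSourceBucketStack(Stack):
    """Illustrative stack that stages the Flink application JAR as a CDK asset."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Fall back to an assumed Java build output unless -c kda_path=... overrides it.
        kda_path = self.node.try_get_context("kda_path") or \
            "../analytics-java/target/analytics-timestream-java-sample.jar"

        # The asset is uploaded to the CDK-managed bucket; its S3 location can then
        # be passed to the Kinesis Data Analytics application configuration.
        flink_jar = s3_assets.Asset(self, "FlinkApplicationJar", path=kda_path)
        self.jar_bucket = flink_jar.bucket
        self.jar_key = flink_jar.s3_object_key
```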
Once the CDK stacks are deployed successfully, you can check the created AWS resources. You can either run the `./setup.sh` script directly or follow the instructions below.
- **Amazon Kinesis Data Stream**
  Deployed through the stack `amazon-kinesis-stream-stack` and ready to receive events from the sample producer. The producer resources are deployed through the stack `sample-kinesis-stream-producer`. You can check the Lambda function's monitoring and logs to make sure it is being invoked regularly and sending events to the stream. For more information on how the producer works, check the documentation.
- **Amazon Kinesis Data Analytics for Apache Flink Application**
  Deployed through the stack `amazon-kinesis-analytics`. Although the application is created by the stack, it is not yet running. To run the application and kick off the pipeline, pick up the application name from the stack output `KdaApplicationName`, then follow the instructions under the Run the Application section or run the following command:

  ```
  $ aws kinesisanalyticsv2 start-application --application-name amazon-kinesis-analytics \
      --run-configuration '{ "ApplicationRestoreConfiguration": { "ApplicationRestoreType": "SKIP_RESTORE_FROM_SNAPSHOT" } }'
  ```
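  Equivalently, the application can be started from Python with boto3. This is a sketch assuming the application name shown above; take the real name from the `KdaApplicationName` output:

  ```python
  import boto3

  kda = boto3.client("kinesisanalyticsv2")

  # Start the Flink application without restoring from a snapshot,
  # mirroring the CLI command above.
  kda.start_application(
      ApplicationName="amazon-kinesis-analytics",  # value of the KdaApplicationName output
      RunConfiguration={
          "ApplicationRestoreConfiguration": {
              "ApplicationRestoreType": "SKIP_RESTORE_FROM_SNAPSHOT"
          }
      },
  )
  ```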
- **Amazon S3 bucket** to store the JAR package for the application, and the application role
  Deployed through the stack `flink-source-bucket`.
- **Amazon Timestream Database**
  Deployed through the stack `amazon-timestream`.
- **Grafana dashboard deployment**
  Deployed through the stack `grafana`. To reach the created Grafana instance, check the stack output `MyFargateServiceServiceURL...`.
  - Grafana is deployed using Amazon ECS on AWS Fargate for compute
  - Amazon Elastic File System (Amazon EFS) is used for storage
  - An AWS Secrets Manager secret stores the Grafana `admin` user password. Check the stack output `GrafanaAdminSecret` for the ID of the AWS Secrets Manager secret storing the Grafana admin user's password.
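Rather than copying the admin password from the console by hand, it can also be read from AWS Secrets Manager. A minimal boto3 sketch, where the secret ID placeholder stands for the value of the `GrafanaAdminSecret` stack output:

```python
import boto3

secrets = boto3.client("secretsmanager")

# Replace with the secret ID shown in the GrafanaAdminSecret stack output.
secret_id = "<stack grafana.GrafanaAdminSecret>"

admin_password = secrets.get_secret_value(SecretId=secret_id)["SecretString"]
print(admin_password)
```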
To help you get started with data visualization, we have created a sample dashboard in Grafana that visualizes the data sent to Timestream by the sample producer. If you invoked the `setup.sh` script, it has already performed these steps automatically.
You can also check the following video tutorial or complete guide for more information.
- **Install the Grafana datasource plugin**
  The Grafana Amazon Timestream datasource plugin is automatically installed through the infrastructure stack.
- **Create the Grafana datasource**
  To create a Timestream data source, go to Datasources, click Add Datasource, search for Timestream, and select the Timestream datasource. You can also use the programmatic means described here.

  To create the data source programmatically:

  - Create an API token to be used. Make sure to replace `<stack grafana.GrafanaAdminSecret.secret_value>` with the secret value, escaping any characters in the password such as `$` or `"` with a preceding `\`, and replace `<stack grafana.MyFargateServiceServiceURL...>` with the value from the stack output.

    ```
    $ grafana_token=$( \
        curl -X POST -u "admin:<stack grafana.GrafanaAdminSecret.secret_value>" \
          -H "Content-Type: application/json" \
          -d '{"name":"apikeycurl", "role": "Admin"}' \
          <stack grafana.MyFargateServiceServiceURL...>/api/auth/keys \
        | jq -r .key)
    ```

  - Create the Amazon Timestream datasource

    ```
    $ curl -X POST --insecure \
        -H "Content-Type: application/json" -H "Authorization: Bearer ${grafana_token}" \
        -d @./cdk/stacks/grafana/datasource.json \
        <stack grafana.MyFargateServiceServiceURL...>/api/datasources
    ```
- **Create the Grafana dashboard**
  Grafana dashboards can be exported and imported as `JSON`. You can find a sample dashboard JSON here. The defined dashboard provides sample variables and visualizations by querying the Timestream database directly (see the query sketch after this section). The dashboard assumes the Timestream datasource is your default datasource. To import the sample dashboard, you can follow the instructions here or use the programmatic means described here.

  To create the dashboard programmatically:

  ```
  $ curl -X POST --insecure \
      -H "Content-Type: application/json" -H "Authorization: Bearer ${grafana_token}" \
      -d @./cdk/stacks/grafana/dashboard.json \
      <stack grafana.MyFargateServiceServiceURL...>/api/dashboards/db
  ```
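Purely as an illustration of the kind of query a dashboard panel might issue against the data model described earlier, here is a sketch using the Timestream Query API via boto3; the database and table names and the aggregation are placeholders, not the sample dashboard's exact queries:

```python
import boto3

timestream_query = boto3.client("timestream-query")

# Average temperature per device over the last hour; adjust the database and
# table names to the ones created by the amazon-timestream stack.
query = """
    SELECT DeviceID, AVG(measure_value::double) AS avg_temperature
    FROM "sampleDB"."sampleTable"
    WHERE measure_name = 'temperature' AND time > ago(1h)
    GROUP BY DeviceID
"""

for row in timestream_query.query(QueryString=query)["Rows"]:
    print(row["Data"])
```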
To delete all created stack resources, you can use:

```
$ cdk destroy --all
```
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.