This repository has been archived by the owner on May 13, 2022. It is now read-only.

Lambda function that captures metadata across the pipeline and ingests it into Elasticsearch. This repository also includes the Lambda code for generating the CDO dashboards in the Elasticsearch Service.


USDOT-SDC-Archive/sdc-dot-metadata-ingest



sdc-dot-metadata-ingest

The Secure Data Commons (SDC) is a cloud-based analytics platform that gives traffic engineers, researchers, and data scientists access to various transportation-related datasets. The SDC platform is a prototype created as part of a U.S. Department of Transportation (USDOT) research project. The objective of this prototype is to provide a secure platform that will enable USDOT and the broader transportation sector to share and collaborate on research, tools, algorithms, analysis, and more around sensitive datasets, using modern, commercially available tools without the need to install tools or software locally. SDC enables collaborative but controlled integration and analysis of research data at the moderate sensitivity level (PII & CBI).

Table of Contents

I. Release Notes

II. Usage Example

III. Configuration

IV. Installation

V. Design and Architecture

VI. Unit Tests

VII. File Manifest

VIII. Development Setup

IX. Release History

X. Contact Information

XI. Contributing

XII. Known Bugs

XIII. Credits and Acknowledgment

XIV. CODE.GOV Registration Info

The following instructions describe the procedure to build and deploy the lambda.

Build and Deploy the Lambda

Environment Variables

The following environment variables are required:

CURATED_BUCKET_NAME - {name_of_the_curated_bucket}

CV_SUBMISSIONS_COUNTS_METRIC - {metrics_name_of_cv_submission}

ELASTICSEARCH_ENDPOINT - {url_of_elasticsearch}

ENVIRONMENT_NAME - {dev/preprod/prod}

PUBLISHED_BUCKET_NAME - {name_of_the_published_bucket}

SUBMISSIONS_BUCKET_NAME - {name_of_the_raw_submission_bucket}

WAZE_CURATED_COUNTS_METRIC - {metrics_name_of_curated}

WAZE_SUBMISSIONS_COUNT_METRIC - {metrics_name_of_raw_submission}

WAZE_ZERO_BYTE_SUBMISSIONS_COUNT_METRIC - {metrics_name_of_zero_byte}
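As an illustration of how these settings might be consumed, the following minimal Python sketch reads the variables listed above and fails fast on a cold start if any are missing. The `Config` type, the `load_config` function, and the check that `ENVIRONMENT_NAME` is one of dev/preprod/prod are assumptions for illustration, not the repository's actual code. It uses `typing.NamedTuple` rather than dataclasses so it stays compatible with Python 3.6.

```python
import os
from typing import Mapping, NamedTuple, Optional

# Variable names come from the list above; everything else here is hypothetical.
REQUIRED_VARS = (
    "CURATED_BUCKET_NAME",
    "CV_SUBMISSIONS_COUNTS_METRIC",
    "ELASTICSEARCH_ENDPOINT",
    "ENVIRONMENT_NAME",
    "PUBLISHED_BUCKET_NAME",
    "SUBMISSIONS_BUCKET_NAME",
    "WAZE_CURATED_COUNTS_METRIC",
    "WAZE_SUBMISSIONS_COUNT_METRIC",
    "WAZE_ZERO_BYTE_SUBMISSIONS_COUNT_METRIC",
)
ALLOWED_ENVIRONMENTS = {"dev", "preprod", "prod"}


class Config(NamedTuple):
    # Field names are the lower-cased variable names, in the same order.
    curated_bucket_name: str
    cv_submissions_counts_metric: str
    elasticsearch_endpoint: str
    environment_name: str
    published_bucket_name: str
    submissions_bucket_name: str
    waze_curated_counts_metric: str
    waze_submissions_count_metric: str
    waze_zero_byte_submissions_count_metric: str


def load_config(environ: Optional[Mapping[str, str]] = None) -> Config:
    """Read and validate the required variables (defaults to os.environ)."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    if env["ENVIRONMENT_NAME"] not in ALLOWED_ENVIRONMENTS:
        raise RuntimeError("ENVIRONMENT_NAME must be one of dev, preprod, prod")
    return Config(**{name.lower(): env[name] for name in REQUIRED_VARS})
```

Calling `load_config()` once at module import time surfaces a misconfigured function immediately, rather than on the first S3 event that happens to need the missing value.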

This Lambda function is triggered by an S3 event notification whenever an object is put into the raw submissions bucket or the curated bucket. Its primary functions are:

1. It creates metadata for the new object and pushes the metadata to Elasticsearch.

2. It pushes custom metrics for the raw submission count, zero-byte submission count, and curated count to CloudWatch metrics.

3. It creates visualization metrics in Kibana.

4. In case of any failure or error, it pushes a message to a dead-letter queue (DLQ) so that the event can be processed later.
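The flow described above could look roughly like the sketch below, which assumes the standard S3 event notification format (`Records[].s3.bucket.name`, `Records[].s3.object.key`, `Records[].s3.object.size`). The functions `build_metadata`, `metrics_for`, and `handle_event` are hypothetical, and the Elasticsearch, CloudWatch, and DLQ calls are injected as plain callables so the logic can be exercised without AWS. Kibana visualizations (item 3) and the distinction between Waze and CV metrics are not modeled.

```python
import json
from typing import Any, Callable, Dict, List
from urllib.parse import unquote_plus

Record = Dict[str, Any]


def build_metadata(record: Record) -> Record:
    """Turn one S3 event notification record into a metadata document."""
    s3 = record["s3"]
    return {
        "bucket": s3["bucket"]["name"],
        # S3 URL-encodes object keys in notifications (e.g. spaces become '+').
        "key": unquote_plus(s3["object"]["key"]),
        "size": s3["object"].get("size", 0),
        "event_time": record.get("eventTime"),
    }


def metrics_for(doc: Record, submissions_bucket: str, curated_bucket: str,
                names: Dict[str, str]) -> List[str]:
    """Pick the count metrics (names taken from the env vars) an object increments."""
    if doc["bucket"] == submissions_bucket:
        picked = [names["submissions"]]
        if doc["size"] == 0:
            picked.append(names["zero_byte"])
        return picked
    if doc["bucket"] == curated_bucket:
        return [names["curated"]]
    return []


def handle_event(event: Record, submissions_bucket: str, curated_bucket: str,
                 names: Dict[str, str],
                 index_document: Callable[[Record], None],
                 put_metric: Callable[[str, int], None],
                 send_to_dlq: Callable[[str], None]) -> int:
    """Process every record; failed records go to the DLQ instead of being lost.

    Returns the number of records processed successfully.
    """
    ok = 0
    for record in event.get("Records", []):
        try:
            doc = build_metadata(record)
            index_document(doc)  # 1. metadata -> Elasticsearch
            for metric in metrics_for(doc, submissions_bucket, curated_bucket, names):
                put_metric(metric, 1)  # 2. custom CloudWatch count
            ok += 1
        except Exception as exc:  # 4. park the failed record for later reprocessing
            send_to_dlq(json.dumps({"record": record, "error": str(exc)}))
    return ok
```

In a deployment, `put_metric` could wrap the boto3 CloudWatch client's `put_metric_data(Namespace=..., MetricData=[{"MetricName": name, "Value": value, "Unit": "Count"}])`, and `send_to_dlq` could wrap the SQS client's `send_message(QueueUrl=..., MessageBody=body)`. The namespace and queue URL are not specified in this README.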


Prerequisites

* Python 3.6

Build Process

Step 1: Set up a virtual environment on your system by following the instructions at the link below:

https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example-deployment-pkg.html#with-s3-example-deployment-pkg-python

Step 2: Create a script file with the following contents, e.g. sdc-dot-metadata-ingest.sh:

#!/bin/bash
cd {path_to_your_repository}/sdc-dot-metadata-ingest
zipFileName="{path_to_your_repository}/sdc-dot-metadata-ingest.zip"

echo "Zip file name is = ${zipFileName}"

zip -r9 "$zipFileName" lambdas/*
zip -r9 "$zipFileName" common/*
zip -9 "$zipFileName" dashboard_registry_handler_main.py
zip -9 "$zipFileName" bucket_event_handler_main.py

cd {path_to_your_virtual_env}/lib/python3.6/site-packages/
zip -r9 "$zipFileName" chardet certifi idna

Step 3: Make the script file executable:

chmod u+x sdc-dot-metadata-ingest.sh

Step 4: Run the script file:

./sdc-dot-metadata-ingest.sh
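If a shell with `zip` is not available, roughly the same package could be built with Python's standard `zipfile` module. This is a sketch assuming the repository and site-packages layouts match the script in Step 2; `build_package` and `_add` are hypothetical helpers, and the archive uses the default deflate level rather than `zip -9`.

```python
import zipfile
from pathlib import Path
from typing import List

# Entries mirrored from the shell script: handler code from the repository,
# plus the dependency packages copied from the virtualenv's site-packages.
REPO_ENTRIES = ("lambdas", "common", "dashboard_registry_handler_main.py",
                "bucket_event_handler_main.py")
DEPENDENCIES = ("chardet", "certifi", "idna")


def _add(zf: zipfile.ZipFile, root: Path, entry: str) -> None:
    """Add a file or a whole directory tree, storing paths relative to root."""
    path = root / entry
    if path.is_dir():
        for item in sorted(path.rglob("*")):
            if item.is_file():
                zf.write(str(item), item.relative_to(root).as_posix())
    else:
        # Raises FileNotFoundError if the entry is missing (zip only warns).
        zf.write(str(path), entry)


def build_package(repo_dir: str, site_packages: str, zip_path: str) -> List[str]:
    """Create the deployment zip and return the archive member names."""
    repo, site = Path(repo_dir), Path(site_packages)
    with zipfile.ZipFile(str(zip_path), "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for entry in REPO_ENTRIES:
            _add(zf, repo, entry)
        for dep in DEPENDENCIES:
            _add(zf, site, dep)  # stored at the archive root, like `cd site-packages`
        return zf.namelist()
```

Opening the archive in `"w"` mode starts a fresh package on every run, whereas `zip` updates an existing archive in place and can therefore keep stale files from earlier builds.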

Step 5: Upload the sdc-dot-metadata-ingest.zip generated in Step 4 to the Lambda function via the AWS console.
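As an alternative to the console upload, the package can be pushed with the AWS SDK for Python (boto3), whose Lambda client provides `update_function_code`. The sketch below takes the client as a parameter so it can be tested without AWS credentials; `upload_package` and the function name in the example are hypothetical.

```python
from pathlib import Path
from typing import Any, Dict


def upload_package(lambda_client: Any, function_name: str, zip_path: str) -> Dict[str, Any]:
    """Replace a Lambda function's code with a local .zip deployment package."""
    package = Path(zip_path).read_bytes()
    # Direct uploads are limited to 50 MB zipped; larger packages must be
    # staged in S3 and passed via S3Bucket/S3Key instead of ZipFile.
    return lambda_client.update_function_code(FunctionName=function_name, ZipFile=package)
```

For example: `upload_package(boto3.client("lambda"), "sdc-dot-metadata-ingest", "sdc-dot-metadata-ingest.zip")`, where the function name is a placeholder for your deployed Lambda.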

For any queries, you can reach out to [email protected]

Thank you to the Department of Transportation for funding to develop this project.

Agency: DOT

Short Description: This is a Lambda function developed by the SDC Team for generating metadata from an S3 key and indexing it into Elasticsearch Service.

Status: Beta

Tags: transportation, connected vehicles, intelligent transportation systems

Labor Hours:

Contact Name: [email protected]
