aws-emr
Here are 129 public repositories matching this topic...
Bits of code I use during live demos
Updated Jan 23, 2024 - Jupyter Notebook
An AWS-based solution that uses AWS CloudWatch and a Python AWS Lambda function to automatically terminate AWS EMR clusters that have been idle for a specified period of time.
Updated Jun 5, 2024 - Python
This project offers a ready-to-use AWS EMR template built on Spot Fleet and On-Demand Instances, so you can focus on writing PySpark code.
Updated Jun 13, 2022 - Python
Terraform module to create AWS EMR resources 🇺🇦
Updated Oct 11, 2024 - HCL
A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from locally hosted Airflow containers. The end product is a Superset dashboard and a Postgres database, hosted on an EC2 instance at this address (powered down):
Updated May 14, 2022 - Python
Cloud-based AI / ML workflow and data application development framework
Updated Aug 20, 2024 - Python
A Grafana-based application to assist Big Data infrastructure optimization initiatives where Spark applications are a dominant cost driver
Updated Jun 12, 2024 - Python
Use AWS EMR and AWS Redshift to analyse the US adult census dataset
Updated Sep 11, 2020
A collection of sample Airflow workflows for data processing on AWS
Updated Dec 1, 2017 - Python
Run a Spark job within Amazon EMR
Updated Sep 12, 2020 - Java
Create a data lake on AWS S3 to store dimensional tables after processing data with Spark on an AWS EMR cluster
Updated Oct 10, 2019 - Python
A cookiecutter template for working with PySpark on AWS EMR
Updated Aug 30, 2020 - Python