Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Updated Jul 6, 2024 - Python
Apache Superset is a Data Visualization and Data Exploration Platform
Learn how to design, develop, deploy and iterate on production-grade ML applications.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Workflow Engine for Kubernetes
The Data Engineering Cookbook
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
Always know what to expect from your data.
An orchestration platform for the development, production, and observation of data assets.
Roadmap to becoming a data engineer in 2021
The Open Source Feature Store for Machine Learning
Fancy stream processing made operationally mundane
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Turns Data and AI algorithms into production-ready web applications in no time.
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Implementing best practices for PySpark ETL jobs and applications.
A collection of scientific methods, processes, algorithms, and systems to build stories & models.