Data Lakehouse local stack with PySpark, Trino, and MinIO. Includes an example that processes Raygun error data and IP address occurrences.
Updated Jul 3, 2024 - Python
Apache Hive Standalone Metastore
Service for automatically managing and cleaning up unreferenced data
Collection of OKDP helm charts
Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.
A repository of custom user-defined functions (UDFs) for Apache Hive.
Sample code with integration between Data Catalog and Hive data source.
"hms-mirror" is a utility used to bridge the gap between two clusters and migrate Hive metadata.
Apache Hive Metastore in Standalone Mode With Docker
Foundation Workspace for Airflow, Spark, Hive, and Azure Data Lake Gen2 via Docker
A client for connecting to a Hive Metastore and running DDLs on it.
Kubernetes Hive MinIO connection example
Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.
Apache Hive Metastore as a Standalone server in Docker
A Python Client for Hive Metastore
A Docker Compose template that builds an interactive development environment for PySpark with Jupyter Lab, MinIO as object storage, Hive Metastore, Trino, and Kafka
Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, MinIO, Trino, and a Hive Metastore. Can be used for local testing.
A service which allows Hive Metastore Listeners to be deployed outside of the Hive Metastore Service
Big Data Pipeline | Querying Data from Hive Table Phase