GitHub - temirovazat/kafka-to-clickhouse: ♻ Data Synchronization from Kafka to Clickhouse 📊

Kafka to Clickhouse

Description

This project implements an ETL system that stores movie-view data for analysts. Because the service must absorb a constant stream of events from every user, it uses the Kafka event streaming platform. A FastAPI layer accepts events and forwards them to Kafka without transforming them. The ETL process that loads data into the analytical store is implemented with PySpark, a library for batch and stream data processing. The store must hold very large volumes of data and answer queries quickly enough for analysts to do their research. The project therefore included research into storage options, and the analytical OLAP system ClickHouse was chosen as the best fit.
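To make the "no transformations" idea concrete, here is a minimal sketch of how a view event could be turned into a Kafka key/value pair. It is not taken from the repository: the `ViewEvent` class, the `to_kafka_record` helper, the `user_id` field, and the `user:film` key layout are all assumptions made for illustration, and no Kafka client is involved.

```python
import json
from dataclasses import dataclass
from typing import Tuple
from uuid import UUID


@dataclass(frozen=True)
class ViewEvent:
    """One movie-view progress event (field names are illustrative)."""

    user_id: UUID  # hypothetical: the README does not say how users are identified
    film_id: UUID  # the film UUID taken from the request path
    frame: int     # the current frame taken from the request body


def to_kafka_record(event: ViewEvent) -> Tuple[bytes, bytes]:
    """Serialize an event into (key, value) bytes without changing its content."""
    # Assumed key layout: keying by user and film keeps one viewer's
    # progress for one film in a single partition, in order.
    key = f"{event.user_id}:{event.film_id}".encode("utf-8")
    value = json.dumps(
        {
            "user_id": str(event.user_id),
            "film_id": str(event.film_id),
            "frame": event.frame,
        }
    ).encode("utf-8")
    return key, value
```

A producer would then send `key` and `value` as they are, leaving any reshaping to the ETL step downstream.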

Technologies

Python, Kafka, FastAPI, PySpark, ClickHouse, Vertica, Jupyter Notebook, Docker

How to Run the Project:

Clone the repository and navigate to the infra directory:

git clone https://github.com/temirovazat/kafka-to-clickhouse.git

cd kafka-to-clickhouse/infra/

Create a .env file and add project settings:

nano .env

# Kafka
KAFKA_HOST=kafka
KAFKA_PORT=9092

# Clickhouse
CLICKHOUSE_HOST=clickhouse-node1
CLICKHOUSE_PORT=9000
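As a rough illustration of how these four settings might be consumed, the sketch below reads them from the process environment using only the standard library. It assumes the variables from `.env` reach the container environment; the `Settings` class and `load_settings` function are hypothetical names, not code from this repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Connection settings named after the variables in .env."""

    kafka_host: str
    kafka_port: int
    clickhouse_host: str
    clickhouse_port: int

    @property
    def kafka_bootstrap_server(self) -> str:
        # Kafka clients usually take the broker address as "host:port".
        return f"{self.kafka_host}:{self.kafka_port}"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping; raises KeyError if a variable is missing."""
    return Settings(
        kafka_host=env["KAFKA_HOST"],
        kafka_port=int(env["KAFKA_PORT"]),
        clickhouse_host=env["CLICKHOUSE_HOST"],
        clickhouse_port=int(env["CLICKHOUSE_PORT"]),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to check without touching the real environment.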

Deploy and run the project in containers:

docker-compose up

Send a POST request with the current movie view frame:

http://127.0.0.1/films/<UUID>/video_progress

{
    "frame": <INTEGER>
}
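The request above can be sent from Python with the standard library alone. This is a minimal sketch, assuming the service listens on `http://127.0.0.1` as shown and needs no authentication headers; `build_progress_request` and `send_progress` are hypothetical helper names.

```python
import json
import urllib.request
from uuid import UUID

BASE_URL = "http://127.0.0.1"  # address from the README; adjust if the port differs


def build_progress_request(
    film_id: UUID, frame: int, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST request carrying the current frame for a film."""
    url = f"{base_url}/films/{film_id}/video_progress"
    body = json.dumps({"frame": frame}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_progress(film_id: UUID, frame: int) -> int:
    """Send the request and return the HTTP status code (needs the service running)."""
    request = build_progress_request(film_id, frame)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Once the containers are up, call `send_progress` with a film UUID and a frame number. Building the request in a separate function lets the URL and body be inspected without a running service.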

