Airflow & Zeppelin: Better Together
This repository shows how to integrate Zeppelin with Airflow 2.
The philosophy behind the integration is to make the transition from the development stage to the production stage as smooth as possible.
Zeppelin is good at data pipeline development (Spark, Flink, Hive, Python, Shell, etc.), while Airflow is the de facto standard for job orchestration.
Run the following commands to initialize the environment.
- Download Spark, which is used by Zeppelin
git clone https://github.com/zjffdu/zeppelin_airflow.git
cd zeppelin_airflow
./init.sh
docker-compose up -d
Once the containers are up, open http://localhost:8085 for Zeppelin and http://localhost:8080 for Airflow.
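If you prefer to check from a script that both services have come up, here is a minimal sketch using only the Python standard library. The host ports come from the line above; the paths `/api/version` (Zeppelin REST API) and `/health` (Airflow 2 webserver) are assumptions about what this setup exposes, and the helper names `is_up` and `check_all` are made up for this example.

```python
import urllib.error
import urllib.request

# Host ports are the ones mentioned in this README.
# The paths are assumptions: Zeppelin's version endpoint and Airflow 2's health endpoint.
ENDPOINTS = {
    "zeppelin": "http://localhost:8085/api/version",
    "airflow": "http://localhost:8080/health",
}


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def check_all(endpoints=ENDPOINTS, probe=is_up):
    """Probe every endpoint and return a {name: bool} mapping."""
    return {name: probe(url) for name, url in endpoints.items()}


if __name__ == "__main__":
    for name, ok in check_all().items():
        print(f"{name}: {'up' if ok else 'not reachable yet'}")
```

The containers may need some time after `docker-compose up -d` before they respond, so a "not reachable yet" result right after startup is not necessarily an error.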
First, add a Zeppelin connection in Airflow so that ZeppelinOperator
can call the Zeppelin REST API to run notebooks.
Here's a screenshot. Host is the Zeppelin server host name (here, the name of the Zeppelin Docker container), and Port is the Zeppelin server REST API port.
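As an alternative to the UI, Airflow can also read connections from environment variables named `AIRFLOW_CONN_<CONN_ID>` whose value is a connection URI. The sketch below only builds that name and value. The connection id `zeppelin_default`, the host name `zeppelin`, and the port `8080` are placeholders for illustration, not values confirmed by this repository; use whatever connection id the ZeppelinOperator in this repository expects, together with your actual container name and port.

```python
from urllib.parse import quote


def airflow_conn_env(conn_id, host, port, scheme="http", login=None, password=None):
    """Return (environment variable name, connection URI) for an Airflow connection."""
    auth = ""
    if login is not None:
        # Percent-encode credentials so special characters survive in the URI.
        auth = quote(login, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"AIRFLOW_CONN_{conn_id.upper()}", f"{scheme}://{auth}{host}:{port}"


if __name__ == "__main__":
    # Placeholder values: replace with your own connection id, host, and port.
    name, uri = airflow_conn_env("zeppelin_default", "zeppelin", 8080)
    print(f"{name}={uri}")  # prints AIRFLOW_CONN_ZEPPELIN_DEFAULT=http://zeppelin:8080
```

Connections defined this way are not listed in the Airflow UI, so the UI approach described above is easier to inspect while experimenting.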
There is one DAG in Airflow, zeppelin_example_dag.
This DAG simply runs three Zeppelin notes:
- Python Tutorial/01. IPython Basics
- Spark Tutorial/02. Spark Basics Features
- Spark Tutorial/03. Spark SQL (PySpark)
Once you enable it, Airflow runs these Zeppelin notes.
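Conceptually, running a note from Airflow comes down to a call to the Zeppelin REST API. The following is a rough sketch of such a call, not the code of the ZeppelinOperator in this repository. It assumes Zeppelin's "run all paragraphs" endpoint `POST /api/notebook/job/<noteId>`; the note id `2ABCDEFGH` and the function name `build_run_note_request` are placeholders, since Zeppelin addresses notes by id rather than by the paths listed above.

```python
import urllib.request


def build_run_note_request(base_url: str, note_id: str) -> urllib.request.Request:
    """Build (but do not send) a request that asks Zeppelin to run every paragraph of a note."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/notebook/job/{note_id}",
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder note id: look up the real one in the Zeppelin UI or via the REST API.
    req = build_run_note_request("http://localhost:8085", "2ABCDEFGH")
    print(req.get_method(), req.full_url)
    # To actually trigger the run, pass the request to urllib.request.urlopen(req).
```

Building the request separately from sending it makes it easy to inspect the URL before anything runs on the Zeppelin server.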