Dockerised HDFS with YARN Resource Manager

The purpose of this project is to set up a minimal Hadoop (3.x.x) cluster for testing the submission of Spark jobs via YARN.

Architecture

The cluster consists of two containers, named master and worker.

Master has the following responsibilities:

HDFS: Name Node, Data Node
YARN: Resource Manager, Node Manager, Timeline History Server
Spark: History Server
Map Reduce: Map Reduce History Server

Worker has the following responsibilities:

HDFS: Data Node
YARN: Node Manager

In this architecture, when jobs are submitted in cluster mode, the driver can be located on either master or worker.

Smoke Test

Starting up the Hadoop cluster

Run docker-compose up -d to start up the cluster locally.

Relevant UIs are up

Assuming the docker-compose file is launched locally, the following URLs should be accessible:

Namenode UI at localhost:9870
Resource Manager UI at localhost:8088
Spark History Server at localhost:18080
YARN Timeline History Server at localhost:19888
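
The UI check above can be scripted. The following is a minimal sketch, assuming the UIs are served over plain http on the ports listed and that curl is installed; the function names ui_endpoints and check_uis are made up for illustration and are not part of this repository.

```shell
# One "name url" pair per line, taken from the list of UIs above.
# Assumption: the UIs are served over plain http.
ui_endpoints() {
  cat <<'EOF'
namenode http://localhost:9870
resource-manager http://localhost:8088
spark-history http://localhost:18080
yarn-timeline-history http://localhost:19888
EOF
}

# Probe each UI and report whether it answered with a successful HTTP status.
# Returns non-zero if at least one UI did not respond.
check_uis() {
  failed=0
  while read -r name url; do
    # -f: fail on HTTP errors, -s: silent, -L: follow redirects
    if curl -fsL --max-time 5 -o /dev/null "$url"; then
      echo "UP   $name $url"
    else
      echo "DOWN $name $url"
      failed=1
    fi
  done <<EOF
$(ui_endpoints)
EOF
  return "$failed"
}
```

Calling check_uis once docker-compose up -d has finished prints one UP or DOWN line per UI. The services may need some time after the containers start before they respond.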

Running `spark-submit` locally

Download and set up the corresponding version of Spark with Hadoop on your local machine.
Set the environment variable HADOOP_CONF_DIR to /path/to/local-hadoop-config, where local-hadoop-config is the directory at the root of this repository. This ensures that any hdfs or spark-submit command runs with the options found in the relevant .xml files.
Ensure the following entries are set in the hosts file:

127.0.0.1 host.docker.internal
127.0.0.1 master
127.0.0.1 worker

Run hdfs dfs -ls / to confirm that the Hadoop client is set up correctly.
Run the following command to confirm that the Spark client is set up correctly:

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 2g --executor-memory 2g --executor-cores 1 --conf "spark.eventLog.enabled=true" --conf "spark.eventLog.dir=hdfs:///spark-logs" ${SPARK_HOME}/examples/jars/spark-examples*.jar 10
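
If either command fails, the client prerequisites listed above are worth checking first. The following is a rough sketch of such a check, not something shipped with this repository: the function names are hypothetical, the default hosts file path /etc/hosts assumes Linux or macOS, and the three file names are the standard Hadoop client configuration files, which local-hadoop-config is assumed (not confirmed) to contain.

```shell
# Print each required hostname that has no 127.0.0.1 entry in the given hosts file.
# Assumption: the hosts file lives at /etc/hosts unless a path is passed in.
missing_hosts() {
  hosts_file="${1:-/etc/hosts}"
  for name in host.docker.internal master worker; do
    awk -v n="$name" '
      $1 == "127.0.0.1" {
        for (i = 2; i <= NF; i++) {
          if ($i ~ /^#/) break      # rest of the line is a comment
          if ($i == n) found = 1
        }
      }
      END { exit !found }
    ' "$hosts_file" || echo "$name"
  done
}

# Print each standard Hadoop client config file missing from a config directory.
# Assumption: the directory is expected to hold these three files.
missing_conf_files() {
  conf_dir="${1:-${HADOOP_CONF_DIR:-}}"
  if [ -z "$conf_dir" ] || [ ! -d "$conf_dir" ]; then
    echo "not a directory: ${conf_dir:-<unset>}" >&2
    return 2
  fi
  for f in core-site.xml hdfs-site.xml yarn-site.xml; do
    [ -f "$conf_dir/$f" ] || echo "$f"
  done
}

# Example usage:
#   missing_hosts /etc/hosts
#   missing_conf_files "$HADOOP_CONF_DIR"
```

Both functions print nothing when everything they look for is present, so empty output means the hosts entries and configuration directory look as described above.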
