These instructions were tested on a Linux system. They should also work on other systems. If you are having problems, please let us know by creating an issue.
- docker (version 18.0 or higher)
- docker-compose (version 1.20 or higher)
To initialize the Swarm, run:
foo@bar:~$ docker swarm init
Swarm initialized: current node (hzywvwt5zygzctmrv4k1hrjfp) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-3hlnrriihgjm4ajgmk8drpe5my7kzprtjmgh2qrh8akw64jy98-6vrrb008zo22sk76lr4c7q2qb 10.14.0.164:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
Your system will be the Swarm manager node. Take note of the token generated by this command; you will need it if you want to add other nodes to the Swarm cluster, as shown below.
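Here is a minimal sketch of adding another machine as a worker, assuming you replace the placeholders with the token and manager address printed by 'docker swarm init':

# run this on the machine that should join as a worker;
# <token> and <manager-ip> are placeholders for the values shown above
foo@bar:~$ docker swarm join --token <token> <manager-ip>:2377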
foo@bar:~$ docker network create --ingress --driver overlay ingress
foo@bar:~$ docker network create -d overlay --attachable spark-net
If you get a warning about an existing ingress network, you can ignore it.
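If you want to verify that both overlay networks were created, you can list them (the names below assume the commands above):

# should show both 'ingress' and 'spark-net'
foo@bar:~$ docker network ls --filter driver=overlay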
Sometimes the Docker images are not pulled automatically, so it is better to download them before deploying the services.
foo@bar:~$ docker-compose pull
You can also pull the images manually:
foo@bar:~$ docker pull fdiblen/spark-master-dirac && docker pull fdiblen/spark-worker-dirac && docker pull fdiblen/hadoop
To deploy Spark and Hadoop services run:
foo@bar:~$ docker stack deploy --resolve-image always -c docker-compose.yml spark
Depending on your system or Docker version, you may get an error. In that case, try the following command:
foo@bar:~$ docker stack deploy -c docker-compose.yml spark
You can now access the web interfaces of the deployed services:
- Spark interface: http://0.0.0.0:8080/
- Hadoop interface: http://0.0.0.0:8088
- Hadoop datanodes: http://0.0.0.0:50070
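As a quick sanity check, assuming curl is available on the manager node, you can verify that the interfaces respond:

foo@bar:~$ curl -I http://0.0.0.0:8080/   # Spark interface
foo@bar:~$ curl -I http://0.0.0.0:8088    # Hadoop interface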
The number of available Spark workers can easily be scaled up or down. To run 4 Spark worker nodes:
foo@bar:~$ docker service scale spark_worker=4
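To confirm the new replica count (the service name assumes the stack was deployed as 'spark'):

# the REPLICAS column should show 4/4 once all workers are up
foo@bar:~$ docker service ls --filter name=spark_worker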
List the services:
foo@bar:~$ docker stack services spark
List the tasks:
foo@bar:~$ docker stack ps spark
To find the IP addresses of the containers on the Docker gateway bridge network:
foo@bar:~$ docker network inspect docker_gwbridge | egrep 'Name|IPv4'
You can inspect the services in more detail with:
foo@bar:~$ docker stack services spark
foo@bar:~$ docker service inspect --pretty spark_master
foo@bar:~$ docker service inspect --pretty spark_worker
foo@bar:~$ docker service ps spark_master
foo@bar:~$ docker service ps spark_worker
foo@bar:~$ docker service ps spark_hadoop
To remove the deployed stack:
foo@bar:~$ docker stack rm spark
If some containers are still running afterwards, you can stop them manually:
foo@bar:~$ docker stop <container_name>
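Optionally, you can also remove the overlay network created earlier and leave the Swarm; this assumes the names used above and is only needed for a full cleanup:

foo@bar:~$ docker network rm spark-net
foo@bar:~$ docker swarm leave --force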
You can check the logs with:
foo@bar:~$ docker logs -f container_name
The container name can be found with:
foo@bar:~$ docker stack ps spark
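Alternatively, on reasonably recent Docker versions you can follow the logs of a whole service without looking up individual container names, for example:

foo@bar:~$ docker service logs -f spark_master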
To open a shell inside the Hadoop container:
foo@bar:~$ docker exec -ti $(docker ps | grep hadoop | awk '{print $1}') /bin/bash
Inside the container, set up the Hadoop environment:
export HADOOP_CLASSPATH=$(/opt/soft/hadoop/bin/hadoop classpath)
export PATH=$PATH:/opt/soft/hadoop/bin
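As a quick check that the environment is set up correctly inside the container, you can print the Hadoop version and classpath:

hadoop version
echo $HADOOP_CLASSPATH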
You can then check the status of the HDFS cluster:
hdfs dfsadmin -report
hdfs dfsadmin -printTopology
hdfs getconf -confKey fs.defaultFS
hdfs getconf -namenodes
hdfs dfsadmin -getDatanodeInfo hdfs://localhost:50020/
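As a simple smoke test of HDFS, you can create a directory, upload a file, and read it back; the paths below are just examples:

# run inside the Hadoop container after setting up the environment as above
echo "hello hdfs" > /tmp/hello.txt
hdfs dfs -mkdir -p /tmp/smoke-test
hdfs dfs -put /tmp/hello.txt /tmp/smoke-test/
hdfs dfs -ls /tmp/smoke-test
hdfs dfs -cat /tmp/smoke-test/hello.txt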