These instructions were tested on a Linux system. They should also work on other systems. If you are having problems, please let us know by creating an issue.
- docker (version 18.0 or higher)
- docker-compose (version 1.20 or higher)
To initialize the Swarm, run:
foo@bar:~$ docker swarm init
Swarm initialized: current node (hzywvwt5zygzctmrv4k1hrjfp) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-3hlnrriihgjm4ajgmk8drpe5my7kzprtjmgh2qrh8akw64jy98-6vrrb008zo22sk76lr4c7q2qb 10.14.0.164:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
Your system will be the Swarm manager node. Take note of the token generated by this command; you will need it if you want to add other nodes to the Swarm cluster, as shown below.
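Here is a minimal sketch of adding another machine as a worker, assuming you replace the placeholders with the token and manager address printed by 'docker swarm init':

# run this on the machine that should join as a worker;
# <token> and <manager-ip> are placeholders for the values shown above
foo@bar:~$ docker swarm join --token <token> <manager-ip>:2377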
foo@bar:~$ docker network create --ingress --driver overlay ingress
foo@bar:~$ docker network create -d overlay --attachable spark-net
If you get a warning about an existing ingress network, you can ignore it.
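If you want to verify that both overlay networks were created, you can list them (the names below assume the commands above):

# should show both 'ingress' and 'spark-net'
foo@bar:~$ docker network ls --filter driver=overlay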
Sometimes the Docker images are not pulled automatically, so it is better to download them before deploying the services.
foo@bar:~$ docker-compose pull
You can also pull the images manually:
foo@bar:~$ docker pull fdiblen/spark-master-dirac && docker pull fdiblen/spark-worker-dirac && docker pull fdiblen/hadoop
To deploy Spark and Hadoop services run:
foo@bar:~$ docker stack deploy --resolve-image always -c docker-compose.yml spark
Depending on your system or Docker version, you may get an error. In that case, try the following command:
foo@bar:~$ docker stack deploy -c docker-compose.yml spark
You can now access the web interfaces of the deployed services:
- Spark interface: http://0.0.0.0:8080/
- Hadoop interface: http://0.0.0.0:8088
- Hadoop datanodes: http://0.0.0.0:50070
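As a quick sanity check, assuming curl is available on the manager node, you can verify that the interfaces respond:

foo@bar:~$ curl -I http://0.0.0.0:8080/   # Spark interface
foo@bar:~$ curl -I http://0.0.0.0:8088    # Hadoop interface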
The number of available Spark workers can easily be scaled up or down. To run 4 Spark worker nodes:
foo@bar:~$ docker service scale spark_worker=4
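To confirm the new replica count (the service name assumes the stack was deployed as 'spark'):

# the REPLICAS column should show 4/4 once all workers are up
foo@bar:~$ docker service ls --filter name=spark_worker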
List the services:
foo@bar:~$ docker stack services spark
List the tasks:
foo@bar:~$ docker stack ps spark
To find the IP addresses of the containers on the Docker gateway bridge network:
foo@bar:~$ docker network inspect docker_gwbridge | egrep 'Name|IPv4'
You can inspect the services in more detail with:
foo@bar:~$ docker stack services spark
foo@bar:~$ docker service inspect --pretty spark_master
foo@bar:~$ docker service inspect --pretty spark_worker
foo@bar:~$ docker service ps spark_master
foo@bar:~$ docker service ps spark_worker
foo@bar:~$ docker service ps spark_hadoop
To remove the deployed stack:
foo@bar:~$ docker stack rm spark
If some containers are still running afterwards, you can stop them manually:
foo@bar:~$ docker stop <container_name>
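Optionally, you can also remove the overlay network created earlier and leave the Swarm; this assumes the names used above and is only needed for a full cleanup:

foo@bar:~$ docker network rm spark-net
foo@bar:~$ docker swarm leave --force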
You can check the logs with:
foo@bar:~$ docker logs -f container_name
The container name can be found with:
foo@bar:~$ docker stack ps spark
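Alternatively, on reasonably recent Docker versions you can follow the logs of a whole service without looking up individual container names, for example:

foo@bar:~$ docker service logs -f spark_master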
To open a shell inside the Hadoop container:
foo@bar:~$ docker exec -ti $(docker ps | grep hadoop | awk '{print $1}') /bin/bash
Inside the container, set up the Hadoop environment:
export HADOOP_CLASSPATH=$(/opt/soft/hadoop/bin/hadoop classpath)
export PATH=$PATH:/opt/soft/hadoop/bin
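As a quick check that the environment is set up correctly inside the container, you can print the Hadoop version and classpath:

hadoop version
echo $HADOOP_CLASSPATH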
You can then check the status of the HDFS cluster:
hdfs dfsadmin -report
hdfs dfsadmin -printTopology
hdfs getconf -confKey fs.defaultFS
hdfs getconf -namenodes
hdfs dfsadmin -getDatanodeInfo hdfs://localhost:50020/
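As a simple smoke test of HDFS, you can create a directory, upload a file, and read it back; the paths below are just examples:

# run inside the Hadoop container after setting up the environment as above
echo "hello hdfs" > /tmp/hello.txt
hdfs dfs -mkdir -p /tmp/smoke-test
hdfs dfs -put /tmp/hello.txt /tmp/smoke-test/
hdfs dfs -ls /tmp/smoke-test
hdfs dfs -cat /tmp/smoke-test/hello.txt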