These instructions were tested on a Linux system. They should also work on other systems. If you are having problems, please let us know by creating an issue.

Prerequisites

  • docker (version 18.0 or higher)
  • docker-compose (1.20 or higher)

Usage

1-Swarm manager

To initialize the Swarm setup, run:

foo@bar:~$ docker swarm init

Swarm initialized: current node (hzywvwt5zygzctmrv4k1hrjfp) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-3hlnrriihgjm4ajgmk8drpe5my7kzprtjmgh2qrh8akw64jy98-6vrrb008zo22sk76lr4c7q2qb 10.14.0.164:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

Your system will be the Swarm manager node. Take note of the token generated by this command. You will need this token if you want to add other nodes to the Swarm cluster.
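
If you lose this token, the worker join command can be printed again at any time on the manager node:

foo@bar:~$ docker swarm join-token worker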

2-Create the overlay and cluster networks

foo@bar:~$ docker network create --ingress --driver overlay ingress
foo@bar:~$ docker network create -d overlay --attachable spark-net

If you get a warning about an existing ingress network, just ignore it.
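
You can verify that both networks were created by listing the overlay networks:

foo@bar:~$ docker network ls --filter driver=overlay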

3-Pull the Docker images

Sometimes the Docker images are not pulled automatically. It is better to download them before deploying the services.

foo@bar:~$ docker-compose pull 

You can also pull the images manually:

foo@bar:~$ docker pull fdiblen/spark-master-dirac && docker pull fdiblen/spark-worker-dirac && docker pull fdiblen/hadoop  
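
To confirm that the images are now available locally, you can list them (the grep pattern below simply matches the image names used above):

foo@bar:~$ docker images | grep -E 'spark|hadoop'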

4-Create services

To deploy the Spark and Hadoop services, run:

foo@bar:~$ docker stack deploy --resolve-image always -c docker-compose.yml spark

Depending on your system or Docker version, you may get an error. In that case, try the following command:

foo@bar:~$ docker stack deploy -c docker-compose.yml spark
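
Once the stack is deployed, you can check that all services were created and their replicas are running:

foo@bar:~$ docker service ls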

5-Web interfaces

You can now access the web interfaces of the created services:

  • Spark interface: http://0.0.0.0:8080/
  • Hadoop interface: http://0.0.0.0:8088
  • Hadoop datanodes: http://0.0.0.0:50070
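
As a quick check from the manager node itself, you can probe the ports with curl (assuming curl is installed; adjust the port if you changed it in docker-compose.yml):

foo@bar:~$ curl -I http://0.0.0.0:8080/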

Scaling the services

The number of available Spark workers can easily be scaled up or down. To run 4 Spark worker nodes:

foo@bar:~$ docker service scale spark_worker=4
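
You can confirm the new replica count afterwards:

foo@bar:~$ docker service ls --filter name=spark_worker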

Info about the services

List the services:

foo@bar:~$ docker stack services spark

List the tasks:

foo@bar:~$ docker stack ps spark

Inspect the gateway bridge network to see container names and IP addresses:

foo@bar:~$ docker network inspect docker_gwbridge | egrep 'Name|IPv4'

Other useful commands for inspecting the services:

foo@bar:~$ docker stack services spark
foo@bar:~$ docker service inspect --pretty spark_master
foo@bar:~$ docker service inspect --pretty spark_worker
foo@bar:~$ docker service ps spark_master
foo@bar:~$ docker service ps spark_worker
foo@bar:~$ docker service ps spark_hadoop

Stopping the services

foo@bar:~$ docker stack rm spark
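
To confirm that the stack and its services are gone, list the remaining stacks:

foo@bar:~$ docker stack ls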

Clean up containers

Note that this stops and removes all containers on the host, not only the ones created by the stack:

foo@bar:~$ docker stop $(docker ps -a -q) && docker rm $(docker ps -a -q)
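
If you also want to reclaim space used by stopped containers, unused networks, dangling images, and the build cache, Docker's built-in prune command can help (it asks for confirmation before deleting anything):

foo@bar:~$ docker system prune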

Debugging

You can check the logs:

foo@bar:~$ docker logs -f container_name

The container name can be found with:

foo@bar:~$ docker stack ps spark
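
Alternatively, you can follow the logs of an entire service instead of a single container, using the service names shown above:

foo@bar:~$ docker service logs -f spark_master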

HDFS

Connect

docker exec -ti $(docker ps -a | grep hadoop | awk '{print $1}') /bin/bash

Check configuration

export HADOOP_CLASSPATH=$(/opt/soft/hadoop/bin/hadoop classpath)
export PATH=$PATH:/opt/soft/hadoop/bin

hdfs dfsadmin -report
hdfs dfsadmin -printTopology

hdfs getconf -confKey fs.defaultFS
hdfs getconf -namenodes
hdfs dfsadmin -getDatanodeInfo hdfs://localhost:50020/
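
As a simple sanity check that HDFS is writable, you can create a test directory and list the file system root (the path below is an arbitrary example):

hdfs dfs -mkdir -p /tmp/hdfs-test
hdfs dfs -ls /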