Data science environment managed by Docker and Docker Compose. This platform can be used for testing and exploration by any data science team. It is easy to deploy on a Linux server.
$ git clone <repo_url>
$ cd plateforme-ds
$ make
Then, if you want to start a Spark cluster:
$ docker-compose -f spark-cluster.yml up -d
Or Spark running locally inside the Jupyter container:
$ docker-compose -f spark-local.yml up -d
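You can check that the containers are up before going further (the exact names listed depend on which compose files you started, but you should see containers such as namenode and jupyter, which are used in the commands below):
$ docker ps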
You can access the namenode container by running the following command:
$ docker exec -it namenode bash
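Once inside the namenode container, you can stage data in HDFS with the standard hdfs dfs commands before reading it from Spark. The directory and file names below are only examples, and the local file is assumed to have already been copied into the container (for instance with docker cp from the host):
$ hdfs dfs -mkdir -p /data
$ hdfs dfs -put /tmp/data.csv /data/
$ hdfs dfs -ls /data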
You can access the jupyter container and obtain the token key by running the following commands:
$ docker exec -it jupyter bash
$ jupyter notebook list
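If you prefer not to enter the container, the token is also printed in the Jupyter startup logs, which you can read from the host (the container name jupyter is the same one used above):
$ docker logs jupyter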
If you launch the Spark cluster, you can connect to it from a Jupyter notebook by running the following code:
from pyspark import SparkConf, SparkContext

# Point the application at the standalone master started by spark-cluster.yml
conf = SparkConf().setAppName('test').setMaster('spark://spark-master:7077')
sc = SparkContext(conf=conf)
And read files from HDFS as follows:
lines = sc.textFile("hdfs://namenode:9000/<your_path_to_the_file>")
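You can then run a simple action on the RDD to check that both the connection to the cluster and the HDFS path work (replace the placeholder path above with a file you actually uploaded, e.g. via hdfs dfs -put):

print(lines.count())  # number of lines in the file
print(lines.take(5))  # first five lines as plain strings

With the local stack started from spark-local.yml, you would use setMaster('local[*]') instead of the cluster URL.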
- Go to the URL http://<ip_or_hostname_server>:10000 to open a JupyterLab session
- Hadoop namenode: http://<ip_or_hostname_server>:9870
- Hadoop datanode: http://<ip_or_hostname_server>:9864
- Resource Manager: http://<ip_or_hostname_server>:8088
Spark cluster
- Spark master: http://<ip_or_hostname_server>:8585 (webui) or http://<ip_or_hostname_server>:7077 (jobs)
- Spark worker-[x]: http://<ip_or_hostname_server>:808[x]
Spark local
- Spark webui: http://<ip_or_hostname_server>:4040
- Add a folder shared between the jupyter container and the host machine (handle permission issues)