A Giter8 template for generating a Spark job project seed in Scala.
This project is intended for people who already know how to use Apache Spark and want to get started right away.
You should only need to clone this project if you are modifying the giter8 template. For information on giter8 templates, please see http://www.foundweekends.org/giter8/
If you want to create a project:
g8 https://github.com/s3ni0r/spark-job-skeleton.g8
If you are testing this giter8 template locally, install g8 and then run it against the local template:
g8 file://spark-job-skeleton.g8/ --name=my-spark-job --force
This will create an example project called my-spark-job.
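The generated seed is an ordinary Scala Spark job. Its exact contents depend on the template parameters, but a minimal entry point might look like the sketch below (the object name, app name, and word-count logic are illustrative assumptions, not the template's actual output):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: the actual generated sources depend on the template parameters.
object MySparkJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("my-spark-job")
      .getOrCreate()

    import spark.implicits._

    // Trivial example computation: count words in a text file passed as the first argument.
    val counts = spark.read.textFile(args(0))
      .flatMap(_.split("\\s+"))
      .groupBy("value")
      .count()

    counts.show()
    spark.stop()
  }
}
```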
Create a local environment that mirrors production, so you can be as autonomous as possible while working on Spark projects.
This project contains all the configuration files needed to create:
- A Dockerized environment
- A local yet genuinely distributed environment, with:
  - 1 Namenode
  - 1 Datanode (add more as you wish)
  - A YARN resource manager
  - 3 YARN node managers
  - A YARN history server
  - A Spark history server
  - A Spark shell
- Alignment with the exact Hadoop component versions used in production
- Deployment to the Dockerized cluster via the sbt command line (see the sketch after this list)
- Mounting data into HDFS via Docker volumes from within the project folder
- Access to the Spark history web UI for inspection :)
- Access to YARN logs for debugging :)
- Access to the Spark shell for fiddling :)
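Deployment from sbt boils down to building an assembly jar and submitting it to the Dockerized YARN cluster. The sketch below is only an illustration of that idea; the task name, container name, paths, and main class are assumptions, not the template's actual configuration, which is defined in the generated project:

```scala
// build.sbt (sketch) -- assumes the sbt-assembly plugin is enabled in project/plugins.sbt
import scala.sys.process._

lazy val deployToCluster =
  taskKey[Unit]("Build the assembly jar and spark-submit it to the Dockerized YARN cluster")

deployToCluster := {
  // Fat jar produced by sbt-assembly
  val jar = assembly.value
  // Container name and in-container path are illustrative; adapt them to the docker-compose setup
  val container = "spark-client"
  val jarInContainer = s"/opt/jobs/${jar.getName}"

  // Copy the jar into the container, then submit it to YARN in cluster mode
  Seq("docker", "cp", jar.getAbsolutePath, s"$container:$jarInContainer").!
  Seq("docker", "exec", container,
      "spark-submit", "--master", "yarn", "--deploy-mode", "cluster",
      "--class", "MySparkJob", jarInContainer).!
}
```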