This repository has been archived by the owner on Oct 3, 2022. It is now read-only.

Try out Spark 1.5 using VMs provisioned by Vagrant and Ansible


theomega/spark_vagrant_ansible

spark_vagrant_ansible

Try out Spark 1.5 using VMs provisioned by Vagrant and Ansible

What is this

This repository provides a Vagrantfile which, in combination with an Ansible playbook, installs:

  • Ubuntu 14.04
  • Spark 1.5 (and its dependencies)

How to use

  1. Install the following dependencies on your machine:
     • Vagrant
     • Ansible
     • vagrant-ansible
  2. Clone the repository.
  3. Launch the machines:
     • vagrant up spark-master
     • vagrant up spark-slave1
     • vagrant up spark-slave2
  4. Check the Spark web interface at http://192.168.33.10:8080/
  5. To launch a Spark job, connect to the spark-master VM using vagrant ssh spark-master and run the /opt/spark/bin/spark-submit-local script there.
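
The launch and check steps can also be scripted. The following is a minimal sketch, not part of the repository: the file name launch_cluster.py, the helper names, and the retry settings are all hypothetical. It assumes Python 3 on the host and that it is run from the cloned repository directory. By default it only prints the commands; pass --run to actually execute them.

```python
import subprocess
import sys
import time
import urllib.error
import urllib.request

# Machine names and web interface address as listed in this README.
MACHINES = ["spark-master", "spark-slave1", "spark-slave2"]
MASTER_UI = "http://192.168.33.10:8080/"


def vagrant_up_commands(machines=MACHINES):
    """Return one 'vagrant up <name>' command per machine, in order."""
    return [["vagrant", "up", name] for name in machines]


def ui_is_up(url=MASTER_UI, timeout=5):
    """Return True if the Spark web interface answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def main(run=False, retries=12, delay=5):
    """Print the launch commands; with run=True also execute them and poll the UI."""
    for cmd in vagrant_up_commands():
        print(" ".join(cmd))
        if run:
            subprocess.check_call(cmd)
    if not run:
        return
    # retries and delay are arbitrary illustrative values.
    for _ in range(retries):
        if ui_is_up():
            print("Spark web interface is up at", MASTER_UI)
            return
        time.sleep(delay)
    sys.exit("Spark web interface did not respond at " + MASTER_UI)


if __name__ == "__main__":
    main(run="--run" in sys.argv[1:])
```

Starting the master first, as in the list above, means the slaves have a master to register with when they come up.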

Limitations:

  • Spark runs in standalone mode, which means there is no underlying Hadoop or HDFS. If you need to work on files, they must be available on all machines. The /data folder is shared between all machines and the host, so you can put files there and use them.
  • As all VMs run on your computer, memory is an issue. The Ansible playbook configures Spark with tight memory limits. You can change these limits by first giving the VMs more memory (in the Vagrantfile) and then changing the launchers in the rules/spark-master and rules/spark-slave folders.
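
As an illustration of working with files in the shared /data folder, here is a small PySpark word count sketch. It is not part of the repository: the script name wordcount.py and the input file are hypothetical, and it assumes that spark-submit-local accepts a Python script plus arguments in the same way as the regular spark-submit and sets the master URL itself. Because there is no HDFS, the input is addressed with a file:// URI, which only works if the path exists on every machine, as it does for /data.

```python
from __future__ import print_function

import sys
from operator import add


def tokenize(line):
    """Split a line into lower-case words on whitespace."""
    return line.lower().split()


def word_counts(sc, path):
    """Build an RDD of (word, count) pairs for the text file at path."""
    return (sc.textFile(path)
              .flatMap(tokenize)
              .map(lambda word: (word, 1))
              .reduceByKey(add))


def main(path):
    # Imported here because pyspark is normally only on the path
    # when the script is started through spark-submit.
    from pyspark import SparkContext

    sc = SparkContext(appName="DataFolderWordCount")
    try:
        # Print the ten most frequent words.
        top = word_counts(sc, path).takeOrdered(10, key=lambda kv: -kv[1])
        for word, count in top:
            print("%s\t%d" % (word, count))
    finally:
        sc.stop()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: wordcount.py file:///data/<input file>")
```

With both the script and an input file placed in /data on the host, a run on the spark-master VM might look like /opt/spark/bin/spark-submit-local /data/wordcount.py file:///data/input.txt, where the argument form is an assumption about the wrapper script.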
