# Create Cluster
Chef-bach can be used to create a Hadoop test cluster using virtual machines on a hypervisor host with enough resources. The resulting cluster is a four-node cluster: one node acts as the bootstrap node and hosts a Chef server, while the other three are Hadoop nodes, two masters and one worker. The steps below walk through creating the test cluster; they have been tested on hypervisor hosts running Mac OS and Ubuntu.
- Install `curl` on the hypervisor host
- Install `virtualbox` on the hypervisor host
- Install `vagrant` on the hypervisor host
- Delete the default DHCP server built into VirtualBox:
  - `vboxmanage list dhcpservers`
  - `vboxmanage remove ...`
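The network names that feed the removal step can be pulled out of the `list dhcpservers` output. The sketch below uses a canned sample of that output (since VirtualBox may not be installed where this runs); on a real host, replace the `printf` with `vboxmanage list dhcpservers` itself. In recent VirtualBox releases the removal subcommand is typically `vboxmanage dhcpserver remove --netname <name>`.

```shell
# Extract the NetworkName values from (sample) `vboxmanage list dhcpservers`
# output; each printed name identifies one DHCP server to remove.
printf 'NetworkName:    HostInterfaceNetworking-vboxnet0\nIP:             192.168.56.100\n' |
  awk '/^NetworkName:/ {print $2}'
# Each name would then be removed with:
#   vboxmanage dhcpserver remove --netname <name>
```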
- Run `sudo pkill -f VBox` on the hypervisor host
- Clone the chef-bach repository onto the hypervisor host: `git clone https://github.com/bloomberg/chef-bach.git`
- Rename the `chef-bach` directory to `chef-bcpc` on the hypervisor host
- `cd` to the `chef-bcpc` directory on the hypervisor host
- Run the automated installation script under the tests directory: `./tests/automated_install.sh`
- This downloads all the required software, creates the four-node cluster, and installs all the HDP Hadoop components. As you can imagine, this takes some time; depending on the size of the hypervisor host, network bandwidth, etc., it can take 2 to 3 hours to complete.
- Once `automated_install.sh` completes, log on to the bootstrap node with `vagrant ssh` (you need to be in the `chef-bcpc` directory on the hypervisor)
- Once logged on to the bootstrap node, `cd` to the `chef-bcpc` directory
- Then run the following set of commands twice, in sequence:
  - `./cluster-assign-roles.sh Test-Laptop hadoop bcpc-vm1`
  - `./cluster-assign-roles.sh Test-Laptop hadoop bcpc-vm2`
  - `./cluster-assign-roles.sh Test-Laptop hadoop bcpc-vm3`
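The "run the set twice" sequence above can be expressed as a loop. The `run` function below is a stub that only echoes each invocation, so the ordering can be shown (and checked) without a live bootstrap node; on the real node, replace it with `./cluster-assign-roles.sh`.

```shell
# Stub standing in for ./cluster-assign-roles.sh; echoes instead of executing.
run() { echo "./cluster-assign-roles.sh $*"; }

for pass in 1 2; do                          # the whole set is run twice
  for vm in bcpc-vm1 bcpc-vm2 bcpc-vm3; do   # in node order each pass
    run Test-Laptop hadoop "$vm"
  done
done
```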
- This completes the creation of the three Hadoop nodes:
  - `bcpc-vm1` (10.0.100.11) is a master node hosting the HDFS NameNode, HBase Master, and MySQL server
  - `bcpc-vm2` (10.0.100.12) is a master node hosting the YARN ResourceManager, Hive/HCatalog, and MySQL server
  - `bcpc-vm3` (10.0.100.13) is the worker node hosting the HDFS DataNode, HBase RegionServer, and YARN NodeManager
- System stats from the nodes and JMX stats from the various Hadoop components are available through Graphite at `https://10.0.100.5:8888`
- Monitoring of Hadoop components is done through Zabbix, accessible at `https://10.0.100.5:7777`
- Passwords for the various components, including the password to log in to the Hadoop nodes, can be retrieved by logging on to the bootstrap node and issuing the following commands from the `chef-bcpc` directory on the hypervisor:
  - `vagrant ssh`
  - `cd chef-bcpc`
  - `sudo knife data bag show configs Test-Laptop`

  This lists the user ID and password for every component. Note that `cobbler-root-password` is the password to log on to the three Hadoop nodes as the user "ubuntu", which is on the sudoers list.
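`knife` prints the data bag as `key: value` lines, so a single credential such as the cobbler root password can be picked out with `awk`. The sample input below is hypothetical; on the bootstrap node you would pipe `sudo knife data bag show configs Test-Laptop` into the same filter instead.

```shell
# Pick one credential out of (hypothetical) key/value output; real values differ.
printf 'cobbler-root-password: s3cret\nmysql-root-password: other\n' |
  awk -F': ' '$1 == "cobbler-root-password" {print $2}'
# prints: s3cret
```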
## Verifying the Hadoop test cluster
- Log on to `bcpc-vm3`. You can do `ssh ubuntu@10.0.100.13` from the hypervisor, or from the `chef-bcpc` directory on the bootstrap node issue `./nodessh.sh Test-Laptop 10.0.100.13 -`
- Switch to the `hdfs` user
- Run `hdfs dfs -copyFromLocal /etc/passwd /passwd`
- Run `hdfs dfs -cat /passwd`
- Run `hdfs dfs -rm /passwd`
- If all these succeed, the HDFS component is verified
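The HDFS checks above can be collected into one script. The `hdfs` function below is a stub that only echoes each command, so the sequence can be shown without a running cluster; on `bcpc-vm3`, delete the stub and run the script as the `hdfs` user.

```shell
# Stub standing in for the real hdfs client; echoes instead of executing.
hdfs() { echo "hdfs $*"; }

set -e                                         # stop on the first failure
hdfs dfs -copyFromLocal /etc/passwd /passwd    # write a file into HDFS
hdfs dfs -cat /passwd                          # read it back
hdfs dfs -rm /passwd                           # clean up
```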
- Run `hbase shell`
- Under the HBase shell, run `create 't1','cf1'`
- Run `list`, which should display the newly created table
- Run `put 't1','r1','cf1:c1','v1'`
- Run `scan 't1'`, which should display the row created in the previous step
- Run `disable 't1'`
- Run `drop 't1'`
- Run `list`; it should display an empty list
- Run `exit`
- If all these steps complete, the HBase component is verified, along with ZooKeeper
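The same HBase checks can be fed to `hbase shell` non-interactively through a heredoc (no `exit` needed; end-of-input ends the shell). The `hbase` function below is a stub that just prints the commands so the sequence can be shown without a cluster; remove it on `bcpc-vm3` so the real shell executes them.

```shell
# Stub standing in for the real hbase client; prints the heredoc instead of
# running it. Delete this line on a real node.
hbase() { cat; }

hbase shell <<'EOF'
create 't1','cf1'
list
put 't1','r1','cf1:c1','v1'
scan 't1'
disable 't1'
drop 't1'
list
EOF
```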
- As the `hdfs` user, run `hdfs dfs -chmod 777 /user`. We do this only because this is a test cluster; do not perform this step in a secured environment
- Create a new user on all three Hadoop nodes using the `adduser` command
- Log in to the `bcpc-vm2` (10.0.100.12) node and switch to the new user created in the previous step
- Run `hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.11.0-1.jar pi 1 100`
- If the previous step completes successfully, the YARN and MapReduce components are verified
- If you plan to use Hive, on the `bcpc-vm2` Hadoop node bring up the Hive shell by running `hive`
- Create a table: `create table t1 (id int)`
- Describe the newly created table: `describe t1`
- Drop the newly created table: `drop table t1`
- If these steps succeed, the Hive component is verified
If the test cluster is created on a hypervisor host located behind a firewall, appropriate proxy and DNS servers need to be set in the `automated_install.sh` script:

- `PROXY=proxy.example.com:80`
- `DNS_SERVERS='"8.8.8.8", "8.8.4.4"'`