Cluster Monitoring

BCPC cluster monitoring uses the following components:

Graphite, to store and graph cluster statistics
Zabbix, to raise triggers on monitored events
Diamond, to collect server-level statistics and store them in Graphite
jmxtrans, to collect JMX statistics from the Hadoop Java processes

When a new node is added to the cluster, its server-level statistics can be collected by adding the bcpc::Diamond recipe to the node's run list.
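As a minimal sketch, assuming the node is represented as a plain Ruby Hash shaped like a Chef node JSON file, the change amounts to appending the recipe to the node's run list exactly once. The helper add_diamond_recipe and the role name example-role are hypothetical. On a live Chef server the same change is usually made with knife node run_list add <node> 'recipe[bcpc::Diamond]'.

```ruby
# Run list entry for the Diamond recipe named on this page.
DIAMOND_RECIPE = "recipe[bcpc::Diamond]".freeze

# Hypothetical helper: add the Diamond recipe to a node Hash's run list.
# Appends only if missing, so calling it repeatedly is harmless.
def add_diamond_recipe(node)
  run_list = node["run_list"] ||= []
  run_list << DIAMOND_RECIPE unless run_list.include?(DIAMOND_RECIPE)
  node
end

# Hypothetical node data.
node = { "name" => "hadoop-worker-1", "run_list" => ["role[example-role]"] }
add_diamond_recipe(node)
add_diamond_recipe(node)
p node["run_list"]
```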

The following steps enable collection of JMX data from a new Java process added to the BCPC Hadoop cluster.

Add the JMX port of the new Java process to the BCPC Hadoop cookbook default attribute file, using the following convention:

default["bcpc"]["hadoop"]["process"]["jmx"]["port"]

where "process" uniquely identifies the new process added to the cluster.
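For illustration, assuming a hypothetical process string myprocess and a hypothetical JMX port 10111, the attribute would be set as shown in the comment below, and the port can later be looked up from the process string. ATTRIBUTES stands in for Chef's merged node attributes, and jmx_port_for is a hypothetical helper, not part of the cookbook.

```ruby
# In the cookbook's attributes/default.rb this would read (hypothetical values):
#   default["bcpc"]["hadoop"]["myprocess"]["jmx"]["port"] = 10111
# Outside Chef, a frozen nested Hash stands in for the merged attributes.
ATTRIBUTES = {
  "bcpc" => {
    "hadoop" => {
      "myprocess" => { "jmx" => { "port" => 10111 } }
    }
  }
}.freeze

# Look up a process's JMX port by its unique process string.
# Returns nil when no port is defined for that process.
def jmx_port_for(attributes, process)
  attributes.dig("bcpc", "hadoop", process, "jmx", "port")
end

p jmx_port_for(ATTRIBUTES, "myprocess")
```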

Add a new query to the bcpc_jmxtrans cookbook default attribute file. The cookbook uses it to generate the JSON file that jmxtrans needs to read data from the Java process's JMX MBeans and send it to Graphite for storage.

[
  {
    'obj'          => "",
    'result_alias' => "",
    'attr'         => [ "attr1", ... ]
  },
  ...
]
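The field names appear to mirror jmxtrans's own query keys: obj is the JMX ObjectName of the MBean, result_alias corresponds to jmxtrans's resultAlias (the name used in the Graphite metric path), and attr lists the MBean attributes to read. The sketch below assumes that mapping and shows roughly what the generated jmxtrans JSON could look like for one server. The Graphite output writer section is omitted, the helper names are hypothetical, and the MBean, alias, host, and port values are only examples. jmxtrans typically flattens composite attributes into names such as NonHeapMemoryUsage_committed, which is consistent with the memory.NonHeapMemoryUsage_committed Graphite query used later on this page.

```ruby
require "json"

# Hypothetical query entry in the bcpc_jmxtrans attribute format.
query = {
  "obj"          => "java.lang:type=Memory",
  "result_alias" => "memory",
  "attr"         => ["HeapMemoryUsage", "NonHeapMemoryUsage"]
}

# Map the cookbook's key names onto jmxtrans's query keys (assumed mapping).
def to_jmxtrans_query(q)
  { "obj" => q["obj"], "resultAlias" => q["result_alias"], "attr" => q["attr"] }
end

# Wrap the queries in a jmxtrans "servers" entry for one JMX endpoint.
# outputWriters (the Graphite writer) is deliberately left out.
def to_jmxtrans_config(host, port, queries)
  {
    "servers" => [
      {
        "host"    => host,
        "port"    => port.to_s,
        "queries" => queries.map { |q| to_jmxtrans_query(q) }
      }
    ]
  }
end

puts JSON.pretty_generate(to_jmxtrans_config("localhost", 10111, [query]))
```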

Add the process's service details to the Chef role that the process belongs to.

"jmxtrans": {
  "servers": [
    {
      "type": "process",
      "service": "process-service-name",
      "service_cmd": "string to identify service PID"
    }
  ]
}

If the role already lists other processes (servers), add the new Java process details to the existing servers array. The keys are used as follows:

type stores the process string that uniquely identifies the new Java process. It is used to look up the JMX port of the process.

service stores the name used to start and stop the new Java process with the service command.

service_cmd stores a string that identifies the process PID while the process is running on the cluster node. It is used to find the PID and start time of the process.
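The sketch below shows, under stated assumptions, how the three keys might be combined at runtime: type selects the JMX port, and service_cmd is matched against `ps -eo pid,args` style output to find the PID. The role data, port table, service name, main class, and the helpers find_pid and resolve_servers are all hypothetical; this page does not describe how the cookbook actually performs the PID lookup.

```ruby
# Hypothetical role attributes, shaped like the "jmxtrans" JSON above.
ROLE_JMXTRANS = {
  "servers" => [
    { "type" => "myprocess", "service" => "myprocess-service",
      "service_cmd" => "com.example.MyProcessMain" }
  ]
}.freeze

# Hypothetical JMX ports keyed by process string.
JMX_PORTS = { "myprocess" => 10111 }.freeze

# Return the PID from the first `ps -eo pid,args` line whose command
# line contains service_cmd, or nil if no running process matches.
def find_pid(ps_output, service_cmd)
  ps_output.each_line do |line|
    pid, args = line.strip.split(/\s+/, 2)
    next if args.nil? || pid !~ /\A\d+\z/ # skip header and blank lines
    return pid.to_i if args.include?(service_cmd)
  end
  nil
end

# For each server: port via type, PID via service_cmd; service is kept
# so a caller could issue service start/stop commands.
def resolve_servers(role, ports, ps_output)
  role["servers"].map do |s|
    { "service" => s["service"],
      "port"    => ports[s["type"]],
      "pid"     => find_pid(ps_output, s["service_cmd"]) }
  end
end

# Hypothetical ps output.
sample_ps = <<~PS
  PID COMMAND
  4242 /usr/bin/java -Dfoo=bar com.example.MyProcessMain start
  4300 /usr/sbin/sshd -D
PS
p resolve_servers(ROLE_JMXTRANS, JMX_PORTS, sample_ps)
```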

Once these changes are in place and the new Java process is installed on the cluster, jmxtrans starts collecting JMX statistics and sending them to the Graphite database installed on the cluster.
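jmxtrans ships the data itself, but as a rough illustration of what arrives at Graphite: Graphite's Carbon daemon accepts a plaintext protocol of one "<metric path> <value> <timestamp>" line per data point, by default on TCP port 2003. The metric path below is hypothetical; the real prefix depends on how the cookbook configures jmxtrans's Graphite output writer.

```ruby
require "socket"

# Format one data point in Carbon's plaintext protocol:
# "<metric path> <value> <unix timestamp>\n"
def graphite_line(path, value, timestamp)
  "#{path} #{value} #{timestamp.to_i}\n"
end

# Send pre-formatted lines to Carbon's plaintext listener.
# Not called below; it needs a reachable Graphite host.
def send_to_graphite(host, lines, port: 2003)
  TCPSocket.open(host, port) { |sock| sock.write(lines) }
end

# Hypothetical metric path and value.
print graphite_line("jmx.myhost.memory.NonHeapMemoryUsage_committed", 52_428_800, Time.now)
```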

To take action based on JMX statistics, the statistics are fed into Zabbix so that triggers can fire on predefined conditions. Since the data is already stored in Graphite, Zabbix agents are not used to collect it; instead, the data of interest is sent from Graphite to Zabbix. To move data from Graphite to Zabbix and create the required trigger conditions, add the following to the bcpc-hadoop cookbook default attribute file:
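One plausible way to move a value from Graphite to Zabbix, sketched under assumptions: fetch the series from Graphite's render API with format=json (for example /render?target=<query>&from=-2min&format=json), take the newest non-null datapoint, and push it to a Zabbix trapper item with zabbix_sender. The helpers latest_point and zabbix_sender_args, the Zabbix server name, and the sample data are hypothetical; this page does not say which mechanism the cookbook actually uses.

```ruby
require "json"

# Return the newest non-null [value, timestamp] pair from a Graphite
# render API response requested with format=json, which has the shape
# [{"target": "...", "datapoints": [[value, timestamp], ...]}, ...].
def latest_point(render_json)
  JSON.parse(render_json)
      .flat_map { |series| series["datapoints"] }
      .reject { |value, _ts| value.nil? }
      .max_by { |_value, ts| ts }
end

# Build a zabbix_sender command line that pushes one value to a
# Zabbix trapper item (-z server, -s host, -k item key, -o value).
def zabbix_sender_args(server, host, key, value)
  ["zabbix_sender", "-z", server, "-s", host, "-k", key, "-o", value.to_s]
end

# Hypothetical render output for the query used on this page.
sample = '[{"target":"memory.NonHeapMemoryUsage_committed",' \
         '"datapoints":[[10.0,100],[12.0,130],[null,160]]}]'
value, _ts = latest_point(sample)
p zabbix_sender_args("zabbix.example", "process", "hbasenonheapmem", value)
# system(*args) would run it on a host where zabbix_sender is installed.
```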

default["bcpc"]["hadoop"]["graphite"]["queries"] = {
  'process' => [
    {
      'type'           => "jmx",
      'query'          => "memory.NonHeapMemoryUsage_committed",
      'key'            => "hbasenonheapmem",
      'trigger_val'    => "max(61,0)",
      'trigger_cond'   => "=0",
      'trigger_name'   => "HBaseMasterAvailability",
      'trigger_enable' => 0,
      'trigger_dep'    => [],
      'history_days'   => 2,
      'trend_days'     => 30
    }
  ]
}

process is the string that uniquely identifies the Java process. It is also used to create the host in Zabbix.

type value must be set to jmx, since that is the only type currently supported.

query value is the Graphite query that is executed to retrieve the data from the Graphite database.

key value is the string used to create the Zabbix item.

trigger_val value identifies the data Zabbix evaluates when deciding whether to generate a trigger. In the example, Zabbix uses the maximum value from the past 61 seconds to check whether a trigger needs to be generated.

trigger_cond value is the condition that must be satisfied to generate a Zabbix trigger.

trigger_name value is used as the name of the trigger created in Zabbix.

trigger_dep value is an array of names of the triggers on which this trigger depends.

history_days value determines the number of days the data for this trigger item is stored in Zabbix before it is purged.

trend_days value determines the number of days the trend data is stored in Zabbix.
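To show how these fields fit together, the sketch below derives the Zabbix objects that one query entry describes. It assumes the trigger expression is assembled in the pre-5.4 Zabbix syntax {host:key.function(params)}operator constant, in which the second argument of max(61,0) is a time shift. The helper zabbix_objects is hypothetical, and the cookbook may build the expression differently; trigger_enable is carried in the data but not interpreted here.

```ruby
# Copy of the example queries attribute, as a plain Hash.
QUERIES = {
  "process" => [
    { "type" => "jmx", "query" => "memory.NonHeapMemoryUsage_committed",
      "key" => "hbasenonheapmem", "trigger_val" => "max(61,0)",
      "trigger_cond" => "=0", "trigger_name" => "HBaseMasterAvailability",
      "trigger_enable" => 0, "trigger_dep" => [],
      "history_days" => 2, "trend_days" => 30 }
  ]
}.freeze

# Derive host, item, and trigger details for every entry.
# Rejects any type other than jmx, the only supported type.
def zabbix_objects(queries)
  queries.flat_map do |host, entries|
    entries.map do |e|
      raise ArgumentError, "unsupported type #{e['type']}" unless e["type"] == "jmx"
      {
        host: host,
        item_key: e["key"],
        # Assumed pre-5.4 syntax: {host:key.function(params)}condition
        expression: "{#{host}:#{e['key']}.#{e['trigger_val']}}#{e['trigger_cond']}",
        trigger_name: e["trigger_name"],
        depends_on: e["trigger_dep"],
        history_days: e["history_days"],
        trend_days: e["trend_days"]
      }
    end
  end
end

zabbix_objects(QUERIES).each { |o| puts o[:expression] }
```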
