Cluster Monitoring
BCPC cluster monitoring uses the following components:
- Graphite to store and graph cluster statistics
- Zabbix to trigger on monitored events
- Diamond to collect server-level statistics and store them in Graphite
- jmxtrans to collect JMX statistics from the Hadoop Java processes
When a new node is added to the cluster, server-level statistics can be collected by adding the bcpc::Diamond recipe to the run list of the node.
The following steps enable the collection of JMX data from a new Java process added to the BCPC Hadoop cluster.
- Add the JMX port of the new Java process to the BCPC Hadoop cookbook default attribute file using the following convention.
default["bcpc"]["hadoop"]["process"]["jmx"]["port"]
where "process" uniquely identifies the new process added to the cluster.
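As an illustration of the convention above, the sketch below sets the port for a hypothetical process called "my_process" on port 10999; both values are assumptions, not names from the cookbook. Inside a real attribute file Chef provides `default`, so the first line is only a stand-in that lets the snippet run on its own.

```ruby
# Stand-in for Chef's `default` attribute object so this sketch runs outside Chef.
# In a real attribute file Chef supplies `default` and this line is not needed.
default = Hash.new { |hash, key| hash[key] = Hash.new(&hash.default_proc) }

# "my_process" and 10999 are hypothetical; use the identifier and JMX port
# of the process you are adding.
default["bcpc"]["hadoop"]["my_process"]["jmx"]["port"] = 10999

puts default["bcpc"]["hadoop"]["my_process"]["jmx"]["port"]
```

The same "my_process" string would then be reused as the "type" value in the role and as the key in the Graphite queries described later.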
- Add a new query to the bcpc_jmxtrans cookbook default attribute file. The query is used to generate the JSON file that jmxtrans needs to retrieve data from the JMX MBeans of the Java process and send it to Graphite for storage.
{
  'obj' => "",
  'result_alias' => "",
  'attr' => [ "attr1", ...
  ]
},...]
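To make the query entry more concrete, here is a minimal sketch of one filled-in entry and of how such an entry could be rendered into a jmxtrans server definition. It assumes the usual jmxtrans JSON layout ("queries", "resultAlias", "outputWriters" with a GraphiteWriter); the helper `jmxtrans_server`, the host names, and the ports are hypothetical, and the cookbook's own template may differ. The MBean `java.lang:type=Memory` and its attributes are standard JVM ones.

```ruby
require "json"

# One query entry in the shape shown above.
query = {
  'obj' => "java.lang:type=Memory",
  'result_alias' => "memory",
  'attr' => ["HeapMemoryUsage", "NonHeapMemoryUsage"]
}

# Hypothetical helper: build a jmxtrans server definition for one JVM.
# All arguments are illustrative; the real cookbook may generate this differently.
def jmxtrans_server(host, jmx_port, queries, graphite_host, graphite_port)
  rendered = queries.map do |q|
    {
      "obj" => q['obj'],
      "resultAlias" => q['result_alias'],
      "attr" => q['attr'],
      "outputWriters" => [
        {
          "@class" => "com.googlecode.jmxtrans.model.output.GraphiteWriter",
          "settings" => { "host" => graphite_host, "port" => graphite_port }
        }
      ]
    }
  end
  { "host" => host, "port" => jmx_port.to_s, "queries" => rendered }
end

config = {
  "servers" => [
    jmxtrans_server("localhost", 10999, [query], "graphite.example.com", 2003)
  ]
}
puts JSON.pretty_generate(config)
```

With the alias "memory", the composite attribute NonHeapMemoryUsage would be expected to show up in Graphite under names such as memory.NonHeapMemoryUsage_committed, which matches the query used in the Zabbix step.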
- Add the service details of the process to the Chef role that the process belongs to.
"jmxtrans": {
  "servers": [
    {
      "type": "process",
      "service": "process-service-name",
      "service_cmd": "string to identify service PID"
    }
  ]
}
If the role already includes details for other processes (servers), add the new Java process to the servers array. The key-value pairs are as follows. The "type" key stores the process string that uniquely identifies the new Java process and is used to look up its JMX port. The "service" key stores the name used to start and stop the new Java process with the service command. The "service_cmd" key stores a string that identifies the running process on the cluster node and is used to find its PID and start time.
Once these changes are in place and the new Java process is installed on the cluster, jmxtrans will start collecting JMX statistics and sending them to the Graphite database installed on the cluster.
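The role change in this step can be sketched in Ruby as plain data manipulation. The helper `add_jmxtrans_server` is hypothetical and only illustrates the rule described above: every entry carries the three keys, and a new process is appended to an existing servers array rather than replacing it. All process and service names are made up.

```ruby
# Keys every servers entry is expected to carry, per the description above.
REQUIRED_KEYS = %w[type service service_cmd].freeze

# Hypothetical helper: append an entry to the role's jmxtrans "servers" array
# unless an entry with the same "type" is already present.
def add_jmxtrans_server(role_attrs, entry)
  missing = REQUIRED_KEYS - entry.keys
  raise ArgumentError, "missing keys: #{missing.join(', ')}" unless missing.empty?

  servers = ((role_attrs["jmxtrans"] ||= {})["servers"] ||= [])
  servers << entry unless servers.any? { |s| s["type"] == entry["type"] }
  role_attrs
end

# A role that already monitors one process (illustrative values).
role_attrs = {
  "jmxtrans" => {
    "servers" => [
      {
        "type" => "existing_process",
        "service" => "existing-process-service",
        "service_cmd" => "com.example.ExistingProcess"
      }
    ]
  }
}

add_jmxtrans_server(role_attrs, {
  "type" => "my_process",
  "service" => "my-process-service",
  "service_cmd" => "com.example.MyProcess"
})

puts role_attrs["jmxtrans"]["servers"].map { |s| s["type"] }.inspect
```

Keying on "type" keeps the array free of duplicates, which matters because that value is also what selects the JMX port attribute for the process.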
- If actions need to be taken based on the JMX statistics, populate the statistic into Zabbix so that triggers can be generated from predefined conditions. Because the data is stored in Graphite, Zabbix agents are not used to collect it. Instead, the data of interest is sent from Graphite to Zabbix. To move data from Graphite to Zabbix and create the required trigger conditions, add the following to the bcpc-hadoop cookbook default attribute file:
default["bcpc"]["hadoop"]["graphite"]["queries"] = {
  'process' => [
    {
      'type' => "jmx",
      'query' => "memory.NonHeapMemoryUsage_committed",
      'key' => "hbasenonheapmem",
      'trigger_val' => "max(61,0)",
      'trigger_cond' => "=0",
      'trigger_name' => "HBaseMasterAvailability",
      'trigger_enable' => 0,
      'trigger_dep' => [],
      'history_days' => 2,
      'trend_days' => 30
    },
  ],
}
"process" is the string that uniquely identifies the Java process. It is used to create the host in Zabbix.
"type" should be set to jmx, since that is the only type currently supported.
"query" is the Graphite query to execute to retrieve the data from the Graphite database.
"key" is the string used to create the Zabbix item.
"trigger_val" identifies the data used to evaluate a trigger in Zabbix. In the example, Zabbix uses the maximum value over the past 61 seconds to decide whether a trigger needs to be generated.
"trigger_cond" is the condition that must be satisfied to generate a Zabbix trigger.
"trigger_name" is the name used to create the trigger in Zabbix.
"trigger_dep" is an array of names of the triggers on which this trigger depends.
"history_days" determines the number of days the data for this item is stored in Zabbix before it is purged.
"trend_days" determines the number of days the trend data is stored in Zabbix.
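To show how the trigger fields relate to each other, the sketch below combines "key", "trigger_val", and "trigger_cond" into a classic Zabbix trigger expression of the form {host:key.function(params)}condition. This assumes the older (pre-5.4) Zabbix expression syntax and that the cookbook joins the fields in this way; the helper `trigger_expression` is hypothetical.

```ruby
# Hypothetical helper: build a classic Zabbix trigger expression from one
# query entry. `host` is the "process" string used as the Zabbix host name.
def trigger_expression(host, entry)
  "{#{host}:#{entry['key']}.#{entry['trigger_val']}}#{entry['trigger_cond']}"
end

entry = {
  'key' => "hbasenonheapmem",
  'trigger_val' => "max(61,0)",
  'trigger_cond' => "=0"
}

puts trigger_expression("process", entry)
# prints {process:hbasenonheapmem.max(61,0)}=0
```

Read this way, the example trigger fires when the maximum committed non-heap memory reported over the last 61 seconds is 0, which presumably is why it is named as an availability check.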