A Series of Tools for Hadoop Metrics Analysis
Extract specified metrics fields from the Metrics log files on the namenode and datanodes, and convert them into a .csv table.
Automatically acquire the Metrics log files for each task.
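The extraction step could look roughly like the sketch below. It assumes each log line has the layout `<timestamp> <context>.<record>: key=value, key=value, ...`; that layout, the helper names `extract_fields` and `to_csv`, and the sample record and field names are assumptions for illustration, not the actual behavior of the script.

```python
import csv
import io


def extract_fields(lines, record, fields):
    """Yield one row of values per log line that belongs to `record`.

    Assumed line layout (not confirmed by this README):
        <timestamp> <context>.<record>: key=value, key=value, ...
    """
    for line in lines:
        head, sep, body = line.strip().partition(": ")
        if not sep:
            continue  # no "name: body" separator, so not a metrics line
        parts = head.split(" ", 1)
        if len(parts) != 2 or parts[1] != record:
            continue  # a different record, skip it
        timestamp = parts[0]
        pairs = dict(
            item.split("=", 1) for item in body.split(", ") if "=" in item
        )
        # Missing fields become empty cells instead of raising.
        yield [timestamp] + [pairs.get(name, "") for name in fields]


def to_csv(lines, record, fields):
    """Return the extracted rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp"] + list(fields))
    writer.writerows(extract_fields(lines, record, fields))
    return out.getvalue()
```

Passing an open log file as `lines` works as well, since the function only iterates over it.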
Collect Ceph's perf counter data in a CSV table, which can be logged with collectd, and extract specified fields for master and slaves respectively.
Compute the Pearson correlation coefficient R as well as the coefficients of the linear regression equation y = bx + a for two groups of values X and Y extracted from a specified CSV table.
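For reference, both quantities follow from three sums of deviations: R = Sxy / sqrt(Sxx * Syy), b = Sxy / Sxx and a = mean(Y) - b * mean(X). The sketch below is a plain-Python illustration of these textbook formulas; the function name `pearson_and_regression` is hypothetical and the script itself may compute them differently.

```python
import math


def pearson_and_regression(xs, ys):
    """Return (r, b, a) for the least-squares line y = b*x + a."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    r = sxy / math.sqrt(sxx * syy)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return r, b, a
```

For example, X = [1, 2, 3, 4] and Y = [3, 5, 7, 9] lie exactly on y = 2x + 1, so the function returns R = 1, b = 2 and a = 1 up to floating-point rounding.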
A glue script that executes analyze.py and correlation.py sequentially on the Metrics data provided.
Retrieve Metrics data for a given period of time from the HDFS cluster.
Prototype v1 - Deprecated because it fits the data poorly.
Prototype v2
Prototype v3
Generate LaTeX code for a line diagram consisting of three groups of data X, y1, y2. The generated code is based on the pgfplots package.
Generate LaTeX code for a scatter plot consisting of two groups of data x and y.
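A generator of this kind can be as small as the sketch below, which emits a pgfplots `axis` environment with an `only marks` plot. The function name `scatter_plot_latex` and its parameters are hypothetical; the actual script may produce different options or styling.

```python
def scatter_plot_latex(xs, ys, xlabel="x", ylabel="y"):
    """Return a pgfplots tikzpicture that draws the points (x, y) as marks."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    # pgfplots accepts inline data as a list of "(x, y)" pairs.
    coords = " ".join("({}, {})".format(x, y) for x, y in zip(xs, ys))
    return "\n".join([
        r"\begin{tikzpicture}",
        r"\begin{axis}[xlabel={%s}, ylabel={%s}]" % (xlabel, ylabel),
        r"\addplot[only marks] coordinates {%s};" % coords,
        r"\end{axis}",
        r"\end{tikzpicture}",
    ])
```

The returned text compiles only in a document that loads pgfplots with `\usepackage{pgfplots}` in its preamble.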
Retrieve Hadoop YARN tasks via its JMX API, presenting the task name, start time and end time of each task.
Functions for colored output on the UNIX terminal. Supports xterm-256color and xterm-(16)color.
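Colored output on these terminals comes down to wrapping text in ANSI SGR escape sequences. The sketch below shows the idea for foreground colors only; the function name `colorize` and its parameters are hypothetical and are not the interface of this module.

```python
def colorize(text, color, use_256=True):
    """Wrap `text` in ANSI escape codes that set a foreground color.

    With use_256=True, `color` is a palette index 0..255 (xterm-256color).
    Otherwise `color` is an index 0..15 for 16-color terminals.
    """
    if use_256:
        if not 0 <= color <= 255:
            raise ValueError("256-color index must be in 0..255")
        start = "\033[38;5;{}m".format(color)
    else:
        if not 0 <= color <= 15:
            raise ValueError("16-color index must be in 0..15")
        # 0..7 map to SGR 30..37, the bright colors 8..15 to SGR 90..97.
        code = 30 + color if color < 8 else 90 + (color - 8)
        start = "\033[{}m".format(code)
    # "\033[0m" resets all attributes so later output is unaffected.
    return start + text + "\033[0m"
```

A caller would typically check the `TERM` environment variable to decide which of the two modes to use.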
Import this to drop into an IPython debugging shell automatically when an error occurs.
CSV class library for higher-level operations.
Currently, however, this class can only parse well-formed CSV files of dimension X by Y, in which the first line is the header and all remaining lines contain only numbers.
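The restriction described above can be pictured with the following sketch, which accepts exactly that shape of file and rejects anything else. The function name `load_numeric_csv` and its return value are hypothetical; the class in this library may expose a different interface.

```python
import csv
import io


def load_numeric_csv(stream):
    """Read a header line plus numeric rows; return (header, columns).

    `columns` maps each header name to the list of floats in that column.
    """
    reader = csv.reader(stream)
    header = next(reader, None)
    if header is None:
        raise ValueError("empty CSV: a header line is required")
    columns = {name: [] for name in header}
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(
                "line %d has %d cells, expected %d"
                % (line_no, len(row), len(header))
            )
        for name, cell in zip(header, row):
            columns[name].append(float(cell))  # raises ValueError on non-numbers
    return header, columns


# Usage with an in-memory file; a real file opened with newline="" works the same.
header, columns = load_numeric_csv(io.StringIO("x,y\n1,2\n3,4.5\n"))
```

A short row, a blank line, or a non-numeric cell all raise `ValueError`, which mirrors the "without defects" requirement.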
Compare two floating-point numbers for equality.
A function that computes the coefficients of a linear regression.
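Floating-point values should be compared within a tolerance rather than with `==`. The sketch below uses the standard library's `math.isclose`; the function name `float_equal` and the default tolerances are assumptions, not the values used by this library.

```python
import math


def float_equal(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Return True if a and b are equal within the given tolerances.

    abs_tol matters near zero, where a purely relative tolerance fails.
    """
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
```

For instance, `0.1 + 0.2 == 0.3` is false in binary floating point, while `float_equal(0.1 + 0.2, 0.3)` is true.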
Returns
A data structure that evicts its earliest pushed element when a new element is pushed after its size has reached the set limit. Essentially it's a circular queue.
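The same behavior is available from `collections.deque` with a `maxlen`, as the sketch below shows. The class name `BoundedQueue` and its methods are hypothetical; the structure in this library may be implemented on a plain list with wrap-around indices instead.

```python
from collections import deque


class BoundedQueue:
    """Fixed-capacity queue that drops its oldest element when full."""

    def __init__(self, limit):
        if limit <= 0:
            raise ValueError("limit must be positive")
        self._items = deque(maxlen=limit)

    def push(self, item):
        """Append `item`; return the evicted element, or None if none was dropped."""
        evicted = None
        if len(self._items) == self._items.maxlen:
            evicted = self._items[0]  # deque will discard this one on append
        self._items.append(item)
        return evicted

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        # Iterates from the oldest to the newest element.
        return iter(self._items)
```

With a limit of 3, pushing 1, 2, 3 and then 4 evicts 1 and leaves 2, 3, 4 in the queue.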
List of hostnames or IPs of the slaves on which the datanodes and node managers run.
List of hostnames or IPs of the master machine on which the namenode and resource manager run.
Let's use Python 3.6.2 (or later). Earlier versions should go to the museum.
Let's throw away Python 2 anyway. It should be obsolete.