Tooling to deploy an Apache Spark performance dashboard. Run this as a standalone Docker container or install the helm chart on Kubernetes.

xonai-computing/spark-dashboard

 
 


Overview

For documentation, see the parent project: https://github.com/cerndb/spark-dashboard

Install

git clone git@github.com:xonai-computing/spark-dashboard.git

cd spark-dashboard/dockerfiles

docker build -t spark-dashboard:v01 .

docker run -p 3000:3000 -p 2003:2003 -p 8086:8086 -d spark-dashboard:v01

You should now have:

  • Grafana on http://localhost:3000
    • Username: admin
    • Password: admin
    • Skip the password-change prompt
  • InfluxDB Graphite endpoint on port 2003
  • InfluxDB HTTP API on port 8086
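
As a quick sanity check after starting the container, the two HTTP services can be probed with curl. This is a minimal sketch, not part of the project: `probe` is a hypothetical helper, `/api/health` is Grafana's health endpoint, and `/ping` assumes the image ships InfluxDB 1.x. The Graphite port (2003) speaks plain TCP rather than HTTP, so it is not probed here.

```shell
# Hypothetical helper: report whether an HTTP endpoint answers within 2 seconds.
probe() {
  name="$1"
  url="$2"
  if curl --silent --fail --max-time 2 --output /dev/null "$url"; then
    echo "$name: up"
  else
    echo "$name: not reachable"
  fi
}

probe "Grafana"  "http://localhost:3000/api/health"   # Grafana health endpoint
probe "InfluxDB" "http://localhost:8086/ping"         # InfluxDB 1.x ping endpoint (assumed version)
```

If either line reports "not reachable", `docker ps` and `docker logs <container>` are the usual next places to look.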

Dashboards

Dashboards are available at http://localhost:3000/dashboards

Spark_Perf_Dashboard_v03

Displays basic metrics. To populate it, start spark-shell with the Graphite sink pointed at the container:

spark-shell \
  --master "local[1]" \
  --conf "spark.metrics.conf.*.sink.graphite.class"="org.apache.spark.metrics.sink.GraphiteSink" \
  --conf "spark.metrics.conf.*.sink.graphite.host"="localhost" \
  --conf "spark.metrics.conf.*.sink.graphite.port"=2003 \
  --conf "spark.metrics.conf.*.sink.graphite.period"=1 \
  --conf "spark.metrics.conf.*.sink.graphite.unit"=seconds \
  --conf "spark.metrics.conf.*.sink.graphite.prefix"="xonai" \
  --conf "spark.metrics.conf.*.source.jvm.class"="org.apache.spark.metrics.source.JvmSource" \
  --conf spark.metrics.appStatusSource.enabled=true
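
The `--conf` flags above use the `spark.metrics.conf.*` prefix, which Spark treats as equivalent to entries in a metrics properties file. As a sketch of that alternative, assuming you prefer a file over repeated flags, the same sink settings can be written once and referenced through `spark.metrics.conf`. The file name `metrics.properties` and its location in the current directory are arbitrary choices made for this example.

```shell
# Write the same sink settings to a properties file (file name is arbitrary).
cat > metrics.properties <<'EOF'
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=localhost
*.sink.graphite.port=2003
*.sink.graphite.period=1
*.sink.graphite.unit=seconds
*.sink.graphite.prefix=xonai
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource
EOF

# Then point Spark at the file instead of repeating the flags:
#   spark-shell --master "local[1]" \
#     --conf spark.metrics.conf=metrics.properties \
#     --conf spark.metrics.appStatusSource.enabled=true
```

`spark.metrics.appStatusSource.enabled` is an ordinary Spark setting rather than a metrics-system entry, so it stays on the command line.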

After running this query, more metrics should appear:

sql("select count(*) from range(1000) cross join range(1000) cross join range(100)").count

Spark_Perf_Dashboard_v03_with_annotations

Includes annotations of:

  • Jobs
  • Queries
  • Stages
  • Tasks

These metrics are collected using the sparkMeasure library; see the sparkMeasure documentation for more details. Add the following to the previous command:

spark-shell \
  ...
  --packages ch.cern.sparkmeasure:spark-measure_2.12:0.17 \
  --conf spark.sparkmeasure.influxdbURL="http://localhost:8086" \
  --conf spark.extraListeners=ch.cern.sparkmeasure.InfluxDBSinkExtended

If spark-shell cannot resolve a dependency, fetch it into the local Maven repository with:

mvn org.apache.maven.plugins:maven-dependency-plugin:3.2.0:get -Dartifact=<ORG>:<ARTIFACT>:<VERSION>
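
For illustration, the placeholders can be filled with the sparkMeasure coordinates from the `--packages` flag above. The wrapper below is a hypothetical convenience, not part of this repository: `fetch_artifact` and the `DRY_RUN` switch are names invented for this sketch, and it only checks that the argument looks like `group:artifact:version` before invoking the same Maven goal.

```shell
# Hypothetical helper: fetch one Maven artifact into the local repository.
# Set DRY_RUN=1 to print the command instead of running it.
fetch_artifact() {
  coords="$1"   # expected form: <ORG>:<ARTIFACT>:<VERSION>
  case "$coords" in
    *:*:*) ;;
    *) echo "expected <ORG>:<ARTIFACT>:<VERSION>, got: $coords" >&2; return 1 ;;
  esac
  cmd="mvn org.apache.maven.plugins:maven-dependency-plugin:3.2.0:get -Dartifact=$coords"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$cmd"
  else
    $cmd
  fi
}

# Coordinates taken from the --packages flag above; dry run only prints the command.
DRY_RUN=1 fetch_artifact ch.cern.sparkmeasure:spark-measure_2.12:0.17
```

Dropping `DRY_RUN=1` runs the download for real, which requires `mvn` on the PATH.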
