For documentation, see the parent project: https://github.com/cerndb/spark-dashboard
git clone git@github.com:xonai-computing/spark-dashboard.git
cd spark-dashboard/dockerfiles
docker build -t spark-dashboard:v01 .
docker run -p 3000:3000 -p 2003:2003 -p 8086:8086 -d spark-dashboard:v01
You should have:
- Grafana on http://localhost:3000
  - Username: admin
  - Password: admin
  - Skip password change
- InfluxDB graphite on 2003
- InfluxDB http on 8086
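To quickly check that the services are up, the following is a sketch assuming the Grafana health endpoint and the InfluxDB 1.x HTTP API used by this image:
curl http://localhost:3000/api/health        # Grafana should return a small JSON status
curl -i http://localhost:8086/ping           # InfluxDB 1.x should answer with HTTP 204
nc -z localhost 2003 && echo "graphite port open"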
The dashboards are available at http://localhost:3000/dashboards and display basic metrics. To report metrics to the container, start spark-shell with the Graphite sink configured:
spark-shell \
--master local[1] \
--conf "spark.metrics.conf.*.sink.graphite.class"="org.apache.spark.metrics.sink.GraphiteSink" \
--conf "spark.metrics.conf.*.sink.graphite.host"="localhost" \
--conf "spark.metrics.conf.*.sink.graphite.port"=2003 \
--conf "spark.metrics.conf.*.sink.graphite.period"=1 \
--conf "spark.metrics.conf.*.sink.graphite.unit"=seconds \
--conf "spark.metrics.conf.*.sink.graphite.prefix"="xonai" \
--conf "spark.metrics.conf.*.source.jvm.class"="org.apache.spark.metrics.source.JvmSource" \
--conf spark.metrics.appStatusSource.enabled=true
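As an alternative to the --conf flags, the same sink settings can live in $SPARK_HOME/conf/metrics.properties; the following is a sketch of Spark's standard metrics configuration file, equivalent to the flags above:
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=localhost
*.sink.graphite.port=2003
*.sink.graphite.period=1
*.sink.graphite.unit=seconds
*.sink.graphite.prefix=xonai
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource
Note that spark.metrics.appStatusSource.enabled=true is a regular Spark configuration property and still needs to be passed via --conf or spark-defaults.conf.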
After running this query, more metrics should appear:
sql("select count(*) from range(1000) cross join range(1000) cross join range(100)").count
The dashboards also include annotations for:
- Jobs
- Queries
- Stages
- Tasks
These annotations are collected using the sparkMeasure library; see https://github.com/LucaCanali/sparkMeasure for details. Add the following to the previous command:
spark-shell \
...
--packages ch.cern.sparkmeasure:spark-measure_2.12:0.17 \
--conf spark.sparkmeasure.influxdbURL="http://localhost:8086" \
--conf spark.extraListeners=ch.cern.sparkmeasure.InfluxDBSinkExtended \
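To confirm that the listener is writing into InfluxDB, the 1.x query endpoint can be used; this is a sketch, and the database created by sparkMeasure depends on its configuration:
curl -G http://localhost:8086/query --data-urlencode "q=SHOW DATABASES"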
If spark-shell cannot resolve a dependency, fetch it into the local Maven repository manually:
mvn org.apache.maven.plugins:maven-dependency-plugin:3.2.0:get -Dartifact=<ORG>:<ARTIFACT>:<VERSION>
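For example, to pre-fetch the sparkMeasure package used above:
mvn org.apache.maven.plugins:maven-dependency-plugin:3.2.0:get -Dartifact=ch.cern.sparkmeasure:spark-measure_2.12:0.17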