An array comprehension is a monolithic array construction that is as expressive as basic SQL: its group-by syntax lets many array computations be captured in declarative form.
SAC translates array comprehensions to Scala code that calls Spark RDD operations, whose functional arguments use Scala's Parallel Collections library for multicore parallelism.
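To make the group-by idea concrete, here is a hypothetical sketch of the semantics using plain Scala collections rather than SAC syntax or Spark RDDs (the function name and comprehension form are illustrative, not SAC's actual surface syntax):

```scala
// Hypothetical sketch (not SAC syntax): the semantics of a group-by
// array computation, roughly "select (i, sum(v)) from a group by i"
// in SQL terms, expressed with plain Scala collections.
def groupBySum(a: Seq[(Int, Double)]): Map[Int, Double] =
  a.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

// SAC would translate such a comprehension into analogous Spark RDD
// operations (e.g., a reduceByKey/groupByKey pipeline), with the
// functional arguments parallelized via Scala's Parallel Collections.
val result = groupBySum(Seq((0, 1.0), (0, 2.0), (1, 3.0)))
```

Here `groupBySum(Seq((0, 1.0), (0, 2.0), (1, 3.0)))` yields `Map(0 -> 3.0, 1 -> 3.0)`.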
The SAC benchmarks were evaluated on SDSC Comet. The SBATCH shell script used to run the benchmarks on Comet is tests/spark/comet.run. The log files generated by the scripts, which contain the run times, are run*.log in the same directory.
The cluster should support Slurm Workload Manager, Hadoop 2.*, and myhadoop.
To compile SAC, run mvn install in the top-level directory.
Steps to run the scripts on Comet (or on any Slurm-managed cluster):
- Install Scala 2.12.
- Install Spark 3.0 on Hadoop 2.7.
- Change SCALA_HOME and SPARK_HOME in the SBATCH scripts to point to your installations.
- Execute the scripts using sbatch, e.g., sbatch comet.run.
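The SCALA_HOME and SPARK_HOME edits in the third step might look like the following hypothetical excerpt (the exact directives and variable placement in comet.run may differ; all paths are illustrative):

```shell
#!/bin/bash
# Hypothetical SBATCH header; adjust partition, node count, and time
# limits to your cluster's policies.
#SBATCH --partition=compute
#SBATCH --nodes=2
#SBATCH --time=00:30:00

# Point these at your local installations (illustrative paths):
export SCALA_HOME=/opt/scala-2.12.15
export SPARK_HOME=/opt/spark-3.0.1-bin-hadoop2.7
export PATH=$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH
```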