Template scripts for getting jobs to run on Dartmouth's "Discovery" cluster


  ____ ____  _        ____ _           _             _______             _
 / ___|  _ \| |      / ___| |_   _ ___| |_ ___ _ __ |__   __| ___   ___ | | ___
| |   | | | | |     | |   | | | | / __| __/ _ \ '__|   | |   / _ \ / _ \| |/ __|
| |__ | |_| | |___  | |___| | |_| \__ \ ||  __/ |      | |  | (_) | (_) | |\__ \
 \____|____/|_____|  \____|_|\__,_|___/\__\___|_|      | |   \___/ \___/|_||___/

This toolbox contains a simple setup for deploying jobs on Dartmouth's high-performance computing clusters (Discovery, Ndoli, etc.).

To run the main analysis, use:

python supereeg_submit.py

If run on Discovery, it'll submit a batch of jobs to run in parallel. If run on a personal computer, it'll run each job in sequence.

NOTE: jobs have not been implemented yet

=======

Authors: Paxton C. Fitzpatrick and Jeremy R. Manning ([email protected])
Created: October 16, 2016
Updated: December 16, 2019

This repository provides a set of tools for submitting jobs on Dartmouth's Discovery and Ndoli computing clusters. With minimal modification, they may be adapted to work in most cluster computing environments.

The tools are intended to be used to process large datasets (e.g. one piece at a time), to run analyses with many parameter combinations (e.g. one combination at a time), or to perform other tasks that may be divided into many "bite-sized" pieces.

Each piece of your task is associated with a single BASH script, automatically generated by these tools. When run, that BASH script executes your job command(s) and carries out the corresponding analysis. The full set of tools generates a set of BASH scripts and either submits each script to the PBS scheduler (if run from Discovery or Ndoli) or runs each script in serial (if run from another computer), as sketched below.
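
As an illustration of that flow, here is a minimal sketch of the submit-or-run-locally decision. It is an assumption about the toolbox's behavior rather than its actual code: the dispatch helper and the hostname check are hypothetical, and qsub is the standard PBS submission command.

    # Hedged sketch of the submit-vs.-run-locally decision; the toolbox's real
    # implementation may differ. "dispatch" and the hostname check are hypothetical.
    import socket
    import subprocess

    def dispatch(script_paths):
        # Treat hostnames containing "discovery" or "ndoli" as cluster nodes.
        on_cluster = any(name in socket.gethostname().lower()
                         for name in ('discovery', 'ndoli'))
        for script in script_paths:
            if on_cluster:
                subprocess.call(['qsub', script])   # submit to the PBS scheduler
            else:
                subprocess.call(['bash', script])   # run locally, one script at a time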

You will need to modify three scripts to run your analysis:

1.) config.py. This script defines the parameters that will be used to run your job. Only the code in the indicated section should be modified. Specifically, you will need to specify the following (a minimal sketch of the resulting file appears after this list):

  • scriptdir: A directory for storing the BASH scripts (the directory is automatically created if it doesn't exist)
  • lockdir: A directory for keeping track of which scripts have already been submitted to the scheduler. The directory is created if it doesn't exist, and if it didn't exist previously then it is destroyed after all jobs have been submitted.
  • jobname: A string specifying the name given to your jobs (all jobs will have the same name).
  • q: The scheduling queue your job should be submitted to (one of: default, testing, or largeq). For more info see techdoc.dartmouth.edu/discovery
  • nnodes: The number of nodes to be allocated to each job
  • ppn: The number of processors per node to be allocated to each job
  • walltime: The maximum running time of your job
  • startdir: The working directory to start the job in (created if it doesn't exist)
  • cmd_wrapper: The program to be used to run your job (e.g. "matlab", "python", "sh", etc.)
  • modules: A list of modules that need to be loaded prior to executing your job.
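
For concreteness, here is a minimal sketch of what the editable section of config.py might look like. The parameter names come from the list above; the values, paths, and module name are placeholders, and the exact structure of the real file may differ (a dictionary is assumed because this README later refers to config['cmd_wrapper']).

    # Minimal config.py sketch -- values below are placeholders, not the repo's defaults.
    import os

    config = {
        'scriptdir': os.path.join(os.environ['HOME'], 'job-scripts'),  # where generated BASH scripts go (created if missing)
        'lockdir': os.path.join(os.environ['HOME'], 'job-locks'),      # tracks which scripts have been submitted
        'jobname': 'my-analysis',        # name shared by every job
        'q': 'default',                  # one of: default, testing, largeq
        'nnodes': 1,                     # nodes allocated per job
        'ppn': 4,                        # processors per node
        'walltime': '10:00:00',          # maximum running time
        'startdir': os.path.join(os.environ['HOME'], 'job-output'),    # working directory each job starts in (created if missing)
        'cmd_wrapper': 'python',         # program used to run your job script
        'modules': ['python'],           # modules to load before the job runs
    }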

******** NOTE ******** Steps below are outdated

2.) create_and_submit_jobs.py. This script is what you'll run to actually create your job scripts and submit (or run) them. You'll want to modify the code in the indicated section to point to your job script and pass any arguments needed to tell each job script which parameters or piece of the dataset it should handle. You need to specify the following (see the sketch after this list):

  • job_commands: A list of commands for running each job. Each element of the list should be a string with the job's script name and any arguments that should be passed to the script. For example, the sample code runs "test.py" in ten instances, where each instance is passed a number from 0 -- 9.
  • job_names: A list of file names for the scripts. Each element of the list should be a string ending in ".sh". The script directory will automatically be appended.
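
A hedged sketch of that section, reproducing the behavior of the sample code described above (ten instances of test.py, each passed a number from 0 to 9); the surrounding submission machinery in create_and_submit_jobs.py is omitted:

    # Sketch of the user-editable section of create_and_submit_jobs.py.
    # Mirrors the described sample: run test.py ten times with arguments 0 through 9.
    job_commands = ['test.py {}'.format(i) for i in range(10)]
    job_names = ['test_job_{}.sh'.format(i) for i in range(10)]  # scriptdir is prepended automatically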

3.) Your job script. An example script (test.py) is provided for reference. You will likely need to write a wrapper function that calls a series of analyses with the given arguments. Or you can have your job script itself perform the analysis. The job script should save any results that need to be referenced later (e.g. by writing files to disk, generating figures, etc.). The job script may be written in any programming language; the language should be specified in config['cmd_wrapper'].
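
For reference, here is a minimal job script in the same spirit as the described test.py (the repo's actual test.py may differ). It reads the argument supplied in job_commands, runs a placeholder analysis on that piece, and writes the result to disk; the output path is hypothetical.

    # Example job script sketch. The generated BASH script runs it as, e.g.,
    # "python test.py 3" -- the argument selects which piece of the task to process.
    import os
    import sys

    def analyze(piece):
        # Placeholder analysis; replace with your real computation.
        return piece ** 2

    if __name__ == '__main__':
        piece = int(sys.argv[1])
        result = analyze(piece)
        outdir = os.path.join(os.environ['HOME'], 'results')  # hypothetical output location
        if not os.path.exists(outdir):
            os.makedirs(outdir)
        with open(os.path.join(outdir, 'piece_{}.txt'.format(piece)), 'w') as f:
            f.write(str(result))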

After modifying the scripts as described above, you can run your jobs using:

python create_and_submit_jobs.py
