Template scripts for getting jobs to run on Dartmouth's "Discovery" cluster


  ____ ____  _        ____ _           _             _______             _
 / ___|  _ \| |      / ___| |_   _ ___| |_ ___ _ __ |__   __| ___   ___ | | ___
| |   | | | | |     | |   | | | | / __| __/ _ \ '__|   | |   / _ \ / _ \| |/ __|
| |__ | |_| | |___  | |___| | |_| \__ \ ||  __/ |      | |  | (_) | (_) | |\__ \
 \____|____/|_____|  \____|_|\__,_|___/\__\___|_|      | |   \___/ \___/|_||___/

This toolbox contains a simple setup for deploying jobs on Dartmouth's high-performance computing clusters (Discovery, Ndoli, etc.).

To run the main analysis, use:

python supereeg_submit.py

If run on Discovery, it'll submit a batch of jobs to run in parallel. If run on a personal computer, it'll run each job in sequence.

NOTE: jobs have not been implemented yet

=======

Authors: Paxton C. Fitzpatrick and Jeremy R. Manning ([email protected])
Created: October 16, 2016
Updated: December 16, 2019

This repository provides a set of tools for submitting jobs on Dartmouth's Discovery and Ndoli computing clusters. With minimal modification, they may be adapted to work in most cluster computing environments.

The tools are intended to be used to process large datasets (e.g. one piece at a time), to run analyses with many parameter combinations (e.g. one combination at a time), or to perform other tasks that may be divided into many "bite-sized" pieces.

Each piece of your task is associated with a single BASH script, automatically generated by these tools. When run, that BASH script executes your job command(s) and carries out the corresponding analysis. The full set of tools generates a set of BASH scripts and either submits each script to the PBS scheduler (if run from Discovery or Ndoli) or runs each script in serial (if run from another computer), as sketched below.
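
As an illustration of that flow, here is a minimal sketch of the submit-or-run-locally decision. It is an assumption about the toolbox's behavior rather than its actual code: the dispatch helper and the hostname check are hypothetical, and qsub is the standard PBS submission command.

    # Hedged sketch of the submit-vs.-run-locally decision; the toolbox's real
    # implementation may differ. "dispatch" and the hostname check are hypothetical.
    import socket
    import subprocess

    def dispatch(script_paths):
        # Treat hostnames containing "discovery" or "ndoli" as cluster nodes.
        on_cluster = any(name in socket.gethostname().lower()
                         for name in ('discovery', 'ndoli'))
        for script in script_paths:
            if on_cluster:
                subprocess.call(['qsub', script])   # submit to the PBS scheduler
            else:
                subprocess.call(['bash', script])   # run locally, one script at a time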

You will need to modify three scripts to run your analysis:

1.) config.py. This script defines the parameters that will be used to run your job. Only the code in the indicated section should be modified. Specifically, you will need to specify the following (a minimal sketch of the resulting file appears after this list):

  • scriptdir: A directory for storing the BASH scripts (the directory is automatically created if it doesn't exist)
  • lockdir: A directory for keeping track of which scripts have already been submitted to the scheduler. The directory is created if it doesn't exist, and if it didn't exist previously then it is destroyed after all jobs have been submitted.
  • jobname: A string specifying the name given to your jobs (all jobs will have the same name).
  • q: The scheduling queue your job should be submitted to (one of: default, testing, or largeq). For more info see techdoc.dartmouth.edu/discovery
  • nnodes: The number of nodes to be allocated to each job
  • ppn: The number of processors per node to be allocated to each job
  • walltime: The maximum running time of your job
  • startdir: The working directory to start the job in (created if it doesn't exist)
  • cmd_wrapper: The program to be used to run your job (e.g. "matlab", "python", "sh", etc.)
  • modules: A list of modules that need to be loaded prior to executing your job.
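
For concreteness, here is a minimal sketch of what the editable section of config.py might look like. The parameter names come from the list above; the values, paths, and module name are placeholders, and the exact structure of the real file may differ (a dictionary is assumed because this README later refers to config['cmd_wrapper']).

    # Minimal config.py sketch -- values below are placeholders, not the repo's defaults.
    import os

    config = {
        'scriptdir': os.path.join(os.environ['HOME'], 'job-scripts'),  # where generated BASH scripts go (created if missing)
        'lockdir': os.path.join(os.environ['HOME'], 'job-locks'),      # tracks which scripts have been submitted
        'jobname': 'my-analysis',        # name shared by every job
        'q': 'default',                  # one of: default, testing, largeq
        'nnodes': 1,                     # nodes allocated per job
        'ppn': 4,                        # processors per node
        'walltime': '10:00:00',          # maximum running time
        'startdir': os.path.join(os.environ['HOME'], 'job-output'),    # working directory each job starts in (created if missing)
        'cmd_wrapper': 'python',         # program used to run your job script
        'modules': ['python'],           # modules to load before the job runs
    }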

******** NOTE ******** Steps below are outdated

2.) create_and_submit_jobs.py. This script is what you'll run to actually create your job scripts and submit (or run) them. You'll want to modify the code in the indicated section to point to your job script and pass any arguments needed to tell each job script which parameters or piece of the dataset it should handle. You need to specify the following (see the sketch after this list):

  • job_commands: A list of commands for running each job. Each element of the list should be a string with the job's script name and any arguments that should be passed to the script. For example, the sample code runs "test.py" in ten instances, where each instance is passed a number from 0 -- 9.
  • job_names: A list of file names for the scripts. Each element of the list should be a string ending in ".sh". The script directory will automatically be appended.
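
A hedged sketch of that section, reproducing the behavior of the sample code described above (ten instances of test.py, each passed a number from 0 to 9); the surrounding submission machinery in create_and_submit_jobs.py is omitted:

    # Sketch of the user-editable section of create_and_submit_jobs.py.
    # Mirrors the described sample: run test.py ten times with arguments 0 through 9.
    job_commands = ['test.py {}'.format(i) for i in range(10)]
    job_names = ['test_job_{}.sh'.format(i) for i in range(10)]  # scriptdir is prepended automatically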

3.) Your job script. An example script (test.py) is provided for reference. You will likely need to write a wrapper function that calls a series of analyses with the given arguments. Or you can have your job script itself perform the analysis. The job script should save any results that need to be referenced later (e.g. by writing files to disk, generating figures, etc.). The job script may be written in any programming language; the language should be specified in config['cmd_wrapper'].
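
For reference, here is a minimal job script in the same spirit as the described test.py (the repo's actual test.py may differ). It reads the argument supplied in job_commands, runs a placeholder analysis on that piece, and writes the result to disk; the output path is hypothetical.

    # Example job script sketch. The generated BASH script runs it as, e.g.,
    # "python test.py 3" -- the argument selects which piece of the task to process.
    import os
    import sys

    def analyze(piece):
        # Placeholder analysis; replace with your real computation.
        return piece ** 2

    if __name__ == '__main__':
        piece = int(sys.argv[1])
        result = analyze(piece)
        outdir = os.path.join(os.environ['HOME'], 'results')  # hypothetical output location
        if not os.path.exists(outdir):
            os.makedirs(outdir)
        with open(os.path.join(outdir, 'piece_{}.txt'.format(piece)), 'w') as f:
            f.write(str(result))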

After modifying the scripts as described above, you can run your jobs using:

python create_and_submit_jobs.py
