POA's distributed framework, built on CCTools' Makeflow and Work Queue, allows users to harness hundreds to thousands of computing cores to process large datasets in parallel. The pipeline is driven by a YAML file, which specifies the processing steps run by the pipeline wrapper script (distributed_pipeline_wrapper.py).
Comprehensive instructions for gantry field operations, from field preparation to phenotype information extraction, can be found here.
- Linux-based computer, cluster, or server
- Singularity
- iRODS
- Python
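A quick way to confirm these prerequisites are available on a new machine (assuming the iRODS iCommands client, which provides iinit, is what you have installed):

```bash
# Verify each prerequisite is installed and on your PATH.
singularity --version   # Singularity container runtime
python3 --version       # Python interpreter
iinit                   # iRODS iCommands client; prompts for host, port,
                        # user, and password on first run
```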
For more information on YAML file key/value pairs, click here.
For more information on arguments/flags, click here.
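If the wrapper uses a standard argparse-style interface (an assumption here, not confirmed by the linked docs), you can also list the available flags directly from the command line:

```bash
# Print the wrapper's supported arguments/flags.
./distributed_pipeline_wrapper.py -h
```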
The POA workflow requires iRODS. Follow the documentation here to install iRODS.
If you are running POA on the UA HPC, iRODS is already installed, so there is no need to reinstall it. Skip to section "Linux & Windows Subsystem for Linux 2 (WSL2) users", bullet #3.
If you are running POA on the UA HPC, you will also need to set up SSH keys to gain access to the data transfer nodes (DTNs). To set up SSH keys, follow the steps here.
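A typical key setup looks like the following sketch; the username and DTN hostname are placeholders, so substitute the values given in the linked instructions:

```bash
# Generate a key pair if you do not already have one.
ssh-keygen -t ed25519

# Copy the public key to the data transfer node so the pipeline can
# connect without a password prompt (hostname below is a placeholder).
ssh-copy-id -i ~/.ssh/id_ed25519.pub your_netid@dtn.example.hpc.arizona.edu
```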
The script distributed_pipeline_wrapper.py is used to run the pipeline. It downloads and extracts bundled test data, runs containers, and bundles output data.
On your computer/server, run the following command:
./distributed_pipeline_wrapper.py -d 2020-02-14 -y yaml_files/example_machinelearning_workflow.yaml
There are three options when running POA on HPC clusters: interactive, non-interactive, and Cron.
The pipeline can use a data transfer node to download data, which speeds up processing.
Interactive jobs should be run on tmux to enable a persistent connection. To install tmux on the UA HPC head node, follow the directions here.
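For example, a typical tmux workflow looks like this (the session name is arbitrary):

```bash
# Start a named session, then launch the interactive job inside it.
tmux new -s poa

# Detach from the session with Ctrl-b d; the job keeps running.
# Reattach later with:
tmux attach -t poa
```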
You must first launch an interactive node using the following command on UA HPC Puma:
./shell_scripts/interactive_node.sh
Once the resources are allocated, run the following command to process data:
./distributed_pipeline_wrapper.py -hpc -d 2020-02-14 -y yaml_files/example_machinelearning_workflow.yaml
Data will be downloaded and workflows will be launched. You can view progress information for a specific workflow using the mf_monitor.sh script. For example, to view progress information for the first workflow, run:
./shell_scripts/mf_monitor.sh 1
To submit a date for processing as a non-interactive (batch) job, run:
sbatch shell_scripts/slurm_submission.sh <yaml_file>
For example:
sbatch shell_scripts/slurm_submission.sh yaml_files/example_machinelearning_workflow.yaml
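After submitting, you can track the batch job with standard Slurm commands (the output filename below assumes Slurm's default slurm-<job_id>.out pattern, which the submission script may override):

```bash
# List your queued and running jobs.
squeue -u $USER

# Follow the job's log output.
tail -f slurm-<job_id>.out
```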
Make sure to change the account and partition values as needed in the YAML file.
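The exact key names and nesting depend on the YAML schema (see the key/value documentation linked above). As a sketch, assuming the file contains account: and partition: entries, they can be updated in place:

```bash
# Hypothetical edit: assumes lines like "account: ..." and
# "partition: ..." exist in the YAML file; adjust to match yours.
sed -i 's/^\( *account:\).*/\1 your_hpc_account/' yaml_files/example_machinelearning_workflow.yaml
sed -i 's/^\( *partition:\).*/\1 standard/' yaml_files/example_machinelearning_workflow.yaml
```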
For modules requiring a larger number of cores (e.g., Megastitch in the stereoTop, flirIrCamera, and ps2Top workflows), use slurm_submission_large.sh instead.
To schedule Cron jobs, follow the directions here.
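As an illustration, a crontab entry (edited via crontab -e) that submits the workflow every day at 01:00 might look like this; the repository path is a placeholder:

```bash
# Submit the POA workflow daily at 01:00 (paths are placeholders).
0 1 * * * cd /path/to/poa && sbatch shell_scripts/slurm_submission.sh yaml_files/example_machinelearning_workflow.yaml
```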