
Skyline transfer #36

Merged 31 commits on Feb 8, 2024
725f091
fix: align mg assembly rules to expected outputs
Feb 1, 2024
d1d46e1
fix: sample sheet processing and run id searching fixes from bigsky t…
rroutsong Feb 2, 2024
74c84e8
fix: update resource config for bcl2fastq
rroutsong Feb 2, 2024
27c1e5e
feat: add execution context documentation
rroutsong Feb 2, 2024
020d289
feat: add execution docs to nav
rroutsong Feb 2, 2024
9f8dd83
fix: update execution docs for sbatch execution on biowulf
rroutsong Feb 2, 2024
56736e8
fix: temporarily use branch version of profiles
rroutsong Feb 2, 2024
4d39d40
feat: update execution documentation
rroutsong Feb 2, 2024
4c21fd1
fix: update regex for miseq run id compatibility
rroutsong Feb 2, 2024
1183c31
fix: update miseq regex use . instead of \N
rroutsong Feb 2, 2024
c3d2d20
fix: don't error if progressbar is not installed
rroutsong Feb 5, 2024
8e3e736
fix: process different read indicators in the sample sheet
rroutsong Feb 5, 2024
83cbb1d
fix: clean up requirements
rroutsong Feb 5, 2024
60095f6
fix: update profiles
rroutsong Feb 5, 2024
eaa0b8f
fix: fix cpu allocation bug, make it only possible to get 2 cpus per …
rroutsong Feb 5, 2024
5932bd7
fix: close sample sheet endedness bug
rroutsong Feb 6, 2024
15c6620
fix: rework resources for demuxing and qc of certain rules
rroutsong Feb 6, 2024
7985866
fix: correct overindentation
rroutsong Feb 6, 2024
7166ad1
fix: correct bugs with lscratch usage for fastqc rule
rroutsong Feb 6, 2024
ad0258a
fix: formatting
rroutsong Feb 7, 2024
bb1ac3b
fix: update biowulf scheduler interactions for lscratch
rroutsong Feb 7, 2024
e6d7ab6
feat: switch to main branch on submodule
rroutsong Feb 7, 2024
3beb9e3
fix: update dry-run snakemake args and refactor tmpdir handling in fa…
rroutsong Feb 8, 2024
9f77c59
fix: ignore test dry run output
rroutsong Feb 8, 2024
66778f3
fix: correct git ignore path
rroutsong Feb 8, 2024
606b446
fix: recursive gitignore
rroutsong Feb 8, 2024
dce0ff9
fix: correct github action issues with escape and snakemake args
rroutsong Feb 8, 2024
fae01fb
fix: remove -r from snakemake
rroutsong Feb 8, 2024
7de775a
fix: remove profile designation from dry run execution
rroutsong Feb 8, 2024
7ba731a
fix: run ci on snakemake 7.32.4 in parallel to stable version
rroutsong Feb 8, 2024
f50e75c
fix: addresssing incompatibility with snakemake>=8.0.0
rroutsong Feb 8, 2024
23 changes: 21 additions & 2 deletions .github/workflows/dryrun.yaml
@@ -6,6 +6,25 @@ on:

jobs:
dry_run_paired_end:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
with:
submodules: 'true'
- uses: docker://snakemake/snakemake:v7.32.4
- name: Dry Run with test data
run: |
docker run -h cn0000 -v $PWD:/opt2 -w /opt2 snakemake/snakemake:v7.32.4 /bin/bash -c \
"source get_submods.sh; pip install -r requirements.txt; ./weave run --sheetname paired_end.csv -s /opt2/.tests/paired_end -o /opt2/.tests/dry_run_out --local --dry-run /opt2/.tests/paired_end"
- name: View the pipeline config file
run: |
echo "Generated config file for pipeline...." && cat $PWD/.tests/dry_run_out/paired_end/.config/config_job_0.json
- name: Lint Snakefile
continue-on-error: true
run: |
docker run -e SNK_CONFIG='/opt2/.tests/dry_run_out/paired_end/.config/config_job_0.json' -v $PWD:/opt2 snakemake/snakemake:stable snakemake --lint -s /opt2/workflow/Snakefile -d /opt2/workflow || \
echo 'There may have been a few warnings or errors. Please read through the log to determine if its harmless.'
dry_run_latest:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
@@ -30,10 +49,10 @@ jobs:
- uses: actions/checkout@v2
with:
submodules: 'true'
- uses: docker://snakemake/snakemake:stable
- uses: docker://snakemake/snakemake:v7.32.4
- name: Dry Run with test data
run: |
docker run -h cn0000 -v $PWD:/opt2 -w /opt2 snakemake/snakemake:stable /bin/bash -c \
docker run -h cn0000 -v $PWD:/opt2 -w /opt2 snakemake/snakemake:v7.32.4 /bin/bash -c \
"source get_submods.sh; pip install -r requirements.txt;./weave run --sheetname single_end.csv -s /opt2/.tests/single_end -o /opt2/.tests/dry_run_out --local --dry-run /opt2/.tests/single_end"
- name: View the pipeline config file
run: |
4 changes: 2 additions & 2 deletions .gitignore
@@ -1,6 +1,6 @@
logs
.tests/illumnia_demux/dry_run_out
.snakemake
site
output
.tests/dry_run_out/*
.tests/illumnia_demux/dry_run_out
.tests/dry_run_out
136 changes: 0 additions & 136 deletions .tests/dry_run_out/paired_end/.config/config_job_0.json

This file was deleted.

20 changes: 20 additions & 0 deletions config/skyline.json
@@ -0,0 +1,20 @@
{
"sif": "/data/openomics/SIFs/",
"mounts": {
"kaiju": {
"to": "/opt/kaiju",
"from": "/data/openomics/references/weave/kaiju/kaiju_db_nr_euk_2023-05-10",
"mode": "ro"
},
"kraken2" : {
"to": "/opt/kraken2",
"from": "/data/openomics/references/weave/kraken2/k2_pluspfp_20230605",
"mode": "ro"
},
"fastq_screen" : {
"to": "/fdb/fastq_screen/FastQ_Screen_Genomes",
"from": "/data/openomics/references/weave/FastQ_Screen_Genomes",
"mode": "ro"
}
}
}
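The new `config/skyline.json` follows the same shape as the other cluster configs: a SIF directory plus a set of read-only reference mounts. A minimal sketch of how such a `mounts` block could be turned into Singularity `--bind` arguments is shown below; `binds_from_config` and the sample paths are hypothetical, not weave's actual API:

```python
import json

def binds_from_config(config: dict) -> list:
    """Turn a weave-style "mounts" block into singularity --bind arguments
    of the form from:to:mode (mode defaults to read-write)."""
    binds = []
    for name, mount in config.get("mounts", {}).items():
        mode = mount.get("mode", "rw")
        binds.append(f"{mount['from']}:{mount['to']}:{mode}")
    return binds

# Hypothetical trimmed-down config in the same shape as skyline.json
cfg = json.loads("""
{
  "sif": "/data/sifs",
  "mounts": {
    "kaiju": {"to": "/opt/kaiju", "from": "/refs/kaiju_db", "mode": "ro"}
  }
}
""")
print(binds_from_config(cfg))  # → ['/refs/kaiju_db:/opt/kaiju:ro']
```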
117 changes: 117 additions & 0 deletions docs/execution.md
@@ -0,0 +1,117 @@
**weave** can automatically distribute its pipeline jobs across a Slurm cluster, and its initial execution can be launched from several different contexts.

The execution context is closely tied to the configuration and setup of a particular cluster. weave currently supports the NIH clusters **skyline**, **biowulf**, and **bigsky**.


Typical contexts of execution include:

# srun (real-time execution, non-interactive)

The **weave** pipeline can be triggered from a head node in a non-interactive fashion:

## Bigsky/Skyline

!!! Note
Dependency files for skyline and bigsky differ <br />
**Bigsky: `/gs1/RTS/OpenOmics/bin/dependencies.sh`** <br />
**Skyline: `/data/openomics/bin/dependencies.sh`**

```bash
source ${dependencies}
srun --export=ALL weave run [keyword args] ${run_id}
```

!!! Note
srun <a href="https://slurm.schedmd.com/srun.html#OPT_export">by default</a> exports all environment variables from the calling environment, so `--export=ALL` can be omitted

## Biowulf

```bash
srun --export=ALL bash -c "module load snakemake singularity; weave run [keyword args] ${run_id}"
```
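The non-interactive launches above differ only in how the environment is prepared before `weave run`. That per-cluster setup can be sketched as a small helper; `srun_command` and the exact command strings are illustrative stand-ins mirroring the docs, not part of weave itself:

```python
def srun_command(cluster: str, run_id: str, extra_args: str = "") -> str:
    """Assemble a non-interactive srun launch for weave.

    Illustrative only: the setup commands mirror the docs above, but this
    helper is not part of weave's actual interface.
    """
    setup = {
        "bigsky": "source /gs1/RTS/OpenOmics/bin/dependencies.sh",
        "skyline": "source /data/openomics/bin/dependencies.sh",
        "biowulf": "module load snakemake singularity",
    }[cluster]
    # drop empty pieces so optional keyword args don't leave double spaces
    run = " ".join(p for p in ("weave", "run", extra_args, run_id) if p)
    return f'srun --export=ALL bash -c "{setup}; {run}"'

print(srun_command("biowulf", "RUN123"))
# → srun --export=ALL bash -c "module load snakemake singularity; weave run RUN123"
```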

# srun (real-time execution, interactive)

## Bigsky/Skyline

!!! Note
Dependency files for skyline and bigsky differ <br />
**Bigsky: `/gs1/RTS/OpenOmics/bin/dependencies.sh`** <br />
**Skyline: `/data/openomics/bin/dependencies.sh`**

```bash
# <head node>
srun --pty bash
# <compute node>
source ${dependencies}
weave run [keyword args] ${run_id}
```

## Biowulf

```bash
# <head node>
sinteractive
# <compute node>
module purge
module load snakemake singularity
weave run [keyword args] ${run_id}
```

Biowulf uses environment modules to manage software. After running the commands above, you should see messages similar to:

> [+] Loading snakemake 7.XX.X on cnXXXX<br />
> [+] Loading singularity 4.X.X on cnXXXX<br />

# sbatch (later time execution)

## Bigsky/Skyline

### sbatch template
```bash title="<b>bigsky-skyline sbatch template</b>"
#!/bin/bash
#SBATCH --job-name=<job_name>
#SBATCH --export=ALL
#SBATCH --time=01-00:00:00
#SBATCH --cpus-per-task=1
#SBATCH --ntasks=1
#SBATCH --mem=8g
#SBATCH --output=<stdout_file>_%j.out
source ${dependencies}
weave run \
-s /sequencing/root/dir \
-o output_dir \
<run_id>
```

The script above can serve as a template for creating a weave sbatch script. Update the pseudo-variables to suit your needs, then submit the script with the `sbatch` command:

```bash
sbatch weave_script.sbatch
```
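Rendering the template programmatically keeps the pseudo-variables in one place. A sketch using Python's `string.Template`; the variable names and values filled in here are illustrative:

```python
from string import Template

# Mirrors the bigsky/skyline sbatch template above; substitute the
# pseudo-variables before writing the script to disk.
SBATCH = Template("""#!/bin/bash
#SBATCH --job-name=$job_name
#SBATCH --export=ALL
#SBATCH --time=01-00:00:00
#SBATCH --cpus-per-task=1
#SBATCH --ntasks=1
#SBATCH --mem=8g
#SBATCH --output=${job_name}_%j.out
source $dependencies
weave run \\
    -s $seq_dir \\
    -o $out_dir \\
    $run_id
""")

script = SBATCH.substitute(
    job_name="weave_demux",  # hypothetical job name
    dependencies="/data/openomics/bin/dependencies.sh",
    seq_dir="/sequencing/root/dir",
    out_dir="output_dir",
    run_id="RUN_ID",  # hypothetical run id
)
print(script)
```

Write `script` out as `weave_script.sbatch` and submit it as shown above.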

## Biowulf

### sbatch template
```bash title="<b>biowulf sbatch template</b>"
#!/bin/bash
#SBATCH --job-name=<job_name>
#SBATCH --export=ALL
#SBATCH --time=01-00:00:00
#SBATCH --cpus-per-task=1
#SBATCH --ntasks=1
#SBATCH --mem=8g
#SBATCH --output=<stdout_file>_%j.out
module purge
module load snakemake singularity
weave run \
-s /sequencing/root/dir \
-o output_dir \
<run_id>
```

Submission works the same as on bigsky/skyline:

```bash
sbatch weave_script.sbatch
```
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -109,5 +109,6 @@ nav:
- weave run: usage/run.md
- weave cache: usage/cache.md
- Installation: install.md
- Execution context: execution.md
- Reference: ref/reference.md
- License: license.md
5 changes: 1 addition & 4 deletions requirements.txt
@@ -1,6 +1,3 @@
pandas
requests
terminaltables
pyyaml
tabulate
progressbar
python-dateutil
10 changes: 8 additions & 2 deletions scripts/cache.py
@@ -7,7 +7,6 @@
import subprocess
import json
import urllib.request
import progressbar
from argparse import ArgumentTypeError
from pathlib import Path
from urllib.parse import urlparse
@@ -81,7 +80,14 @@ def handle_download(output_dir, resource, protocol, url):
if protocol in ('http', 'https', 'ftp'):
info_download(f"Getting web resource {resource}...")
fnurl = Path(urlparse(url).path).stem
urllib.request.urlretrieve(uri, filename=Path(output_dir, fnurl), reporthook=DownloadProgressBar())
try:
import progressbar
urllib.request.urlretrieve(uri, filename=Path(output_dir, fnurl), reporthook=DownloadProgressBar())
except ModuleNotFoundError:
print('Downloading resources....')
urllib.request.urlretrieve(uri, filename=Path(output_dir, fnurl))
print('....done.')

elif protocol in ('docker'):
info_download(f"Getting docker resource {resource}...")
docker_tag = url.split('/')[-1]
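The cache.py change above guards the `progressbar` import so a missing optional dependency no longer aborts the download. The same pattern, generalized into a self-contained sketch — `fetch()` and its reporthook are illustrative stand-ins, not weave's actual code:

```python
import urllib.request
from pathlib import Path

def fetch(url: str, dest: Path) -> Path:
    """Download url to dest, attaching a progress reporthook only when the
    optional progressbar package is importable; otherwise fall back to a
    plain download, as the cache.py diff above does."""
    try:
        import progressbar  # optional dependency (progressbar2 API assumed)

        bar = progressbar.ProgressBar(max_value=progressbar.UnknownLength)

        def hook(block_num, block_size, total_size):
            # urlretrieve calls the reporthook after each transferred block
            bar.update(block_num * block_size)

        urllib.request.urlretrieve(url, filename=dest, reporthook=hook)
    except ModuleNotFoundError:
        print('Downloading resources....')
        urllib.request.urlretrieve(url, filename=dest)
        print('....done.')
    return dest
```

Catching `ModuleNotFoundError` (rather than the broader `ImportError`) keeps genuine import-time failures inside progressbar itself visible.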