Proposal: Static types #309

Draft · wants to merge 25 commits into base: dev
Commits (25)
f531c5d  Replace ext/publishDir with params/publish definition (bentsherman, Apr 27, 2024)
836ace2  Update config to comply with strict parser (bentsherman, Apr 27, 2024)
25a1fb5  Use param schemas as source of truth, convert to YAML (bentsherman, Apr 27, 2024)
505806a  Use eval output, topic channels to collect tool versions (bentsherman, Apr 27, 2024)
5ae1562  Use static types, record types (bentsherman, Apr 27, 2024)
b2f563d  Refactor params as inputs for SRA workflow (bentsherman, Apr 27, 2024)
24b34cf  New dataflow syntax (bentsherman, Apr 30, 2024)
2771765  Simplify process inputs/outputs (bentsherman, May 18, 2024)
90e4ac1  Replace `def` with `fn` / `let` / `var` (bentsherman, May 18, 2024)
a7bebba  Omit name for single process output (bentsherman, May 18, 2024)
f791e1e  Minor updates (bentsherman, Sep 23, 2024)
7040945  Revert changes to JSON schemas (bentsherman, Nov 2, 2024)
aef604b  Add SraParams type (bentsherman, Nov 3, 2024)
f1f763c  Refactor workflow outputs (bentsherman, Nov 3, 2024)
c37a3b2  Make Sample type more precise (bentsherman, Nov 4, 2024)
677f838  Revert unrelated changes (bentsherman, Nov 4, 2024)
e4c956f  Remove old params, remove script params from config (bentsherman, Nov 5, 2024)
e4761ce  Replace set operator with assignment (bentsherman, Nov 8, 2024)
b2d817e  Update operators (bentsherman, Nov 8, 2024)
2b47166  Revert `fn` / `let` / `var` to `def` (bentsherman, Nov 8, 2024)
e3f052a  Revert pipe (`|>`) to dot for operators (bentsherman, Nov 8, 2024)
6598091  Revert unrelated changes (bentsherman, Nov 20, 2024)
d555922  Restore ext config (bentsherman, Nov 20, 2024)
adc45a4  Restore missing configs (bentsherman, Nov 20, 2024)
0b52c81  Revert unrelated changes (bentsherman, Nov 20, 2024)
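
Taken together, these commits move the pipeline onto the typed syntax gated behind `nextflow.preview.types`: parameters are declared in a typed `params` block, process inputs and outputs become named, typed values instead of `tuple val()/path()` declarations, and publishing moves from per-process `publishDir` rules to a workflow-level `output` block. A minimal sketch of the typed process style, modelled on the ASPERA_CLI changes further down (the process name, fields, and tool are illustrative only, not part of this PR):

process EXAMPLE_DOWNLOAD {
    input:
    meta : Map<String,String>               // run metadata, e.g. id and download URLs

    output:
    fastq : Path = file('*.fastq.gz')       // named, typed output replaces `tuple ..., emit: fastq`

    topic:
    ( task.process, 'example_tool', eval('example_tool --version') ) >> 'versions'   // version reporting via a topic channel

    script:
    """
    echo "downloading ${meta.id}"
    """
}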
13 changes: 0 additions & 13 deletions bin/multiqc_mappings_config.py

This file was deleted.

44 changes: 22 additions & 22 deletions conf/base.config
@@ -10,16 +10,16 @@

 process {

-    cpus = { check_max( 1 * task.attempt, 'cpus' ) }
-    memory = { check_max( 6.GB * task.attempt, 'memory' ) }
-    time = { check_max( 4.h * task.attempt, 'time' ) }
-
-    publishDir = [
-        path: { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
-        mode: params.publish_dir_mode,
-        saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+    resourceLimits = [
+        cpus: params.max_cpus,
+        memory: params.max_memory,
+        time: params.max_time
     ]
+
+    cpus = { 1 * task.attempt }
+    memory = { 6.GB * task.attempt }
+    time = { 4.h * task.attempt }

     errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
     maxRetries = 1
     maxErrors = '-1'
@@ -31,30 +31,30 @@ process {
     // adding in your local modules too.
     // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors
     withLabel:process_single {
-        cpus = { check_max( 1 , 'cpus' ) }
-        memory = { check_max( 6.GB * task.attempt, 'memory' ) }
-        time = { check_max( 4.h * task.attempt, 'time' ) }
+        cpus = { 1 }
+        memory = { 6.GB * task.attempt }
+        time = { 4.h * task.attempt }
     }
     withLabel:process_low {
-        cpus = { check_max( 2 * task.attempt, 'cpus' ) }
-        memory = { check_max( 12.GB * task.attempt, 'memory' ) }
-        time = { check_max( 4.h * task.attempt, 'time' ) }
+        cpus = { 2 * task.attempt }
+        memory = { 12.GB * task.attempt }
+        time = { 4.h * task.attempt }
     }
     withLabel:process_medium {
-        cpus = { check_max( 6 * task.attempt, 'cpus' ) }
-        memory = { check_max( 36.GB * task.attempt, 'memory' ) }
-        time = { check_max( 8.h * task.attempt, 'time' ) }
+        cpus = { 6 * task.attempt }
+        memory = { 36.GB * task.attempt }
+        time = { 8.h * task.attempt }
     }
     withLabel:process_high {
-        cpus = { check_max( 12 * task.attempt, 'cpus' ) }
-        memory = { check_max( 72.GB * task.attempt, 'memory' ) }
-        time = { check_max( 16.h * task.attempt, 'time' ) }
+        cpus = { 12 * task.attempt }
+        memory = { 72.GB * task.attempt }
+        time = { 16.h * task.attempt }
     }
     withLabel:process_long {
-        time = { check_max( 20.h * task.attempt, 'time' ) }
+        time = { 20.h * task.attempt }
     }
     withLabel:process_high_memory {
-        memory = { check_max( 200.GB * task.attempt, 'memory' ) }
+        memory = { 200.GB * task.attempt }
     }
     withLabel:error_ignore {
         errorStrategy = 'ignore'
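
The base.config change above drops the nf-core `check_max()` helper in favour of Nextflow's built-in `resourceLimits` directive, which clamps each task's requested cpus, memory, and time to the configured ceilings. A hedged sketch of how the two pieces fit together (the max_* values shown are illustrative defaults, not taken from this PR):

// Illustrative resource ceilings; any request above these values is clamped
// by resourceLimits instead of needing a check_max() wrapper.
params.max_cpus   = 16
params.max_memory = 128.GB
params.max_time   = 240.h

process {
    resourceLimits = [
        cpus:   params.max_cpus,
        memory: params.max_memory,
        time:   params.max_time
    ]

    // On later retries this product can exceed 128.GB; resourceLimits caps it automatically.
    memory = { 72.GB * task.attempt }
}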
107 changes: 87 additions & 20 deletions main.nf
@@ -9,38 +9,62 @@
 ----------------------------------------------------------------------------------------
 */

-nextflow.enable.dsl = 2
+nextflow.preview.types = true

 /*
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-    IMPORT FUNCTIONS / MODULES / SUBWORKFLOWS / WORKFLOWS
+    IMPORT FUNCTIONS / MODULES / WORKFLOWS / TYPES
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 */

 include { SRA } from './workflows/sra'
 include { PIPELINE_INITIALISATION } from './subworkflows/local/utils_nfcore_fetchngs_pipeline'
 include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_fetchngs_pipeline'
+include { SOFTWARE_VERSIONS } from './subworkflows/nf-core/utils_nfcore_pipeline'
+include { DownloadMethod } from './workflows/sra'
+include { SraParams } from './workflows/sra'
+include { Sample } from './workflows/sra'

 /*
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-    NAMED WORKFLOWS FOR PIPELINE
+    WORKFLOW INPUTS
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 */

-//
-// WORKFLOW: Run main nf-core/fetchngs analysis pipeline depending on type of identifier provided
-//
-workflow NFCORE_FETCHNGS {
+params {

-    take:
-    ids // channel: database ids read in from --input
+    // TODO: declare as Set<SraId> and construct SraId with isSraId()
+    input: Set<String> {
+        description 'Set of SRA/ENA/GEO/DDBJ identifiers to download their associated metadata and FastQ files'
+    }

-    main:
+    // TODO: declare as EnaMetadataFields and construct with sraCheckENAMetadataFields()
+    ena_metadata_fields: String {
+        description "Comma-separated list of ENA metadata fields to fetch before downloading data."
+        help "The default list of fields used by the pipeline can be found at the top of the [`bin/sra_ids_to_runinfo.py`](https://github.com/nf-core/fetchngs/blob/master/bin/sra_ids_to_runinfo.py) script within the pipeline repo. This pipeline requires a minimal set of fields to download FastQ files i.e. `'run_accession,experiment_accession,library_layout,fastq_ftp,fastq_md5'`. Full list of accepted metadata fields can be obtained from the [ENA API](https://www.ebi.ac.uk/ena/portal/api/returnFields?dataPortal=ena&format=tsv&result=read_run)."
+        icon 'fas fa-columns'
+        defaultValue ''
+    }

-    //
-    // WORKFLOW: Download FastQ files for SRA / ENA / GEO / DDBJ ids
-    //
-    SRA ( ids )
+    download_method: DownloadMethod {
+        description "Method to download FastQ files. Available options are 'aspera', 'ftp' or 'sratools'. Default is 'ftp'."
+        help 'FTP and Aspera CLI download FastQ files directly from the ENA FTP whereas sratools uses sra-tools to download *.sra files and convert to FastQ.'
+        icon 'fas fa-download'
+        defaultValue 'ftp'
+    }

+    skip_fastq_download: boolean {
+        description "Only download metadata for public data database ids and don't download the FastQ files."
+        icon 'fas fa-fast-forward'
+    }

+    dbgap_key: Path? {
+        description 'dbGaP repository key.'
+        help 'Path to a JWT cart file used to access protected dbGAP data on SRA using the sra-toolkit. Users with granted access to controlled data can download the JWT cart file for the study from the SRA Run Selector upon logging in. The JWT file can only be used on cloud platforms and is valid for 1 hour upon creation.'
+        icon 'fas fa-address-card'
+    }

+    // TODO: ...

 }

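Each typed declaration above carries the same information the pipeline previously kept in nextflow_schema.json. For comparison, the download_method entry corresponds to a schema property roughly like the following (reconstructed for illustration from nf-core schema conventions, not copied from this PR):

"download_method": {
    "type": "string",
    "default": "ftp",
    "fa_icon": "fas fa-download",
    "description": "Method to download FastQ files. Available options are 'aspera', 'ftp' or 'sratools'. Default is 'ftp'.",
    "help_text": "FTP and Aspera CLI download FastQ files directly from the ENA FTP whereas sratools uses sra-tools to download *.sra files and convert to FastQ."
}
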
@@ -52,6 +76,7 @@ workflow NFCORE_FETCHNGS {

 workflow {

+    main:
     //
     // SUBWORKFLOW: Run initialisation tasks
     //
@@ -61,29 +86,71 @@ workflow {
         params.validate_params,
         params.monochrome_logs,
         args,
-        params.outdir,
-        params.input,
-        params.ena_metadata_fields
+        workflow.outputDir
     )

     //
     // WORKFLOW: Run primary workflows for the pipeline
     //
-    NFCORE_FETCHNGS (
-        PIPELINE_INITIALISATION.out.ids
+    samples = SRA (
+        Channel.fromList(params.input),
+        SraParams(
+            params.ena_metadata_fields,
+            params.download_method,
+            params.skip_fastq_download,
+            params.dbgap_key
+        )
     )

+    //
+    // SUBWORKFLOW: Collect software versions
+    //
+    versions = SOFTWARE_VERSIONS()
+
     //
     // SUBWORKFLOW: Run completion tasks
     //
     PIPELINE_COMPLETION (
         params.email,
         params.email_on_fail,
         params.plaintext_email,
-        params.outdir,
+        workflow.outputDir,
         params.monochrome_logs,
         params.hook_url
     )
+
+    publish:
+    samples >> 'samples'
+    versions >> 'versions'
 }

+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    WORKFLOW OUTPUTS
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*/
+
+output {
+    samples: Sample {
+        path { _sample ->
+            def dirs = [
+                'fastq': 'fastq',
+                'md5': 'fastq/md5'
+            ]
+            return { file -> "${dirs[file.ext]}/${file.baseName}" }
+        }
+        index {
+            path 'samplesheet/samplesheet.json'
+            sort { sample -> sample.id }
+        }
+    }
+
+    versions: Map<String,Map<String,String>> {
+        path '.'
+        index {
+            path 'nf_core_fetchngs_software_mqc_versions.yml'
+        }
+    }
+}

 /*
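
With the `publish:` targets and `output` block above, everything that used to be scattered across per-module `publishDir` rules is declared once at the workflow level. Under the target paths shown, the published results tree would look roughly like this (only the directory and index file names come from the `output` block; the FastQ file names are illustrative placeholders):

<outputDir>/
├── fastq/
│   ├── <sample>_1.fastq.gz
│   ├── <sample>_2.fastq.gz
│   └── md5/
│       ├── <sample>_1.fastq.gz.md5
│       └── <sample>_2.fastq.gz.md5
├── samplesheet/
│   └── samplesheet.json
└── nf_core_fetchngs_software_mqc_versions.yml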
27 changes: 11 additions & 16 deletions modules/local/aspera_cli/main.nf
@@ -8,18 +8,23 @@ process ASPERA_CLI {
         'biocontainers/aspera-cli:4.14.0--hdfd78af_1' }"

     input:
-    tuple val(meta), val(fastq)
-    val user
+    meta : Map<String,String>
+    user : String

     output:
-    tuple val(meta), path("*fastq.gz"), emit: fastq
-    tuple val(meta), path("*md5") , emit: md5
-    path "versions.yml" , emit: versions
+    fastq_1 : Path = file('*_1.fastq.gz')
+    fastq_2 : Path? = file('*_2.fastq.gz')
+    md5_1 : Path = file('*_1.fastq.gz.md5')
+    md5_2 : Path? = file('*_2.fastq.gz.md5')

+    topic:
+    ( task.process, 'aspera_cli', eval('ascli --version') ) >> 'versions'
+
     script:
     def args = task.ext.args ?: ''
     def conda_prefix = ['singularity', 'apptainer'].contains(workflow.containerEngine) ? "export CONDA_PREFIX=/usr/local" : ""
-    if (meta.single_end) {
+    def fastq = meta.fastq_aspera.tokenize(';')
+    if (meta.single_end.toBoolean()) {
         """
         $conda_prefix

@@ -31,11 +36,6 @@

         echo "${meta.md5_1} ${meta.id}.fastq.gz" > ${meta.id}.fastq.gz.md5
         md5sum -c ${meta.id}.fastq.gz.md5
-
-        cat <<-END_VERSIONS > versions.yml
-        "${task.process}":
-            aspera_cli: \$(ascli --version)
-        END_VERSIONS
         """
     } else {
         """
@@ -58,11 +58,6 @@

         echo "${meta.md5_2} ${meta.id}_2.fastq.gz" > ${meta.id}_2.fastq.gz.md5
         md5sum -c ${meta.id}_2.fastq.gz.md5
-
-        cat <<-END_VERSIONS > versions.yml
-        "${task.process}":
-            aspera_cli: \$(ascli --version)
-        END_VERSIONS
         """
     }
 }
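
In the process above, the versions.yml heredoc is replaced by a `topic:` section that sends a (process, tool, version) tuple into the 'versions' topic, with the version string captured by an `eval` output. The SOFTWARE_VERSIONS subworkflow imported in main.nf presumably drains that topic into the single YAML published as nf_core_fetchngs_software_mqc_versions.yml; its internals are not part of this diff, but the collection step might look roughly like the following sketch (an assumption, written against the standard topic-channel API rather than copied from the PR):

// Assumed sketch only: gather every (process, tool, version) tuple sent to the
// 'versions' topic and merge them into one map per simple process name.
workflow SOFTWARE_VERSIONS {
    main:
    versions = channel.topic('versions')
        .map { process, tool, version ->
            [ process.tokenize(':').last(), [ (tool): version ] ]
        }
        .groupTuple()
        .map { process, tools -> [ (process): tools.sum() ] }   // merge the per-tool maps

    emit:
    versions
}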
12 changes: 0 additions & 12 deletions modules/local/aspera_cli/nextflow.config
@@ -1,17 +1,5 @@
 process {
     withName: 'ASPERA_CLI' {
         ext.args = '-QT -l 300m -P33001'
-        publishDir = [
-            [
-                path: { "${params.outdir}/fastq" },
-                mode: params.publish_dir_mode,
-                pattern: "*.fastq.gz"
-            ],
-            [
-                path: { "${params.outdir}/fastq/md5" },
-                mode: params.publish_dir_mode,
-                pattern: "*.md5"
-            ]
-        ]
     }
 }
27 changes: 0 additions & 27 deletions modules/local/multiqc_mappings_config/main.nf

This file was deleted.

9 changes: 0 additions & 9 deletions modules/local/multiqc_mappings_config/nextflow.config

This file was deleted.

24 changes: 0 additions & 24 deletions modules/local/multiqc_mappings_config/tests/main.nf.test

This file was deleted.
