This section is presented as a lecture. A video link will be added after the course.
Below we have a brief summary:
Pipelines help automate a series of tasks, allowing better handling, consistency and traceability.
They are useful when running each step manually would be arduous.
Nextflow is primarily a language that helps you scaffold a series of tasks into a modern, reproducible workflow (pipeline). It automates program installation (using containers), passes data through the pipeline (via its dataflow programming model), and is modular, so modules can be reused across pipelines.
For more detail, see the Nextflow documentation.
Nextflow also has several other features that make it really useful:
Runtime: Provides simple ways to run pipelines across different infrastructure (locally, on HPC, on AWS, etc.)
Version control: Integrates with GitHub and other version control platforms, for transparent coding and traceability.
Unit testing: Each module can be tested with nf-test, to help ensure that every constituent part works as expected.
Community: Has a large and diverse community, all committed to building reproducible, effective and automated data pipelines (nf-core, nf-co.re)
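To illustrate the runtime point, infrastructure is usually selected through configuration rather than by changing the pipeline code. The following is a minimal sketch of a nextflow.config file; the profile names and the queue name are hypothetical and would need to match your own system.

```nextflow
// nextflow.config (sketch; profile names and queue name are hypothetical)
profiles {
    // Run every process on the local machine
    local {
        process.executor = 'local'
    }

    // Submit every process as a job to a Slurm-managed HPC cluster
    cluster {
        process.executor = 'slurm'
        process.queue    = 'short'   // assumed queue name, site-specific
    }
}
```

With a file like this in place, the same pipeline could be launched with `nextflow run main.nf -profile local` or `nextflow run main.nf -profile cluster`, where main.nf is an assumed script name.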
To learn more about Nextflow, there is excellent training material at https://training.nextflow.io/
A Nextflow pipeline has a consistent structure built on two main concepts:
Processes: the building blocks of Nextflow. Each process wraps a particular job, script or program (one or several commands) and has its own inputs, outputs and script block. A process can be seen as a module.
Channels: carry data into, out of and between processes. Processes expect their data to arrive in channels, so that Nextflow knows how to handle each data type.
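As a rough illustration of what a channel is, the sketch below creates one from a few values and transforms each item as it flows through. The sample names are made up for the example; in a real pipeline the channel would more often be built from files.

```nextflow
workflow {
    // channel.of() creates a channel that emits each argument as one item
    // (the sample names here are hypothetical)
    samples_ch = channel.of('sample.1', 'sample.2', 'sample.3')

    samples_ch
        // map applies a closure to every item passing through the channel
        .map { id -> "${id}.fastq" }
        // view prints each item, which is handy for inspecting a channel
        .view { name -> "Item in channel: ${name}" }
}
```

Each item travels through the channel independently, which is what lets Nextflow run a process once per item and in parallel where possible.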
A typical process looks like the following:
process TRIM {
    input:
    path 'sample.1.fastq'

    output:
    path 'sample.1.trim.fastq'

    script:
    """
    trim_galore -o sample.1.trim.fastq sample.1.fastq
    """
}
In the above process:
we have a single input, in this case a specific fastq file
we have a single output, in this case the trimmed fastq file
we have a script block, delimited by """, that says what command to run
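A process only runs once it is called from a workflow and given a channel. The sketch below shows one way this could look, using a hypothetical variant of the process (TRIM_READS) whose input is a variable rather than a fixed file name. The `params.reads` parameter and its default path are assumptions, and the output pattern assumes Trim Galore's default single-end naming (`*_trimmed.fq`).

```nextflow
// Hypothetical parameter: a glob pattern matching the input FASTQ files
params.reads = 'data/*.fastq'

// Hypothetical variant of TRIM that accepts any FASTQ file
process TRIM_READS {
    input:
    path reads              // each file from the input channel is staged under its own name

    output:
    path "*_trimmed.fq"     // assumes Trim Galore's default single-end output naming

    script:
    """
    trim_galore ${reads}
    """
}

workflow {
    // One channel item per file matching the pattern
    reads_ch = channel.fromPath(params.reads)

    // The process runs once for every item in the channel
    TRIM_READS(reads_ch)

    // Process outputs are channels too; print each trimmed file path
    TRIM_READS.out.view()
}
```

Because the input is a variable, the same process handles one file or a thousand without modification; only the contents of the channel change.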