Automated workflow for small RNA sequence data

Snakemake workflow for processing small RNA-seq libraries produced by Illumina small RNA sequencing kits.

Requirements

Demultiplexed FASTQ files located in the data directory. They need to be named in the form {sample}_R1.fastq.gz.
Snakefile shipped with this repository.
config.yaml shipped with this repository. It contains all parameters and settings to customize the processing of the current dataset.
samples.csv listing all samples in the data directory without the _R1.fastq.gz suffix. The first line is the header, i.e. the word library. An example is shipped with this repository and can be used as a template.
Optional: environment.yaml to create the software environment if conda is used.
An installation of snakemake and, optionally, conda.
If conda is not used, bowtie, fastqc, samtools and deeptools need to be in the PATH.
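To illustrate the samples.csv format described above, the Python sketch below builds the file from the contents of the data directory. It assumes that the header is the single word library followed by one sample name per line with no further columns; compare it with the example samples.csv shipped with the repository before relying on it. The helper names (sample_names, samples_csv_text, write_samples_csv) are hypothetical and not part of the workflow.

```python
from pathlib import Path

SUFFIX = "_R1.fastq.gz"  # file naming required by the workflow: {sample}_R1.fastq.gz
HEADER = "library"       # assumed header word of samples.csv


def sample_names(filenames):
    """Return sorted sample names, i.e. file names with the _R1.fastq.gz suffix removed.

    Files that do not follow the naming convention are ignored.
    """
    return sorted(
        name[: -len(SUFFIX)]
        for name in filenames
        if name.endswith(SUFFIX) and len(name) > len(SUFFIX)
    )


def samples_csv_text(samples):
    """Render samples.csv: the header line followed by one sample per line."""
    return "\n".join([HEADER, *samples]) + "\n"


def write_samples_csv(data_dir="data", out_file="samples.csv"):
    """Scan data_dir, write samples.csv and return the sample names written."""
    names = sample_names(p.name for p in Path(data_dir).iterdir() if p.is_file())
    Path(out_file).write_text(samples_csv_text(names))
    return names
```

Called from the base directory with only test2_R1.fastq.gz and test3_R1.fastq.gz in data, write_samples_csv() would write the lines library, test2 and test3.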

The above files can be downloaded as a whole by cloning the repository (which requires git):

git clone https://github.com/seb-mueller/snakemake_sRNAseq.git

Or individually, for example the Snakefile, using wget:

wget https://raw.githubusercontent.com/seb-mueller/snakemake_sRNAseq/master/Snakefile

Creating the conda environment:

conda env create --file environment.yaml --name srna_mapping

Activating the environment:

conda activate srna_mapping

To deactivate the environment, run:

conda deactivate

To update the workflow and its environment, run:

git pull
conda env update --file environment.yaml --name srna_mapping

Usage:

In a Unix shell, navigate to the base directory containing the files listed above plus the data directory with the FASTQ files, as in this example:

.
├── data
│   ├── test2_R1.fastq.gz
│   └── test3_R1.fastq.gz
├── config.yaml
├── environment.yaml
├── samples.csv
└── Snakefile
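Before a first run it can help to check that the base directory matches this layout. The Python sketch below is a hypothetical pre-flight check, not part of the workflow. It assumes samples.csv has a one-line header followed by one sample name per line (only the first comma-separated field is read) and reports missing required files, samples without a matching FASTQ file, and FASTQ files not listed in samples.csv.

```python
from pathlib import Path

REQUIRED = ["Snakefile", "config.yaml", "samples.csv"]  # environment.yaml is optional
SUFFIX = "_R1.fastq.gz"


def read_samples(csv_text):
    """Parse samples.csv text: skip the header line and keep non-empty lines.

    Assumption: one sample per line; only the first comma-separated field is used.
    """
    lines = csv_text.splitlines()[1:]
    return [line.split(",")[0].strip() for line in lines if line.strip()]


def check_layout(samples, fastq_names, present_files):
    """Return a list of human-readable problems; an empty list means the layout looks OK."""
    problems = [f"missing file: {f}" for f in REQUIRED if f not in present_files]
    available = {n[: -len(SUFFIX)] for n in fastq_names if n.endswith(SUFFIX)}
    problems += [f"no data/{s}{SUFFIX} for sample {s}"
                 for s in samples if s not in available]
    problems += [f"data/{s}{SUFFIX} not listed in samples.csv"
                 for s in sorted(available - set(samples))]
    return problems


def check_base_dir(base="."):
    """Run the checks against a real directory tree rooted at base."""
    base = Path(base)
    present = {p.name for p in base.iterdir()}
    csv_path = base / "samples.csv"
    samples = read_samples(csv_path.read_text()) if csv_path.exists() else []
    data = base / "data"
    fastqs = [p.name for p in data.iterdir()] if data.is_dir() else []
    return check_layout(samples, fastqs, present)
```

For the example tree above, check_base_dir() should return an empty list as long as samples.csv lists exactly test2 and test3.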

Then run snakemake in the base directory:

# the most basic usage
snakemake
# recommended: automatic conda management in a central location
snakemake --use-conda --conda-prefix ~/.myconda -p

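Useful parameters:

--cores  maximum number of threads (cores) to use
-n  dry run: show what would be executed without running anything
-p  print the commands that are run
--use-conda  let snakemake manage the conda environments automatically
--conda-prefix ~/.myconda  store these environments in a central location
--forcerun postmapping  force a rerun of the given rule (here postmapping)
--keep-going  if one sample fails, for example, the pipeline still tries to process the other samples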
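The flags above can be combined freely, for example a dry run before the real run. As a hypothetical illustration (snakemake_cmd and run are not part of the workflow), the Python sketch below assembles an argument list using only the flags listed above. Because no shell is involved, the conda prefix is tilde-expanded in Python; executing the list requires snakemake on the PATH.

```python
import os
import shlex
import subprocess


def snakemake_cmd(cores=1, dry_run=False, print_cmds=True, use_conda=True,
                  conda_prefix="~/.myconda", forcerun=None, keep_going=False):
    """Build a snakemake argument list from the options described above."""
    cmd = ["snakemake", "--cores", str(cores)]
    if dry_run:
        cmd.append("-n")            # only show what would be done
    if print_cmds:
        cmd.append("-p")            # print the commands of each job
    if use_conda:
        cmd.append("--use-conda")   # let snakemake manage the conda environments
        if conda_prefix:
            # central location for the environments; expand ~ ourselves (no shell)
            cmd += ["--conda-prefix", os.path.expanduser(conda_prefix)]
    if forcerun:
        cmd += ["--forcerun", forcerun]  # rerun the given rule, e.g. postmapping
    if keep_going:
        cmd.append("--keep-going")  # continue with other samples if one fails
    return cmd


def run(cmd):
    """Print and execute the command; raises CalledProcessError if snakemake fails."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, run(snakemake_cmd(cores=4, dry_run=True)) corresponds to snakemake --cores 4 -n -p --use-conda --conda-prefix ~/.myconda typed in the shell.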

Output:

The trimmed, log and mapped directories, containing the trimming results, log files and mapping results.

Update: added STAR support

# create star index (goes in staridx folder)
snakemake -p --skip-script-cleanup staridx --cores 3
# then map using star
snakemake -p --skip-script-cleanup starmap --cores 3
# TODO: create bw files from STAR mapping
