Global Run-On Sequencing (GRO-Seq) pipeline for analyzing transcription activity of genes from engaged RNA polymerase.
- Using GRO-seq docker image
singularity pull docker://sandrejev/groseq:latest
Before running the GRO-seq pipeline you need a genome (FASTA), a bowtie index (.bt2), chromosome sizes (.chrom.sizes) and an annotation. For mm9, mm10 and hg19 these can be downloaded automatically with the download command. The only exception is the annotation *.bed file.
singularity exec -B `pwd` groseq_latest.sif download mm10
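Before starting a long run it can help to confirm that the reference files are in place. The following is a minimal sketch, not part of the image: the function name check_reference is hypothetical, and it assumes the files share a prefix as in the commands here (./mm10 gives mm10.1.bt2, mm10.rev.1.bt2 and mm10.chrom.sizes). The FASTA file is not checked because its exact name is not stated in this document.

```shell
# Hypothetical helper: verify that the bowtie2 index and chromosome sizes exist.
# $1 = genome prefix, e.g. ./mm10
check_reference() {
  prefix="$1"
  missing=0
  for f in "${prefix}.1.bt2" "${prefix}.rev.1.bt2" "${prefix}.chrom.sizes"; do
    # -s is true only for files that exist and are not empty
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling check_reference ./mm10 returns a non-zero status and lists the missing files on stderr if anything is absent.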
Run the groseq pipeline. Keep in mind that the annotation file (-a flag) is created automatically for you from geneRef.gtf.
singularity exec -B `pwd` groseq_latest.sif groseq -f AS-512172-LR-52456/fastq/AS-512172-LR-52456_R1.fastq -a mm10.refGene.bed -g ./mm10 -o "singularity1" --chromInfo mm10.chrom.sizes
You can create an annotation file with different clipping at the start or end of the transcript using the longest-transcript command.
singularity exec -B `pwd` groseq_latest.sif longest-transcript mm10.refGene.gtf.gz mm10.refGene.bed --clip-start=50
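To make the idea of clipping concrete, here is a small sketch of what trimming the transcript start could look like on a BED6 file. It is an illustration only: the function name clip_start is hypothetical, and it assumes the clip is strand-aware (removed from the 5' end), which this document does not confirm for longest-transcript.

```shell
# Hypothetical illustration: trim N bases from the 5' end of each BED6 record.
# $1 = number of bases to clip; BED6 is read from stdin and written to stdout.
clip_start() {
  awk -v n="$1" 'BEGIN { FS = OFS = "\t" }
    {
      # On the minus strand the transcript start is the BED end coordinate
      if ($6 == "-") { $3 -= n } else { $2 += n }
      # Drop records that become empty after clipping
      if ($3 > $2) print
    }'
}
```

For example, a plus-strand record spanning 100 to 500 becomes 150 to 500 with clip_start 50, while a minus-strand record with the same coordinates becomes 100 to 450.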
You can extract RPKM values using the extract-rpkm command (done automatically as part of the pipeline).
singularity exec -B `pwd` groseq_latest.sif extract-rpkm -a mm10.refGene.bed -o AS-512172-LR-52456_R1 --clip-start=50
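For reference, RPKM is conventionally defined as reads per kilobase of transcript per million mapped reads, that is reads * 10^9 / (length * total mapped reads). The sketch below computes that standard formula for a single transcript; the function name rpkm is hypothetical, and how extract-rpkm counts reads or applies clipping internally is not described here.

```shell
# Hypothetical helper implementing the conventional RPKM formula.
# $1 = reads assigned to the transcript
# $2 = transcript length in bp (after any clipping)
# $3 = total number of mapped reads in the sample
rpkm() {
  awk -v c="$1" -v len="$2" -v total="$3" \
    'BEGIN { printf "%.3f\n", (c * 1e9) / (len * total) }'
}

rpkm 100 2000 10000000   # prints 5.000
```

Because the length appears in the denominator, clipping the transcript changes the value even when the read count stays the same.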
For convenience, the GRO-seq image contains a script that can be used to run the pipeline on an LSF cluster.
# Run GRO-seq pipeline on all *.fastq files in the folder
singularity exec -B `pwd` groseq_latest.sif lsf | bsub
# Run GRO-seq pipeline on all *.fastq files in the folder that match the pattern. Additionally, prefix the results with the tag PREFIX
singularity exec -B `pwd` groseq_latest.sif lsf PREFIX --pattern "AS-512178" | bsub
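Before submitting jobs it may be useful to preview which files a pattern would select. This is a sketch under an assumption: it treats the pattern as a plain substring of the file name, whereas the matching rules of the lsf script itself are not documented here. The function name list_fastq is hypothetical.

```shell
# Hypothetical dry run: list *.fastq files whose name contains a substring.
# $1 = directory to search
# $2 = optional substring; when omitted, every *.fastq file is listed
list_fastq() {
  find "$1" -type f -name "*${2:-}*.fastq" | sort
}
```

Running list_fastq . "AS-512178" prints the matching paths one per line without submitting anything to the cluster.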
singularity shell groseq_latest.sif
docker pull sandrejev/groseq:latest
Before running the GRO-seq pipeline you need a genome (FASTA), a bowtie index (.bt2), chromosome sizes (.chrom.sizes) and an annotation. For mm9, mm10 and hg19 these can be downloaded automatically with the download command. The only exception is the annotation *.bed file.
docker run -v ${PWD}:/mount -u $(id -u ${USER}):$(id -g ${USER}) -it --entrypoint download groseq mm10
Run the groseq pipeline. Keep in mind that the annotation file (-a flag) is created automatically for you from geneRef.gtf.
docker run -v ${PWD}:/mount -u $(id -u ${USER}):$(id -g ${USER}) -it groseq -f AS-512172-LR-52456/fastq/AS-512172-LR-52456_R1.fastq -a mm10.refGene.bed -g ./mm10 -o AS-512172-LR-52456 --chromInfo mm10.chrom.sizes
You can create an annotation file with different clipping at the start or end of the transcript using the longest-transcript command.
docker run -v ${PWD}:/mount -u $(id -u ${USER}):$(id -g ${USER}) -it --entrypoint longest-transcript groseq mm10.refGene.gtf.gz mm10.refGene.bed --clip-start=50
You can extract RPKM values using the extract-rpkm command (done automatically as part of the pipeline).
docker run -v ${PWD}:/mount -u $(id -u ${USER}):$(id -g ${USER}) -it --entrypoint extract-rpkm groseq -a mm10.refGene.bed -o AS-512172-LR-52456_R1
For convenience, the GRO-seq image contains a script that can be used to run the pipeline on an LSF cluster.
# Run GRO-seq pipeline on all *.fastq files in the folder
docker run -v ${PWD}:/mount -u $(id -u ${USER}):$(id -g ${USER}) -it --entrypoint lsf groseq | bsub
# Run GRO-seq pipeline on all *.fastq files in the folder that match the pattern. Additionally, prefix the results with the tag PREFIX
docker run -v ${PWD}:/mount -u $(id -u ${USER}):$(id -g ${USER}) -it --entrypoint lsf groseq PREFIX --pattern "AS-512178" | bsub
docker run -v ${PWD}:/mount -u $(id -u ${USER}):$(id -g ${USER}) -it --entrypoint bash groseq
To build the GRO-seq image, the required libraries and files must first be downloaded. This can be done by running the following command:
python download.py dependencies
To build the Docker image, run:
docker build --squash --build-arg http_proxy="http://www.inet.dkfz-heidelberg.de:80" --build-arg https_proxy="http://www.inet.dkfz-heidelberg.de:80" --rm -t sandrejev/groseq:latest .
docker login
docker push sandrejev/groseq:latest
singularity pull docker-daemon:sandrejev/groseq:latest