-
Notifications
You must be signed in to change notification settings - Fork 136
An example of real assembly
We use a human gut dataset (SRR341725) in this tutorial.
https://github.com/voutcn/megahit/releases/download/v1.2.1-beta/MEGAHIT-1.2.1-beta-Linux-static.tar.gz
tar zvxf MEGAHIT-1.2.1-beta-Linux-static
cd MEGAHIT-1.2.1-beta-Linux-static/bin/
./megahit --test # run on a toy dataset
cd ../..
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR341/SRR341725/SRR341725_[12].fastq.gz
MEGAHIT-1.2.1-beta-Linux-static/bin/megahit -1 SRR341725_1.fastq.gz -2 SRR341725_2.fastq.gz -o SRR341725.megahit_asm
When the assembly finishes, you will see a screen message like:
--- [Sat Aug 15 15:01:37 2015] Merging to output final contigs ---
--- [STAT] 70776 contigs, total 117999745 bp, min 200 bp, max 217183 bp, avg 1667 bp, N50 3897 bp
--- [Sat Aug 15 15:01:38 2015] ALL DONE. Time elapsed: 1386.712479 seconds ---
Assembled contigs are located in SRR341725.megahit_asm/final.contigs.fa.
Tips:
- Single-end reads could be specified by
-r
- Interleaved PE reads could be specified by
--12
-
-r
,--12
,-1
and-2
could be specified for multiple times to assemble multiple libraries; be careful that files in-1
and-2
should be paired and in order
Here we only demonstrate how to calculate the read coverage per contig and extract unassembled reads using BBMap and samtools. You may pass the contigs your own subsequent analyses.
4.1 Download & install BBMap and samtools:
# BBmap
wget http://downloads.sourceforge.net/project/bbmap/BBMap_35.34.tar.gz
tar zvxf BBMap_35.34.tar.gz
# samtools
wget https://github.com/samtools/samtools/releases/download/1.2/samtools-1.2.tar.bz2
tar xvjf samtools-1.2.tar.bz2
cd samtools-1.2 && make -j && cd ..
4.2 Align reads with bbwrap.sh
:
bbmap/bbwrap.sh ref=SRR341725.megahit_asm/final.contigs.fa in=SRR341725_1.fastq.gz in2=SRR341725_2.fastq.gz out=aln.sam.gz kfilter=22 subfilter=15 maxindel=80
Tips:
-
bbwrap
also accepts multiple read libraries. For example two PE libraries and one SE library can be aligned within=pe1_1.fq,pe2_1.fq,se.fq in2=pe1_2.fq,pe2_2.fq
-
kfilter
is the minimum length of consecutive exact matches for an alignment, which is set to k_min + 1 -
maxindel
limits the indel size -
subfilter
limits the number of mismatches for an alignment
4.3 Output per contig coverage to cov.txt
with pileup.sh
:
bbmap/pileup.sh in=aln.sam.gz out=cov.txt
4.4 Extract unmapped reads (SE to unmapped.se.fq
and PE to unmapped.pe.fq
):
samtools-1.2/samtools view -u -f4 aln.sam.gz | samtools-1.2/samtools bam2fq -s unmapped.se.fq - > unmapped.pe.fq