Tutorial
HPG Aligner provides two commands to build index files for the reference genome. Index creation runs on multiple cores, which makes it fast. Depending on the size of the reference genome, index creation may need a lot of memory. The commands to build the index are:
- build-sa-index command: to create the index based on suffix arrays (SA)
- build-bwt-index command: to create the index based on the Burrows-Wheeler Transform (BWT)
HPG Aligner also provides two commands to align reads to the reference genome, depending on the type of input sequences:
- dna command: to map DNA sequences
- rna command: to map RNA-seq reads
The following sections describe the parameters used by each command.
./hpg-aligner build-sa-index
The command line options for the build-sa-index command are:
-g, --ref-genome=<file> Reference genome
-i, --bwt-index=<directory> SA directory name
./hpg-aligner build-bwt-index
The command line options for the build-bwt-index command are:
-g, --ref-genome=<file> Reference genome
-i, --bwt-index=<directory> BWT directory name
-r, --index-ratio=<int> BWT index ratio of compression
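As an illustration, the two index-building invocations above can be assembled from a script. This is a minimal Python sketch that only builds the argument lists. The helper names and the file and directory names (`genome.fa`, `sa-index`) are hypothetical, and only the flags listed above are used.

```python
import subprocess

HPG = "./hpg-aligner"  # binary path as written in this tutorial


def build_sa_index_cmd(ref_genome, index_dir):
    """Argument list for the suffix-array (SA) index build."""
    return [HPG, "build-sa-index", "-g", ref_genome, "-i", index_dir]


def build_bwt_index_cmd(ref_genome, index_dir, index_ratio=None):
    """Argument list for the BWT index build; -r is added only when given."""
    cmd = [HPG, "build-bwt-index", "-g", ref_genome, "-i", index_dir]
    if index_ratio is not None:
        cmd += ["-r", str(index_ratio)]
    return cmd


def run(cmd):
    """Run the command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical input and output names; adjust to your data.
    cmd = build_sa_index_cmd("genome.fa", "sa-index")
    print(" ".join(cmd))
    # run(cmd)  # uncomment to actually build the index
```

Passing the command as a list avoids shell quoting problems when paths contain spaces.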
For alignment of DNA sequences:
./hpg-aligner dna
General options:
-f, --fq, --fastq=<file> Reads file input
-i, --bwt-index=<file> Index directory name
-o, --outdir=<file> Output directory
--prefix=<string> File prefix name
--bam-format BAM output format (otherwise, SAM format)
Pair mode options:
-j, --fq2, --fastq2=<file> Reads file input #2 (for paired mode)
--paired-min-distance=<int> Minimum distance between pairs
--paired-max-distance=<int> Maximum distance between pairs
Report options:
--report-all Report all alignments
--report-n-best=<int> Report the <n> best alignments
--report-n-hits=<int> Report <n> hits
--report-only-paired Report only proper paired alignments
--report-best Report all alignments with the best score
-l, --log-level=<int> Log debug level
-h, --help Help option
Seeding options:
--num-seeds=<int> Number of seeds per read
Smith-Waterman options for the gap alignments:
--sw-match=<double> Match value for Smith-Waterman algorithm
--sw-mismatch=<double> Mismatch value for Smith-Waterman algorithm
--sw-gap-open=<double> Gap open penalty for Smith-Waterman algorithm
--sw-gap-extend=<double> Gap extend penalty for Smith-Waterman algorithm
--sw-min-score=<double> Minimum score for valid mappings
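To see how the Smith-Waterman options above interact, the following Python sketch scores one gapped alignment under a generic affine gap model. The conventions are assumptions, not confirmed behaviour of HPG Aligner: the mismatch value is taken to be negative, gap penalties are positive and subtracted, a gap of length n costs `gap_open + n * gap_extend`, and the minimum score is compared directly against the raw score. The numeric values are illustrative only, not the tool's defaults.

```python
def affine_gap_score(matches, mismatches, gap_lengths,
                     match, mismatch, gap_open, gap_extend):
    """Score an alignment under an affine gap model.

    Assumed convention (not confirmed for HPG Aligner): `mismatch` is a
    negative value added per mismatch, and each gap of length n subtracts
    gap_open + n * gap_extend.
    """
    score = matches * match + mismatches * mismatch
    for n in gap_lengths:
        score -= gap_open + n * gap_extend
    return score


def is_valid_mapping(score, min_score):
    """Keep a mapping only if its score reaches the minimum score."""
    return score >= min_score


if __name__ == "__main__":
    # 45 matches, 3 mismatches, one gap of length 2 (illustrative values)
    s = affine_gap_score(45, 3, [2], match=5.0, mismatch=-4.0,
                         gap_open=10.0, gap_extend=0.5)
    print(s, is_valid_mapping(s, min_score=200.0))
```

Under these assumptions, raising the gap open penalty relative to the gap extend penalty favours a few long gaps over many short ones.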
Architecture options:
--cpu-threads=<int> Number of CPU threads
--read-batch-size=<int> Batch size for reading
Post-processing options:
--indel-realignment Indel-based re-alignment
--recalibration Base quality score recalibration
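The dna options above can be combined as in the following Python sketch, which builds a single-end and a paired-end invocation. The helper name, the file names, and the numeric values are hypothetical. Only flags from the listing above are used, with long options written in the `--option=value` form shown there.

```python
def dna_cmd(fastq, index_dir, outdir, fastq2=None,
            paired_min_distance=None, paired_max_distance=None,
            cpu_threads=None, bam=False):
    """Build an argument list for `./hpg-aligner dna`."""
    cmd = ["./hpg-aligner", "dna", "-f", fastq, "-i", index_dir, "-o", outdir]
    if fastq2 is not None:
        # Paired mode: second reads file plus optional distance bounds
        cmd += ["-j", fastq2]
        if paired_min_distance is not None:
            cmd.append(f"--paired-min-distance={paired_min_distance}")
        if paired_max_distance is not None:
            cmd.append(f"--paired-max-distance={paired_max_distance}")
    if cpu_threads is not None:
        cmd.append(f"--cpu-threads={cpu_threads}")
    if bam:
        # Without this flag the output is written in SAM format
        cmd.append("--bam-format")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names and values; adjust to your data.
    single = dna_cmd("reads.fq", "sa-index", "out")
    paired = dna_cmd("reads_1.fq", "sa-index", "out", fastq2="reads_2.fq",
                     paired_min_distance=100, paired_max_distance=800,
                     cpu_threads=4, bam=True)
    print(" ".join(single))
    print(" ".join(paired))
```

The distance options are only emitted when a second reads file is given, since they have no meaning in single-end mode.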
For alignment of RNA sequences:
./hpg-aligner rna
General options:
-f, --fq, --fastq=<file> Reads file input
-i, --bwt-index=<file> BWT directory name
-o, --outdir=<file> Output directory
-e, --ext=<file> File extend name
--bam-format BAM output format (otherwise, SAM format)
RNA-seq specific options:
--max-intron-size=<int> Maximum intron size
--min-intron-size=<int> Minimum intron size
--min-score=<int> Minimum score for valid mappings
--transcriptome-file=<file> Transcriptome file to help search splice junctions
Pair mode options:
--fq2, --fastq2=<file> Reads file input #2 (for paired mode)
--paired-min-distance=<int> Minimum distance between pairs
--paired-max-distance=<int> Maximum distance between pairs
Report options:
--report-all Report all alignments
--report-n-best=<int> Report the <n> best alignments
--report-n-hits=<int> Report <n> hits
--report-only-paired Report only proper paired alignments
--report-best Report all alignments with the best score
-l, --log-level=<int> Log debug level
-h, --help Help option
Seeding options:
--seed-size=<int> Number of nucleotides in a seed (only for BWT mode)
--min-cal-size=<int> Minimum CAL size
Smith-Waterman options for the gap alignments:
--sw-match=<double> Match value for Smith-Waterman algorithm
--sw-mismatch=<double> Mismatch value for Smith-Waterman algorithm
--sw-gap-open=<double> Gap open penalty for Smith-Waterman algorithm
--sw-gap-extend=<double> Gap extend penalty for Smith-Waterman algorithm
Architecture options:
--cpu-threads=<int> Number of CPU threads
--read-batch-size=<int> Batch size for reading
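An rna invocation can be assembled the same way. The Python sketch below uses only flags from the rna listing above. Note that this listing shows no short `-j` form for the second reads file, so `--fastq2` is used. The helper name, the file names, and the intron sizes are hypothetical, and the check that the minimum intron size does not exceed the maximum is a sanity check added here, not documented behaviour of the tool.

```python
def rna_cmd(fastq, index_dir, outdir, min_intron_size=None,
            max_intron_size=None, transcriptome_file=None, fastq2=None):
    """Build an argument list for `./hpg-aligner rna`."""
    if (min_intron_size is not None and max_intron_size is not None
            and min_intron_size > max_intron_size):
        raise ValueError("min_intron_size must not exceed max_intron_size")
    cmd = ["./hpg-aligner", "rna", "-f", fastq, "-i", index_dir, "-o", outdir]
    if fastq2 is not None:
        cmd.append(f"--fastq2={fastq2}")
    if min_intron_size is not None:
        cmd.append(f"--min-intron-size={min_intron_size}")
    if max_intron_size is not None:
        cmd.append(f"--max-intron-size={max_intron_size}")
    if transcriptome_file is not None:
        cmd.append(f"--transcriptome-file={transcriptome_file}")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names and intron sizes; adjust to your data.
    cmd = rna_cmd("rna_reads.fq", "bwt-index", "out",
                  min_intron_size=40, max_intron_size=500000,
                  transcriptome_file="annotation.gtf")
    print(" ".join(cmd))
```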