
Subcommand: random alignment

Lucas Czech edited this page Jan 4, 2022 · 10 revisions

Create a random alignment with a given number of sequences of a given length.

Usage: gappa simulate random-alignment [options]

Options

Input
--sequence-count Required. UINT=0
Number of sequences to create.
--sequence-length Required. UINT=0
Length of the sequences to create.
--characters TEXT=-ACGT
Set of characters to use for the sequences.
Output
--out-dir TEXT=.
Directory to write output files to.
--file-prefix TEXT
File prefix for output files. Most gappa commands use the command name as the base name for file output. This option amends the base name, to distinguish runs with different data.
--file-suffix TEXT
File suffix for output files. Most gappa commands use the command name as the base name for file output. This option amends the base name, to distinguish runs with different data.
--compress FLAG
If set, compress the output files using gzip. Output file extensions are automatically extended by .gz.
--write-fasta FLAG
Write sequences to a fasta file.
--write-strict-phylip FLAG Excludes: --write-relaxed-phylip
Write sequences to a strict phylip file.
--write-relaxed-phylip FLAG Excludes: --write-strict-phylip
Write sequences to a relaxed phylip file.
Global Options
--allow-file-overwriting FLAG
Allow overwriting existing output files instead of aborting the command.
--verbose FLAG
Produce more verbose output.
--threads UINT
Number of threads to use for calculations.
--log-file TEXT
Write all output to a log file, in addition to standard output to the terminal.

Description

The command creates a random alignment with a given number of sequences of a given length. The sequences are named with simple letter combinations, going a, ..., z, aa, ..., az, ba, .... The characters in the alignment sequences are randomly chosen from the provided character set.
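
The naming scheme and the random character choice can be illustrated with a short Python sketch. This is not gappa's implementation. The function names `sequence_name` and `random_alignment` and the `seed` parameter are made up for illustration, and drawing each character uniformly and independently is an assumption, as is the continuation to three-letter names after zz.

```python
import random
import string


def sequence_name(index):
    """Return the name for a 0-based index: a, ..., z, aa, ..., az, ba, ..."""
    letters = string.ascii_lowercase
    name = ""
    n = index + 1  # switch to 1-based counting for bijective base-26
    while n > 0:
        n, rem = divmod(n - 1, 26)
        name = letters[rem] + name
    return name


def random_alignment(sequence_count, sequence_length, characters="-ACGT", seed=None):
    """Build a dict mapping generated names to random sequences.

    Assumption: every character is drawn uniformly and independently
    from `characters`; gappa's actual sampling may differ.
    """
    rng = random.Random(seed)
    return {
        sequence_name(i): "".join(
            rng.choice(characters) for _ in range(sequence_length)
        )
        for i in range(sequence_count)
    }


if __name__ == "__main__":
    for name, seq in random_alignment(3, 12, seed=42).items():
        print(name, seq)
```

Because all sequences have the same length and the default character set includes the gap character -, the result can be treated as an alignment without further processing.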

At least one of the output format flags --write-fasta, --write-strict-phylip, and --write-relaxed-phylip has to be provided. The two phylip flags are mutually exclusive, as both would write to the same file. The output files are named random-alignment.fasta and random-alignment.phylip, respectively, with --file-prefix and --file-suffix applied if provided.
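
As a sketch of these rules, the following Python helper assembles an argument list for the command and checks the flag constraints before anything is run. The helper `build_command` is hypothetical and not part of gappa; only the flags themselves are taken from the option list above.

```python
def build_command(sequence_count, sequence_length, *, out_dir=".",
                  write_fasta=False, write_strict_phylip=False,
                  write_relaxed_phylip=False):
    """Assemble an argument list for `gappa simulate random-alignment`.

    Hypothetical helper: it only mirrors the documented constraints
    and does not call gappa itself.
    """
    if write_strict_phylip and write_relaxed_phylip:
        raise ValueError("strict and relaxed phylip cannot be combined")
    if not (write_fasta or write_strict_phylip or write_relaxed_phylip):
        raise ValueError("at least one output format flag is required")

    cmd = [
        "gappa", "simulate", "random-alignment",
        "--sequence-count", str(sequence_count),
        "--sequence-length", str(sequence_length),
        "--out-dir", out_dir,
    ]
    if write_fasta:
        cmd.append("--write-fasta")
    if write_strict_phylip:
        cmd.append("--write-strict-phylip")
    if write_relaxed_phylip:
        cmd.append("--write-relaxed-phylip")
    return cmd


if __name__ == "__main__":
    # Print the command line; pass the list to subprocess.run to execute it.
    print(" ".join(build_command(10, 100, write_fasta=True,
                                 write_relaxed_phylip=True)))
```

Writing fasta together with one of the phylip formats is allowed, so a single run can produce both random-alignment.fasta and random-alignment.phylip.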

The differences between strict and relaxed phylip are as follows: Strict phylip is the original specification, which uses exactly the first 10 characters of a line to denote the name (padded with spaces if shorter), and requires the whole sequence to be in the rest of the (potentially very long) line. Relaxed phylip allows arbitrarily long names, separated from the actual sequence by at least one whitespace character, and the sequence can be broken into multiple lines.
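
The difference between the two layouts can be made concrete with a minimal Python sketch. It is not gappa's writer: the function `format_phylip` is hypothetical, it always writes each sequence on a single line (which relaxed phylip permits but does not require), and it rejects names longer than 10 characters in strict mode instead of truncating them, which is a design choice of this sketch. The header line with the sequence count and length follows the general phylip convention.

```python
def format_phylip(sequences, strict):
    """Format a dict of name -> sequence as phylip text.

    Illustrative only; assumes all sequences have the same length.
    """
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")

    # Header line: number of sequences, then alignment length.
    lines = ["{} {}".format(len(sequences), lengths.pop())]
    for name, seq in sequences.items():
        if strict:
            if len(name) > 10:
                raise ValueError("strict phylip names are at most 10 characters")
            # The name occupies exactly the first 10 columns, padded with spaces.
            lines.append(name.ljust(10) + seq)
        else:
            if any(ch.isspace() for ch in name):
                raise ValueError("relaxed phylip names must not contain whitespace")
            # Name of any length, one space, then the sequence.
            lines.append(name + " " + seq)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    alignment = {"a": "ACGT-ACG", "b": "AC-TTACG"}
    print(format_phylip(alignment, strict=True))
    print(format_phylip(alignment, strict=False))
```

In strict mode the sequence always starts at column 11, so a parser can slice each line at a fixed position. In relaxed mode the parser has to split at the first run of whitespace instead.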

Citation

When using this method, please do not forget to cite:

Lucas Czech, Pierre Barbera, Alexandros Stamatakis. Genesis and Gappa: Processing, Analyzing and Visualizing Phylogenetic (Placement) Data. Bioinformatics, 2020. doi:10.1093/bioinformatics/btaa070
