Glossary - Data processing and visualization for metagenomics

{:auto_ids} adapters : Short artificial sequences that are attached to both ends of a biological sequence for methodological purposes.

Alpha diversity (α-diversity) : mean species diversity in a site at a local scale

Assembly (Metagenomics) : stitching together of individual DNA reads into more complex and complete objects (contig, scaffold), which could lead to the complete representation of a gene or an entire genome.

Beta diversity (β-diversity) : the extent of change in community composition, or degree of community differentiation, in relation to a complex-gradient of environment, or a pattern of environments

bin : Group of reads, contigs, or scaffolds hypothetically assigned to an individual genome.

binning : The process of grouping DNA sequences according to intrinsic characteristics of the sequence.

contig : A contiguous fragment of DNA sequence from an incomplete draft genome; the result of assembling reads.

Environment (conda) : A directory that contains a specific collection of packages installed by the user.

fasta (format) : A text-based format for representing biological sequences.

fastq : A file storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores
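
As an illustration of how a fastq file pairs each sequence with its quality scores, here is a minimal sketch of a reader. It assumes the common four-line record layout (header, sequence, `+` separator, quality string) with no line wrapping; the function name `parse_fastq` and the example record are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        sequence = next(it).rstrip("\n")
        next(it)  # the '+' separator line carries no data we need
        quality = next(it).rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'


# Hypothetical single-record input, as it would be read from a file.
example = ["@read1\n", "ACGT\n", "+\n", "IIII\n"]
records = list(parse_fastq(example))
print(records)
```

The sequence and quality strings of a record have the same length, one quality character per base.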

for loop : A loop that is executed once for each value in some kind of set, list, or range. See also: while loop.
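
A short Python sketch of a for loop, using a made-up list of reads: the loop body runs once per element of the list.

```python
# Hypothetical reads; the loop body runs once for each one.
reads = ["ACGT", "GGCA", "TTAGC"]

lengths = []
for read in reads:
    lengths.append(len(read))

print(lengths)  # [4, 4, 5]
```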

GC-content : The percentage of nitrogenous bases in a DNA or RNA molecule that are either guanine (G) or cytosine (C).
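
The definition translates directly into a small calculation. The sketch below is one possible implementation; the function name `gc_content` is hypothetical, and it assumes every character in the string counts as a base (ambiguous symbols such as `N` are included in the denominator).

```python
def gc_content(seq: str) -> float:
    """Return the percentage of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0  # avoid dividing by zero on an empty sequence
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


print(gc_content("ATGC"))  # 50.0
```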

gene : A sequence of nucleotides that contains the information to specify a trait.

genome : All genetic information of an organism.

Illumina (sequencing) : A technique used to determine the series of base pairs in DNA.

Lowest common ancestor (LCA) : The lowest (deepest) node in a tree that has all nodes of interest as descendants.

k-mer : A contiguous sequence of characters of length k contained within a biological sequence.
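
To make the LCA idea concrete, here is a minimal sketch that stores a tree as a child-to-parent dictionary and walks from each node up to the root. The function name and the tiny taxonomy are illustrative assumptions (intermediate ranks are omitted), not the method used by any particular classifier.

```python
def lowest_common_ancestor(parent, nodes):
    """Return the deepest node that is an ancestor of (or equal to) every node."""

    def path_to_root(node):
        path = [node]
        while node in parent:
            node = parent[node]
            path.append(node)
        return path

    nodes = list(nodes)
    first_path = path_to_root(nodes[0])
    shared = set(first_path)
    for node in nodes[1:]:
        shared &= set(path_to_root(node))
    # Walking upward from the first node, the first shared node is the LCA.
    for candidate in first_path:
        if candidate in shared:
            return candidate
    return None


# Toy child -> parent map; a real taxonomy has many more ranks.
parent = {
    "Escherichia coli": "Escherichia",
    "Escherichia albertii": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
}

print(lowest_common_ancestor(parent, ["Escherichia coli", "Salmonella enterica"]))
```

In taxonomic assignment, a read that matches several reference taxa equally well can be assigned to their LCA rather than to any single one of them.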
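
A sequence of length n contains n - k + 1 overlapping k-mers. The sketch below enumerates and counts them with a sliding window; the function name `count_kmers` is hypothetical.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # range() is empty when k is longer than the sequence, giving an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


kmers = count_kmers("ATGATG", 3)
print(kmers["ATG"])  # 2
```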

Mapping : The process of establishing the locations of a set of nucleotide sequences, such as reads, within another set of biological sequences.

Metabarcoding : Collection of a specific gene region of a set of organisms.

Metadata : Information concerning how the samples and data were treated.

Metagenome-Assembled Genome (MAG) : A single-taxon assembly based on one or more binned metagenomes that has been asserted to be a close representation of an actual individual genome.

Metagenomics (shotgun metagenomics) : collection of genomic sequences from various (micro)organisms that coexist in any given space.

Next generation sequencing (NGS) : Technology used to determine the order of nucleotides in entire genomes or targeted regions of DNA or RNA, characterized by its massively parallel processing.

Operational Taxonomic Unit (OTU) : A collection of sequences that share a certain percentage of similarity and are thus classified into groups of closely related individuals.

quality control : any process which removes problematic data from a dataset

quality (Phred) scores : Integer values representing the estimated probability of an error, i.e. that the base is incorrect.
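
A Phred score Q relates to the error probability P through Q = -10 log10(P), so Q = 20 corresponds to a 1 in 100 chance that the base is wrong. The sketch below converts in both directions; it assumes the Phred+33 ASCII encoding used by Sanger-style and recent Illumina fastq files, and the function names are hypothetical.

```python
def decode_quality(quality: str, offset: int = 33) -> list:
    """Turn a fastq quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


def phred_to_error_prob(q: float) -> float:
    """Invert Q = -10 * log10(P) to get the error probability P."""
    return 10 ** (-q / 10)


scores = decode_quality("I5!")
print(scores)  # [40, 20, 0]
print([phred_to_error_prob(q) for q in scores])
```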

read(s) : DNA sequence from one fragment (a small section of DNA).

read quality : the assignment of a probability of error to the sequencing of a given read

sequencing (genomics) : the process of determining the nucleic acid sequence – the order of nucleotides in DNA

Species diversity : The number of different species that are represented in a given community.

taxonomic assignment : Method of determining that a specific sequence belongs to a recognized taxon at different levels of the classification of all living organisms (e.g. phylum, genus, and species). This is usually done by comparing the sequence of interest against a set of reference sequences.

thread : A thread is the unit of execution within a process. A process (the execution of a program) can have anywhere from just one thread to many threads.

Oligotrophic (environment) : A space that offers low levels of nutrients.

PCR (polymerase chain reaction) : method used to rapidly make millions of copies of a specific DNA sequence.

rRNA (Ribosomal ribonucleic acid) : a type of non-coding RNA which is the primary component of ribosomes.

scaffold : A portion of the genome sequence reconstructed from sequence fragments. Scaffolds are composed of contigs and gaps.

Sequencing depth (coverage) : The number of unique reads that include a given nucleotide in the reconstructed sequence.
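
As a rough illustration, depth can be computed per position by counting how many aligned reads overlap it. The sketch below assumes each alignment is given as a hypothetical `(start, length)` pair with 0-based start positions on the reconstructed sequence; real alignment formats carry more detail.

```python
def per_base_depth(ref_length: int, alignments) -> list:
    """Count, for every position, how many reads cover it."""
    depth = [0] * ref_length
    for start, length in alignments:
        # Clip each read to the boundaries of the reference.
        for pos in range(max(start, 0), min(start + length, ref_length)):
            depth[pos] += 1
    return depth


# Three hypothetical reads aligned to an 8-base sequence.
depth = per_base_depth(8, [(0, 4), (2, 4), (3, 3)])
print(depth)                    # [1, 1, 2, 3, 2, 2, 0, 0]
print(sum(depth) / len(depth))  # mean depth: 1.375
```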

Species abundance : The number of individuals of each species inside the environment.

Species richness : Number of different species in an environment.
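
Species abundance and species richness can both be read off a table of per-species counts. The sketch below uses a made-up list of observations, one entry per individual, purely for illustration.

```python
from collections import Counter

# Hypothetical observations: one species label per individual detected.
observations = ["E. coli", "B. subtilis", "E. coli", "S. enterica", "E. coli"]

abundance = Counter(observations)  # individuals counted per species
richness = len(abundance)          # number of distinct species

print(abundance["E. coli"])  # 3
print(richness)              # 3
```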

while loop : A loop that keeps executing as long as some condition is true. See also: for loop.
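
A short Python sketch of a while loop, using a made-up read: the body repeats for as long as the sequence still ends in an ambiguous base `N`.

```python
# Hypothetical read with ambiguous bases at its 3' end.
seq = "ACGTTTNNNN"

# Keep trimming while the condition holds; stop as soon as it is false.
while seq.endswith("N"):
    seq = seq[:-1]

print(seq)  # ACGTTT
```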