LeoGori/HMM4GA

Hidden Markov Models for Genome Analysis

This project provides a basic implementation of the pair hidden Markov model (pair HMM) forward algorithm for genomic sequence analysis, as described in [1], and parallelizes the computation with OpenMP.

Further details are provided in articles [2] and [3] and in the book Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (page 88, §4.2).

Code Description

The project consists of the following main files:

  • Sequence.h: class that represents a sequence of nucleotides. It holds the string of characters that make up the sequence and its SequenceGenerator.
  • SequenceGenerator.h: class that defines a random emission probability distribution over the nucleotides of a sequence. Currently, an instance of the class Sequence is randomly generated according to its SequenceGenerator.
  • ProbabilityMatrix.h: class that represents a generic matrix of floating-point values, from which the classes DynamicMatrix and StateTransitionMatrix inherit common attributes and methods. DynamicMatrix supports adding rows and columns dynamically, while StateTransitionMatrix provides a set of states and a mapping between them and the indexes of the matrix.
  • PairHMM.h: class that implements the pair HMM forward algorithm. It holds 2 instances of the class Sequence (one for the read sequence and one for the haplotype sequence), 1 instance of the class StateTransitionMatrix (the matrix T), and 3 instances of the class DynamicMatrix (the matrices M, I and D).
  • main.cpp: the entry point of the program. It creates an instance of the class PairHMM and calls its method that runs the pair HMM forward algorithm.
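
The idea behind generating a sequence from an emission distribution can be sketched as follows. The function randomNucleotides, its parameters and the fixed A, C, G, T alphabet are hypothetical and only illustrate the approach with the standard random header; they are not the interface of Sequence or SequenceGenerator.

```cpp
#include <array>
#include <cstddef>
#include <random>
#include <string>

// Hypothetical helper (not the repository's SequenceGenerator): draws `length`
// nucleotides, picking A, C, G or T with probability proportional to `weights`.
std::string randomNucleotides(std::size_t length,
                              const std::array<double, 4>& weights,
                              unsigned int seed) {
    static const char alphabet[4] = {'A', 'C', 'G', 'T'};
    std::mt19937 engine(seed);
    std::discrete_distribution<int> pick(weights.begin(), weights.end());

    std::string sequence;
    sequence.reserve(length);
    for (std::size_t k = 0; k < length; ++k) {
        sequence.push_back(alphabet[pick(engine)]);
    }
    return sequence;
}
```

A caller that wants a different sequence on every run could pass std::random_device{}() as the seed, while a fixed seed gives reproducible sequences for testing.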

Language and APIs

The code is written entirely in C++ and uses the following libraries and APIs (besides the basic standard ones):

  • random: used for the random generation of sequences and the random definition of state transition probabilities
  • algorithm: used for shuffling sequences for randomization purposes
  • OpenMP: used in PairHMM.cpp to introduce thread-level parallelism in the algorithm
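
One common way to parallelize this kind of dynamic programming with OpenMP is to sweep the matrix by anti-diagonals, since a cell (i, j) only depends on (i-1, j-1), (i-1, j) and (i, j-1). The sketch below illustrates that pattern under the assumption that it is applicable here; it is not necessarily how PairHMM.cpp uses OpenMP. The names forEachAntiDiagonal and countGridPaths are hypothetical, and the path-counting demo merely stands in for the real cell update because it has the same dependency pattern.

```cpp
#include <algorithm>
#include <vector>

// Visits every cell (i, j) with 1 <= i <= n and 1 <= j <= m, one anti-diagonal
// (i + j == d) at a time. Cells on the same anti-diagonal do not depend on each
// other, so the inner loop can be shared among OpenMP threads. Without OpenMP
// support the pragma is ignored and the loop runs serially.
template <class CellFn>
void forEachAntiDiagonal(int n, int m, CellFn cell) {
    for (int d = 2; d <= n + m; ++d) {
        const int lo = std::max(1, d - m);
        const int hi = std::min(n, d - 1);
        #pragma omp parallel for
        for (int i = lo; i <= hi; ++i) {
            cell(i, d - i);
        }
    }
}

// Demo with the same dependencies as the forward recursion: counts the paths
// from (0, 0) to (n, m) that use diagonal, vertical and horizontal steps.
long long countGridPaths(int n, int m) {
    // Border cells keep the initial value 1; interior cells are overwritten.
    std::vector<std::vector<long long>> c(n + 1, std::vector<long long>(m + 1, 1));
    forEachAntiDiagonal(n, m, [&c](int i, int j) {
        c[i][j] = c[i - 1][j - 1] + c[i - 1][j] + c[i][j - 1];
    });
    return c[n][m];
}
```

Each thread writes a distinct cell and only reads cells from earlier anti-diagonals, and the implicit barrier at the end of each parallel loop keeps the diagonals in order. With GCC or MinGW-w64 the pragma takes effect when compiling with -fopenmp.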

How to run the code (Windows)

  1. Install MinGW-w64 version > 9.2 (otherwise the randomly generated sequence will be the same at every execution, as reported here and here)
  2. Install CMake
  3. Create a folder for building the project
  mkdir build
  cd build
  4. Generate the makefiles
  cmake -G "MinGW Makefiles" ..
  5. Build the project
  cmake --build .
  6. Run the program
  ./HMM4GA.exe
