PSST

Polygenic SNP Search Tool Version 2.0

Graphical Overview

Overview:

The Polygenic SNP Search Tool is an open-source pipeline that identifies multiple SNPs associated with diseases, including SNPs that modify the penetrance of other SNPs. This pipeline identifies:

Asserted pathogenic SNPs
SNPs identified by Genome-wide Association Studies (GWAS)

It crosses these SNPs with databases and datasets such as ClinVar, SRA, and GEO, and then constructs a report describing multiple genetic variants associated with diseases.

Dependencies:

Biopython

Usage:

The main script psst.sh takes two inputs: a text file in which each line is a unique SNP rs-accession, and either a second text file of unique SRA accessions or a FASTQ file. The script outputs a TSV file describing which SNPs are contained in the SRA datasets.
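Because the SNP input must contain one unique rs-accession per line, it can help to check the file before starting a long run. The following is a minimal sketch, not part of PSST itself: the function name `read_unique_accessions` and the accession pattern (`rs` followed by digits) are assumptions made for illustration.

```python
import re

# Assumed shape of an rs-accession: "rs" followed by digits, e.g. rs123.
RS_PATTERN = re.compile(r"^rs\d+$")


def read_unique_accessions(lines, pattern=RS_PATTERN):
    """Return accessions in input order; raise on malformed or duplicate lines."""
    seen = set()
    accessions = []
    for number, raw in enumerate(lines, start=1):
        acc = raw.strip()
        if not acc:
            continue  # ignore blank lines
        if not pattern.match(acc):
            raise ValueError(f"line {number}: not an rs-accession: {acc!r}")
        if acc in seen:
            raise ValueError(f"line {number}: duplicate accession: {acc}")
        seen.add(acc)
        accessions.append(acc)
    return accessions
```

The function accepts any iterable of lines, so an open file handle can be passed directly, for example `read_unique_accessions(open("snps.txt"))`, where `snps.txt` is a hypothetical file name.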


Usage: psst.sh [-h description and usage] [-s SRA accessions] [-n SNP accessions]
               [-f FASTQ file] [-d working directory] [-e email for Entrez]
               [-t threads] [-p max number of child processes]
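When psst.sh is driven from another program, the command line can be assembled from the flags listed above. This is a sketch under stated assumptions: the helper `build_psst_command` is hypothetical, the rule that exactly one of `-s` or `-f` is supplied is inferred from the usage description rather than confirmed, and all file names are placeholders.

```python
def build_psst_command(snp_file, sra_file=None, fastq_file=None, workdir=None,
                       email=None, threads=None, max_procs=None,
                       script="./psst.sh"):
    """Build an argument list for psst.sh from the documented flags."""
    # Assumption: the reads come from either an SRA accession list or a FASTQ file.
    if (sra_file is None) == (fastq_file is None):
        raise ValueError("give exactly one of sra_file or fastq_file")
    cmd = [script, "-n", snp_file]
    options = [
        ("-s", sra_file),    # SRA accessions
        ("-f", fastq_file),  # FASTQ file
        ("-d", workdir),     # working directory
        ("-e", email),       # email for Entrez
        ("-t", threads),     # threads
        ("-p", max_procs),   # max number of child processes
    ]
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids shell quoting problems with paths that contain spaces.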

Example: The PSST pipeline is as follows:

Extracts flanking sequences for the SNP accessions and creates a FASTA file containing these flanking sequences.
Creates a BLAST database out of the SNP flanking sequences.
Runs Magic-BLAST on each phenotype-associated SRA dataset and the SNP flanking sequence BLAST database.
From the Magic-BLAST alignments, determines which SNPs are contained in the SRA datasets using a statistical heuristic.
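The first three steps above can be sketched roughly as follows. This is an illustration only, not the code psst.sh runs: the helper names are hypothetical, the flanking sequences are assumed to be available already as a dictionary, and the `makeblastdb` and `magicblast` options shown are a plausible minimal set rather than the options the pipeline actually passes. The statistical heuristic of the last step is not described in this document and is not sketched here.

```python
def flanks_to_fasta(flanks):
    """Render {rs_accession: flanking_sequence} as FASTA text (step 1)."""
    return "".join(f">{acc}\n{seq}\n" for acc, seq in flanks.items())


def makeblastdb_cmd(fasta_path, db_name):
    """Argument list that builds a nucleotide BLAST database (step 2)."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "nucl",
            "-parse_seqids", "-out", db_name]


def magicblast_cmd(sra_accession, db_name, out_path, threads=1):
    """Argument list that aligns one SRA dataset against the database (step 3)."""
    return ["magicblast", "-sra", sra_accession, "-db", db_name,
            "-num_threads", str(threads), "-out", out_path]
```

Each command list can be run with `subprocess.run(cmd, check=True)` once the BLAST+ and Magic-BLAST executables are on the PATH.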

See the file breast-ovarian_cancer.tsv for an example output file.

Disease Clustering:

Disease types in the ClinVar database were grouped into categories, such as assorted metabolic diseases and breast cancer, to examine the relationship between human variations and phenotypes.

Diseases were found manually by exploring the ClinVar dataset.
An online search was performed to cross-check whether each disease found was metabolic or cancer related.
Diseases that did not match were eliminated, while those that matched were moved into another file.

Future Additions

Add a Bayesian inference variant calling rule for small numbers of NGS datasets. Our current heuristic runs fast on a large number of datasets, but for a small number of datasets a Bayesian inference rule would be better, and we wouldn't lose much in terms of running time.
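To make the idea concrete, here is one common form a Bayesian genotype call can take. It is a textbook-style sketch, not the rule planned for PSST: the function name, the uniform prior, the fixed per-read error rate, and the diploid three-genotype model are all assumptions. Reads supporting the reference and alternate alleles are treated as binomial draws, and Bayes' rule gives a posterior over genotypes.

```python
from math import comb


def genotype_posteriors(ref_count, alt_count, error_rate=0.01, priors=None):
    """Posterior over diploid genotypes given ref/alt read counts at one SNP."""
    if priors is None:
        # Assumption: no prior preference among the three genotypes.
        priors = {"ref/ref": 1 / 3, "ref/alt": 1 / 3, "alt/alt": 1 / 3}
    # Probability that a single read shows the alternate allele per genotype.
    alt_prob = {"ref/ref": error_rate, "ref/alt": 0.5, "alt/alt": 1 - error_rate}
    n = ref_count + alt_count
    unnormalized = {}
    for genotype, p in alt_prob.items():
        # Binomial likelihood of the observed counts under this genotype.
        likelihood = comb(n, alt_count) * p ** alt_count * (1 - p) ** ref_count
        unnormalized[genotype] = priors[genotype] * likelihood
    total = sum(unnormalized.values())
    return {genotype: value / total for genotype, value in unnormalized.items()}
```

A SNP could then be reported when the posterior of a genotype carrying the alternate allele exceeds a chosen threshold. The posterior is computed per site and per dataset, which is why the extra cost stays small when only a few datasets are analyzed.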
