For most programs here, help is available at the command line. Type:
python script_name.py -h
to see how to use a given script.
This is an ever-growing repository of tools that I have used, or am using, for the various projects I am involved in.
Within here you will find:
This was an early draft to try to identify single-copy common regions within genomes for which primers could be designed, as an alternative metabarcoding region to ITS1. This is not under any further development.
This gets the upstream regions of a given gene set to help identify promoter regions. Used in: "The genome of the yellow potato cyst nematode, Globodera rostochiensis, reveals insights into the basis of parasitism and virulence" (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0985-1).
Scripts to identify top BLAST hits and to filter BLAST hits based on a given phylum (or whatever tax id is given).
A collection of scripts to identify and plot palindromes in a given genome sequence. Please read the Word file in the folder if you want to know more.
Script to rename the FASTA names and hints names for BRAKER gene prediction. For me, the prediction failed unless I did this renaming myself.
A collection of scripts to convert file formats from one type to another.
A pipeline and collection of scripts to estimate the copy number of ITS1 regions (or any other given gene of interest) based on genomic read coverage.
A metabarcoding clustering pipeline written in shell. This is a draft for the upcoming metapy.py pipeline (https://github.com/widdowquinn/THAPBI-pycits/tree/master).
Tool to add taxonomic annotation to DIAMOND BLAST output after the search has been run.
Tool to predict horizontal or lateral gene transfer.
Tool to split a large FASTA file into N smaller FASTA files.
Pipeline to identify and align domains of interest.
Tools for working with Illumina data
Tools and pipelines for transposon analysis in genomes.
Tool to refine the 5 prime start codon after TransDecoder has predicted the CDS from an RNA-seq assembly.
Under development
Pipeline to test gene models and gain information as to how good they are. There is no one method to do this!
Program to produce N random sequences with the average length and average GC content of a given database.
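
To give a feel for what the random sequence program does, here is a minimal sketch in plain Python. It is not the script from this repository: the function names (parse_fasta, average_length_and_gc, random_sequences) are made up for illustration. It also assumes that "average GC" means the GC fraction pooled over all bases in the database, and that every random sequence gets the same (rounded) average length.

```python
import random


def parse_fasta(text):
    """Return the sequences (uppercased) found in FASTA-formatted text."""
    seqs, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current:
                seqs.append("".join(current).upper())
                current = []
        else:
            current.append(line)
    if current:
        seqs.append("".join(current).upper())
    return seqs


def average_length_and_gc(seqs):
    """Return (mean length rounded to an int, pooled GC fraction)."""
    total = sum(len(s) for s in seqs)
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    return round(total / len(seqs)), gc / total


def random_sequences(n, length, gc_fraction, seed=None):
    """Return n random DNA strings of the given length.

    Each base is drawn independently: G and C share gc_fraction,
    A and T share the remainder (an assumption of this sketch).
    """
    rng = random.Random(seed)
    at = (1 - gc_fraction) / 2
    gc = gc_fraction / 2
    weights = [at, gc, gc, at]  # order matches "ACGT"
    return ["".join(rng.choices("ACGT", weights=weights, k=length))
            for _ in range(n)]


# Example with a tiny in-memory "database" (hypothetical data).
database = ">seq1\nGGCC\n>seq2\nATATAT\n"
length, gc_fraction = average_length_and_gc(parse_fasta(database))
for i, seq in enumerate(random_sequences(3, length, gc_fraction, seed=1), 1):
    print(f">random_{i}\n{seq}")
```

Because each base is drawn independently, the GC content of any single random sequence will only approximate the target, getting closer as the sequences get longer.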