- mmh3 (MurmurHash3) hash library
- SimpleFastaParser from Bio.SeqIO.FastaIO (Biopython)
- tqdm for progress bars
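Before running any of the scripts, it can help to confirm that these dependencies import cleanly. The sketch below is not part of this repository. It tries each module and checks for the specific name used (`mmh3.hash`, `SimpleFastaParser`, `tqdm.tqdm`). The `REQUIRED` mapping and `missing_dependencies` function are hypothetical helpers written for illustration.

```python
import importlib

# Hypothetical helper: module path -> attribute the pipeline is expected to use.
# "Bio" is the import name of the Biopython package.
REQUIRED = {
    "mmh3": "hash",
    "Bio.SeqIO.FastaIO": "SimpleFastaParser",
    "tqdm": "tqdm",
}


def missing_dependencies(required=REQUIRED):
    """Return a list of modules (or module.attribute names) that are unavailable."""
    missing = []
    for module_name, attr in required.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # The module itself (or one of its parents) is not installed.
            missing.append(module_name)
            continue
        # The module imports, but check that the specific name is present too.
        if attr is not None and not hasattr(module, attr):
            missing.append(f"{module_name}.{attr}")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All dependencies found.")
```

If anything is reported missing, the packages are typically installed with `pip install mmh3 biopython tqdm`.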
To create the filtered datasets yourself, run Bacterial_pipeline_part1.py and Bacterial_pipeline_part2.py.
To use our pre-filtered datasets, first run makeFolders.py, then run pipeline_wrapper.py, which encapsulates all of the remaining pipeline code.
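Both workflows above amount to running scripts one after another. The following is a minimal sketch of a runner that stops as soon as one script fails. It assumes the scripts sit in the current directory and take no command-line arguments. The makeFolders.py → pipeline_wrapper.py order comes from the instructions above; the part1 → part2 order is only assumed from the file names. `WORKFLOWS` and `run_workflow` are hypothetical names, not part of the repository.

```python
import subprocess
import sys

# Hypothetical mapping of each workflow to its scripts, in execution order.
# part1 -> part2 is assumed from the file names; makeFolders.py -> pipeline_wrapper.py is stated.
WORKFLOWS = {
    "filter": ["Bacterial_pipeline_part1.py", "Bacterial_pipeline_part2.py"],
    "prefiltered": ["makeFolders.py", "pipeline_wrapper.py"],
}


def run_workflow(name, dry_run=False):
    """Run each script of a workflow in order and return the commands used.

    Assumes the scripts are in the current directory and take no arguments.
    With dry_run=True the commands are built but not executed.
    """
    # Use the same interpreter that is running this helper.
    commands = [[sys.executable, script] for script in WORKFLOWS[name]]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so a later step never runs after an earlier one has failed.
            subprocess.run(cmd, check=True)
    return commands
```

For example, `run_workflow("prefiltered")` would run makeFolders.py and then pipeline_wrapper.py, while `run_workflow("filter", dry_run=True)` only shows the commands it would execute.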