Skip to content

Latest commit

 

History

History
14 lines (8 loc) · 723 Bytes

README.md

File metadata and controls

14 lines (8 loc) · 723 Bytes

Predicting zoonotic potential with DNA sequences

For the current best models, select a model from the curr_models section and test it out for yourself with the metrics at the bottom of validate.ipynb.

**REMINDER: data was transformed in the way that GBM and XGBoost was fit in order for it to perform properly.

Also - some data is gitignored for the purposes of conserving storage - you should be able to generate the sequences folder, as well as downloading virome_contigs and virome_reads from: http://www.hli-opendata.com/Virome/

Used ncbi-genome-download for quick downloads from NCBI

ncbi-genome-download --taxids virus_tax_ids.txt viral

for dataset 1: ncbi-genome-download --taxids ids-0.txt viral --parallel 4