This is a KBase module generated by the KBase Software Development Kit (SDK).
You will need to have the SDK installed to use this module. Learn more about the SDK and how to use it.
You can also learn more about the apps implemented in this module from its catalog page or its spec file.
This KBase app wraps the GTDB-Tk command-line tool created by Donovan Parks, Pierre-Alain Chaumeil, and Phil Hugenholtz of the Australian Centre for Ecogenomics.
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes. It is designed to work with recent advances that allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples. It can also be applied to isolate and single-cell genomes. The GTDB-Tk is open source and released under the GNU General Public License (Version 3).
Please cite GTDB-Tk as follows:
- Chaumeil PA, Mussig AJ, Hugenholtz P, Parks DH. 2019. GTDB-Tk: A toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics, btz848.
- Parks DH, et al. 2018. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat. Biotechnol., http://dx.doi.org/10.1038/nbt.4229
We also strongly encourage you to cite the following third-party dependencies:
- Matsen FA, Kodner RB, Armbrust EV. 2010. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics, 11:538.
- Jain C, et al. 2017. High-throughput ANI Analysis of 90K Prokaryotic Genomes Reveals Clear Species Boundaries. bioRxiv, https://doi.org/10.1101/225342.
- Hyatt D, et al. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics, 11:119. doi: 10.1186/1471-2105-11-119.
- Price MN, Dehal PS, Arkin AP. 2010. FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One, 5, e9490.
- Eddy SR. 2011. Accelerated profile HMM searches. PLOS Comp. Biol., 7:e1002195.
GTDB-Tk v1.7.0 requires at least 212 GB of memory and 51 GB of disk space to run, so it is generally not feasible to test it in a CI environment (e.g. Travis CI) or on a developer's personal computer.
As such, two test modes are supported: a mocked mode that tests a subset of the code but requires minimal resources, and a full mode that runs the complete test suite.
Mocked mode is suitable for rapid development, but the full suite should always be run before releasing new code to production.
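Before choosing a mode, a quick check of the local machine against the requirements quoted above can save a failed full run. The sketch below is a hypothetical helper, not part of this module. It assumes Linux (it reads physical memory via `os.sysconf`), treats "GB" as GiB, and measures free disk space in the current directory, which overstates the need if the reference data is already present.

```python
import os
import shutil
from typing import List, Tuple

# Requirements quoted in this README for GTDB-Tk v1.7.0.
REQUIRED_MEMORY_GB = 212
REQUIRED_DISK_GB = 51
GB = 1024 ** 3  # assumption: treat "GB" as GiB for this rough check


def total_memory_bytes() -> int:
    """Physical memory via POSIX sysconf (works on Linux; not on Windows)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def full_mode_feasible(data_dir: str = ".") -> Tuple[bool, List[str]]:
    """Return (ok, problems) for running the full test suite on this machine."""
    problems = []
    mem_gb = total_memory_bytes() / GB
    if mem_gb < REQUIRED_MEMORY_GB:
        problems.append(f"memory: {mem_gb:.0f} GB < {REQUIRED_MEMORY_GB} GB")
    # Free space where the reference data would live (assumed to be data_dir).
    free_gb = shutil.disk_usage(data_dir).free / GB
    if free_gb < REQUIRED_DISK_GB:
        problems.append(f"disk: {free_gb:.0f} GB free < {REQUIRED_DISK_GB} GB")
    return (not problems, problems)


if __name__ == "__main__":
    ok, problems = full_mode_feasible()
    if ok:
        print("Full mode looks feasible on this machine.")
    else:
        print("Consider mocked mode; insufficient " + "; ".join(problems))
```

The helper only reports; it does not start either test mode.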
In mocked mode, the GTDB-Tk application and any calls to KBase services or the SDK callback server are mocked, and only unit tests are run. This mode runs in Travis CI, and the coverage reported by Coveralls is based on this suite of tests.
To use mocked mode, run the following from the module root directory:
pipenv shell
make test-sdkless
If pipenv is not installed, install it first with pip install pipenv.
Running tests in this mode requires no reference data, very little memory, and takes a few seconds. However, it does not test the code in the Impl file.
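The idea behind mocked mode can be illustrated with `unittest.mock`: patch the call that would launch the external tool, then assert on the command that would have been run. In this minimal sketch, `run_gtdbtk` is a hypothetical wrapper (not this module's actual code) that invokes `gtdbtk classify_wf` through `subprocess.run`; the module's real wrapper and its tests may be structured differently.

```python
import subprocess
import unittest
from unittest import mock


def run_gtdbtk(genome_dir: str, out_dir: str, cpus: int = 4) -> int:
    """Hypothetical wrapper: run GTDB-Tk's classify_wf on a directory of genomes."""
    cmd = ["gtdbtk", "classify_wf", "--genome_dir", genome_dir,
           "--out_dir", out_dir, "--cpus", str(cpus)]
    return subprocess.run(cmd, check=True).returncode


class TestRunGtdbtk(unittest.TestCase):
    @mock.patch("subprocess.run")
    def test_builds_expected_command(self, mock_run):
        # The real binary never runs, so no reference data or large memory is needed.
        mock_run.return_value = subprocess.CompletedProcess(args=[], returncode=0)
        self.assertEqual(run_gtdbtk("/tmp/genomes", "/tmp/out", cpus=2), 0)
        # Inspect the command list passed as the first positional argument.
        cmd = mock_run.call_args.args[0]
        self.assertEqual(cmd[:2], ["gtdbtk", "classify_wf"])
        self.assertIn("--genome_dir", cmd)
        self.assertEqual(cmd[cmd.index("--cpus") + 1], "2")
```

Saved under a hypothetical name such as test_mock_sketch.py, it can be run with python -m unittest test_mock_sketch, without the gtdbtk binary installed.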
In full mode, the GTDB-Tk application is actually run, and any KBase services in the specified environment are contacted as in a normal SDK test run. Because this requires a KBase token, this mode cannot be run in Travis CI for GitHub PRs.
To use full mode, first add your KBase developer token to test_local/test.cfg. Then, from the module root directory, run:
make # may be omitted after the first run
kb-sdk test
Running tests in full mode requires 51 GB of reference data and 212 GB of memory, and takes on the order of an hour. It runs all the tests, including an integration test with a KBase Assembly object. More tests should be added in the future. For KBase developers, the dev1 machine in Chicago is a suitable place to run the full test suite, since the reference data is already available in the /kb/data/kb_gtdbtk directory.
TODO:
- Add integration tests for the other processed types.
- Experiment with assembly sizes so that GTDB-Tk recognizes genes; the current test data is too small for this.
- Add copy of a copy integration tests (requires at least one more token).
- Add failing integration test cases.
To use this code in another SDK module, run kb-sdk install kb_gtdbtk in the other module's root directory.
You may find the answers to your questions in our FAQ or Troubleshooting Guide.