kb_gtdbtk

Build status (master): Build Status Coverage Status

This is a KBase module generated by the KBase Software Development Kit (SDK).

You will need to have the SDK installed to use this module. Learn more about the SDK and how to use it.

You can also learn more about the apps implemented in this module from its catalog page or its spec file.

GTDB-Tk

This KBase app wraps the GTDB-Tk command-line tool created by Donovan Parks, Pierre-Alain Chaumeil, and Phil Hugenholtz of the Australian Centre for Ecogenomics.

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes. It is designed to work with recent advances that allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples. It can also be applied to isolate and single-cell genomes. The GTDB-Tk is open source and released under the GNU General Public License (Version 3).

References

We also strongly encourage you to cite the following 3rd party dependencies:

Setup and test

Test modes

GTDB-Tk v1.7.0 needs a minimum of 212 GB of memory and 51 GB of disk to run. Therefore, it is generally not feasible to test in a CI environment (e.g. Travis-CI) or on a developer's personal computer.

Two test modes are therefore supported: one that tests a subset of the code but requires minimal resources, and a full test suite.
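
A pre-flight check along these lines can tell a developer whether a given machine is even a candidate for running the real application. This is a minimal sketch and not part of the module: the function names are hypothetical, the thresholds simply restate the figures above (decimal gigabytes assumed), and the memory lookup relies on `os.sysconf` names that are available on Linux.

```python
import os
import shutil

# Thresholds restated from the README text (decimal gigabytes assumed).
MIN_MEMORY_BYTES = 212 * 10**9
MIN_DISK_BYTES = 51 * 10**9


def meets_requirements(memory_bytes, disk_bytes):
    """Return True if both memory and disk meet the stated minimums."""
    return memory_bytes >= MIN_MEMORY_BYTES and disk_bytes >= MIN_DISK_BYTES


def total_memory_bytes():
    """Total physical memory; uses sysconf names available on Linux."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def free_disk_bytes(path="."):
    """Free space on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


if __name__ == "__main__":
    ok = meets_requirements(total_memory_bytes(), free_disk_bytes())
    print("real GTDB-Tk run looks feasible" if ok else "insufficient resources")
```

The disk check only applies when the reference data still has to be downloaded; on a machine where the data is already in place, the memory figure is the one that matters.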

The former is suitable for rapid development, but the full suite should always be run before releasing new code to production.

Mocked test mode

In this mode, the GTDB-Tk application and any calls to KBase services or the SDK callback server are mocked, and only unit tests are run. This mode is run in Travis-CI, and the coverage reported by Coveralls is based on this suite of tests.

To run mocked mode, from the module root directory:

pipenv shell
make test-sdkless

If pipenv is not installed, run pip install pipenv.

Running tests in this mode requires no reference data and very little memory, and takes a few seconds. However, it does not test the code in the Impl file.
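
To illustrate what mocking the application means in practice, the sketch below replaces `subprocess.run` with a mock so that a test can assert on the command line without GTDB-Tk or its reference data being present. The wrapper `run_classify` is hypothetical and is not this module's actual code; only the `gtdbtk classify_wf` subcommand and its `--genome_dir`, `--out_dir`, and `--cpus` options come from the GTDB-Tk command-line interface.

```python
import subprocess
from unittest import mock


def run_classify(genome_dir, out_dir, cpus=1):
    """Hypothetical wrapper that shells out to the GTDB-Tk classify workflow."""
    cmd = [
        "gtdbtk", "classify_wf",
        "--genome_dir", genome_dir,
        "--out_dir", out_dir,
        "--cpus", str(cpus),
    ]
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(cmd, check=True)
    return cmd


def test_run_classify_builds_expected_command():
    # Patch subprocess.run so that no real process is started.
    with mock.patch("subprocess.run") as fake_run:
        cmd = run_classify("/tmp/genomes", "/tmp/out", cpus=4)
    fake_run.assert_called_once_with(cmd, check=True)
    assert cmd[:2] == ["gtdbtk", "classify_wf"]
    assert cmd[-1] == "4"
```

A test of this shape verifies how the command is assembled, not what GTDB-Tk produces, which is why the full test mode is still needed before a release.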

Full test mode

In this mode, the GTDB-Tk application is run and any KBase services in the specified environment are contacted as in a normal SDK test run. As this requires a token, this test mode cannot be run in Travis-CI for GitHub PRs.

To run full mode, first add your KBase developer token to test_local/test.cfg. Then from the module root directory run:

make  # may be omitted after the first run
kb-sdk test
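
Before starting a run of this length, it can be worth confirming that the token is actually set. The sketch below assumes that test_local/test.cfg is a plain key=value file containing a test_token entry; that layout is an assumption, as this README does not spell it out, and the helper name `read_test_token` is hypothetical.

```python
from pathlib import Path


def read_test_token(cfg_path="test_local/test.cfg"):
    """Return the test_token value from a key=value config file, or None."""
    for raw in Path(cfg_path).read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "test_token":
            # An empty value counts as "not set".
            return value.strip() or None
    return None
```

Calling `read_test_token()` from the module root returns None when the entry is missing or empty, which is a cue to edit the file before invoking kb-sdk test.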

Running tests in this mode requires 51 GB of reference data and 212 GB of memory, and takes on the order of an hour. It runs all the tests, including an integration test with a KBase Assembly object. More tests should be added in the future. For KBase developers, the dev1 machine in Chicago is a suitable place to run the full test suite; the reference data is already available there in the /kb/data/kb_gtdbtk directory.

Testing TODO

  • Add integration tests for the other processed types.
    • Experiment with assembly sizes in order to have GTDB-Tk recognize genes. The current test data is too small for this.
  • Add copy-of-a-copy integration tests (requires at least one more token).
  • Add failing integration test cases.

Installation from another module

To use this code in another SDK module, run kb-sdk install kb_gtdbtk in the other module's root directory.

Help

You may find the answers to your questions in our FAQ or Troubleshooting Guide.