sunbeam-labs/sbx_contigs

Sunbeam Contigs extension

This extension selects assembled contigs by their taxonomic annotation, which it parses from the blastn summary. For example, you may want to collect all contigs annotated as a given taxon of interest (e.g. Escherichia coli) and later assess the quality of the resulting metagenome-assembled genomes (MAGs) with CheckM.

Installing

With your Sunbeam conda environment activated, clone the extension into the sunbeam/extensions/ folder, install its requirements, and append the new options to your existing configuration file:

 git clone https://github.com/sunbeam-labs/sbx_contigs.git extensions/sbx_contigs
 conda install --file extensions/sbx_contigs/requirements.txt
 cat extensions/sbx_contigs/config.yml >> sunbeam_config.yml
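
Because the last command appends with `>>`, running it twice would duplicate the extension's options in the configuration file. The following is a minimal sketch of a guard against that, run on throwaway stand-in files. The top-level key `sbx_contigs`, the `taxa_db` option, and the helper `append_once` are assumptions made for illustration; check the extension's actual config.yml for the real key names.

```shell
#!/bin/sh
# Sketch: append an extension's config block only if it is not already present.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Stand-ins for the real files; every key below is hypothetical.
printf 'all:\n  root: /data/project\n' > sunbeam_config.yml
printf 'sbx_contigs:\n  taxa_db: /path/to/the/db\n' > ext_config.yml

append_once() {
    # $1: extension config, $2: main config, $3: top-level key to look for
    if grep -q "^$3:" "$2"; then
        echo "$3 already present in $2, skipping"
    else
        cat "$1" >> "$2"
    fi
}

append_once ext_config.yml sunbeam_config.yml sbx_contigs
append_once ext_config.yml sunbeam_config.yml sbx_contigs  # second call is a no-op

# Count how many times the block ended up in the main config.
grep -c '^sbx_contigs:' sunbeam_config.yml
```

With the real files, the equivalent call would pass extensions/sbx_contigs/config.yml and sunbeam_config.yml, plus whichever top-level key the extension actually uses.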

You also need to follow the instructions from the R package taxonomizr to download and build the database, and specify the path/to/the/db in the config.yml file.

Running

There are a few steps in this extension:

  1. Using taxonomizr, convert the accession number of each contig to a taxonomy ID, and then to a taxonomic name. The contigs can then be subset by the taxon of interest.

  2. Not all samples will contain the taxon of interest, so samples.csv has to be subset manually, using the reports.txt generated by the contigs_by_taxa rule:

     sunbeam run --configfile=sunbeam_config.yml contigs_by_taxa

  3. Check reports.txt to see how the taxon is distributed across samples, then write the matching rows of samples.csv to a new sample list (samples.contigs.csv; the command below names it samples.${taxa}.csv, where ${taxa} is a shell variable you set beforehand):

     grep -Fwf <(cut -f1 sunbeam_output/annotation/taxaName/reports.txt | sort -u) samples.csv > samples.${taxa}.csv

  4. Finally, get all the contigs for the taxon of interest (E. coli in this example):

     sunbeam run --configfile=sunbeam_config.yml _contigs_selected
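
The sample-subsetting step can be tried on toy data before touching real files. The sketch below assumes that reports.txt is tab-separated with the sample name in the first column (as the `cut -f1` in the README command implies) and that samples.csv has the sample name as its first comma-separated field. All file contents and the `ecoli` label are made up. It also swaps `grep -Fwf` for an `awk` lookup on the first field, which only keeps rows whose sample name matches exactly, whereas a word match can also hit similarly named samples or file paths.

```shell
#!/bin/sh
# Toy walk-through of the sample-subsetting step. All file contents are made up.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical reports.txt: tab-separated, sample name in column 1.
printf 'sampleA\tcontig_1\tEscherichia coli\n'  > reports.txt
printf 'sampleA\tcontig_7\tEscherichia coli\n' >> reports.txt
printf 'sampleC\tcontig_3\tEscherichia coli\n' >> reports.txt

# Hypothetical samples.csv: sample name, then read file paths.
printf 'sampleA,/data/sampleA_1.fastq.gz,/data/sampleA_2.fastq.gz\n'  > samples.csv
printf 'sampleB,/data/sampleB_1.fastq.gz,/data/sampleB_2.fastq.gz\n' >> samples.csv
printf 'sampleC,/data/sampleC_1.fastq.gz,/data/sampleC_2.fastq.gz\n' >> samples.csv

taxa=ecoli  # label used only in the output file name

# Unique sample names that have at least one contig for the taxon.
cut -f1 reports.txt | sort -u > hit_samples.txt

# Keep only the rows of samples.csv whose first field is one of those names.
# NR == FNR is true while awk reads the first file (the list of names).
awk -F, 'NR == FNR { keep[$1]; next } $1 in keep' \
    hit_samples.txt samples.csv > "samples.${taxa}.csv"

cat "samples.${taxa}.csv"
```

In this toy run only the sampleA and sampleC rows are kept, since sampleB has no entry in reports.txt.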
