Releases: nanoporetech/tombo

Minor Optimizations and Bug Fixes

20 Feb 00:03

Various bug fixes and optimizations.

Motif-specific models and Level Testing

12 Oct 14:54

This release includes two major feature additions:

  • Motif-specific models introduced in this version provide more accurate alternative base models for improved methylation detection in E. coli and human sequence contexts. This release also supports user-friendly motif-specific model training, extending highly accurate identification to nearly all bacterial DNA modifications (tutorial). Motif-specific models improve E. coli methylation detection to an AUC of 0.9985 for 5mC and 0.9811 for 6mA (based on PCR sample comparison; up from 0.92 for 5mC and 0.79 for 6mA in the previous Tombo version; mean average precision 0.9986 for 5mC and 0.9808 for 6mA).
  • Level sample comparison modified base detection (documentation and tutorial) detects modification differences between two samples of interest (i.e., neither sample needs to contain canonical bases only, but both must come from the same reference genome/transcriptome). An example would be a gene knockout experiment where some native modifications remain. This method requires higher coverage (suggested minimum depth of 50X), but may provide more accurate results for certain applications, particularly direct RNA.
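For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how an ROC AUC and an average precision can be computed from per-site scores and ground-truth labels. It is not Tombo's evaluation code; the function names and the toy scores are hypothetical, and it assumes that a higher score means "more likely modified".

```python
def roc_auc(scores, labels):
    """Rank-based ROC AUC: chance that a modified site outscores a canonical one.

    A tie between a modified and a canonical site counts as half a win.
    """
    pos = [s for s, is_mod in zip(scores, labels) if is_mod]
    neg = [s for s, is_mod in zip(scores, labels) if not is_mod]
    if not pos or not neg:
        raise ValueError("need at least one modified and one canonical site")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values observed at each true modified site."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, is_mod) in enumerate(ranked, start=1):
        if is_mod:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("need at least one modified site")
    return sum(precisions) / len(precisions)


# Hypothetical toy data: four sites, two of them truly modified.
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [True, False, True, False]
print(roc_auc(toy_scores, toy_labels))
print(average_precision(toy_scores, toy_labels))
```

The pairwise loop is quadratic in the number of sites, which is fine for illustration; a sort-based implementation would be preferable for genome-scale evaluations.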

In addition, this release includes:

  • Added a lower minimum observations per base threshold for the re-squiggle step that resolves skipped reference bases after dynamic programming. This results in improved signal assignment.
  • Bug fix to per-read statistics output
  • Various minor optimizations in re-squiggle command
  • Slightly improved re-squiggle performance

RNA, API and sample comparison prior

01 Aug 17:59

Version 1.4 release

Major technical changes

  • Improvements to RNA re-squiggle
    • RNA stall detection and masking/collapsing
      • Only affects events during dynamic programming (raw signal is never removed or ignored)
    • Improved signal normalization
      • Event-based
      • Adapter trimming
    • Updated RNA parameter tuning
      • Better detection of reads that leave a reasonable signal-to-sequence matching path
  • Added canonical model prior for sample comparison modified base detection
    • This has improved modified base detection from AUCs of 0.91 and 0.84 (for 5mC and 6mA, respectively) to 0.99 and 0.94 on a relatively low coverage bacterial sample (~20X).

Official Tombo python API release (documentation here)

  • API includes 3 modules: tombo_helper, tombo_stats and resquiggle
  • Provides access to base mean and raw signal data from individual reads or across a genomic range
  • Provides access to per-read modified base detection statistics
  • Key objects include: readData, intervalData, TomboReads, TomboStats and PerReadStats

This release also includes various bug fixes, including a change to the statistics file that allows processing of arbitrarily large genomes.

Parameter tuning, specifically the read start identification window size and adaptive bandwidth, should result in improved computational performance for the majority of runs (low quality data or poor reference quality may cause longer run times).

Signal normalization and outlier-robust modified base detection

23 May 21:15

Version 1.3 release

Major technical changes (these changes drastically improve modified base detection performance):

  • Improved signal normalization
    • Sequence-dependent robust signal normalization using Theil-Sen estimator
    • Iterative re-scaling
  • Outlier-robust alternative model modified base detection
    • Scaled log likelihood ratio to down-weight outlier raw signal assignment

Other technical updates:

  • Added RNA m5C model
  • Updated canonical RNA model to 180mV settings
    • 200mV still included in repository, but not accessible via command line
  • Added two-way thresholding to all modification detection methods for fraction modified estimation
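The two-way thresholding idea can be sketched as follows. This is an illustrative sketch only: the function name, the threshold values and the sign convention (lower statistic means "more likely modified") are assumptions, not Tombo's documented behavior.

```python
def fraction_modified(read_stats, lower, upper):
    """Estimate the modified fraction at one site from per-read statistics.

    Assumed convention for this sketch: a statistic below `lower` is a
    confident modified call, a statistic above `upper` is a confident
    canonical call, and anything in between is excluded as ambiguous.
    Returns (fraction, n_confident_reads); fraction is None with no calls.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    n_mod = sum(1 for s in read_stats if s < lower)
    n_can = sum(1 for s in read_stats if s > upper)
    n_called = n_mod + n_can
    if n_called == 0:
        return None, 0
    return n_mod / n_called, n_called


# Hypothetical per-read statistics at a single reference position.
stats = [-3.0, -2.5, -0.2, 0.1, 1.8, 2.2]
print(fraction_modified(stats, lower=-1.5, upper=1.5))
```

Excluding the ambiguous middle band trades some coverage for a cleaner estimate: only reads with a confident call in either direction contribute to the fraction.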

Feature updates:

  • Increased read filtering capabilities
    • Added genome_locations, raw_signal_matching (refactored from resquiggle --signal-align-parameters) and q_score filters.
  • Re-factored Tombo commands into command groups
    • So plot_max_coverage is now plot max_coverage
    • test_significance was explicitly split into 3 commands under detect_modifications to avoid confusion over method used (e.g. detect_modifications de_novo)
  • Progress bars, including estimated time to completion, for time-consuming steps.

Many other computational optimizations to allow for easier processing of large data sets. Several bug fixes.

Minor python2 fixes

20 Mar 19:45

Some minor python2 bug fixes and minor error-handling improvements.

Per-read statistics and general optimization

13 Mar 00:49

This release includes new features for investigation of per-read, per-base modified base detection. Study of per-read statistic distributions has improved modified base detection in validation data sets by guiding the choice of better default per-read statistic thresholds. This version also extends the use of the dampened fraction of modified bases to better handle samples with variable coverage.
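The idea behind a dampened fraction can be sketched with pseudo-counts. This is a minimal illustration, not Tombo's code: the function name and the default pseudo-count values are assumptions chosen for the example.

```python
def dampened_fraction(n_modified, n_canonical,
                      pseudo_canonical=2, pseudo_modified=0):
    """Fraction of modified reads, shrunk toward a prior by pseudo-counts.

    The pseudo-counts act as extra "virtual" reads, so a site covered by
    only a few reads cannot report an extreme fraction on thin evidence.
    The default values here are illustrative assumptions.
    """
    total = n_modified + n_canonical + pseudo_canonical + pseudo_modified
    if total == 0:
        return 0.0
    return (n_modified + pseudo_modified) / total


# A single modified read gives a raw fraction of 1.0 but a dampened 1/3,
# while 50 modified reads stay close to 1.0 (50/52).
print(dampened_fraction(1, 0))
print(dampened_fraction(50, 0))
```

The effect fades as coverage grows, so well-covered sites are nearly unchanged while low-coverage sites are pulled toward the unmodified prior.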

The release also includes some fixes for issues in the last version. The major user issues addressed are:

  • More efficient processing of large genomes, which previously resulted in very large memory usage
    • This addresses both computational and memory usage issues in the re-squiggle and test_significance commands.
  • Addressing issues specific to RNA processing: truncation of long transcript names and samples mapping to different sets of sequence records/transcripts
  • Better protection against read file corruption resulting from access by multiple, independent, concurrent Tombo commands

Improved re-squiggle and added 6mA model

13 Feb 00:40

Major Tombo update includes:

  • Added N6-methyladenosine alternative model (in all sequence contexts)
  • Vastly improved re-squiggle results
    • Event segmentation and event-to-signal assignment parameter tuning
    • Specific parameters for DNA and RNA
  • Per-read statistics output
    • Including access via Tombo python API and random access to reads from genomic locations
  • Cleaner user experience
    • Converted to using mappy (minimap2 python API) in place of command line executable options
    • python 3 compatibility (addresses rpy2 conda install issues, numpy/cython memory leak and python2's slow painful death)
    • Simplified command line options (including some hidden advanced options for user generated models)
    • Unified plotting options
    • Better error and warning messages
  • General computational optimization
  • Added dampened fraction wiggle output (for variable coverage regions)
  • ROC plot command for testing results at a known motif
  • pyfaidx indexed genome FASTA access (for processing large genomes on limited resources)
  • Several important and minor bug fixes

Pre-processing and minor fixes

21 Dec 22:42

The major update in this release is the addition of the annotate_raw_with_fastqs command. This command adds the basecalled sequence information from a set of FASTQs to the corresponding set of FAST5 raw read data files. This allows users to avoid the creation and storage of large basecalled FAST5 files containing Events information.
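Conceptually, annotating raw reads from FASTQs requires looking up each read's basecalled sequence by its read ID. The sketch below shows only that lookup step on an in-memory FASTQ. It is not Tombo's implementation: the function name is hypothetical, and it assumes four-line FASTQ records in which the read ID is the first whitespace-delimited token of the header line.

```python
import io


def index_fastq(handle):
    """Build {read_id: (sequence, quality)} from four-line FASTQ records."""
    records = {}
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch in: " + header)
        fields = header[1:].split()
        if not fields:
            raise ValueError("FASTQ header without a read ID")
        records[fields[0]] = (seq, qual)
    return records


# Hypothetical two-read FASTQ; the read IDs are made up for illustration.
example = io.StringIO(
    "@read-0001 runid=abc\nACGT\n+\n!!!!\n"
    "@read-0002 runid=abc\nGGCCA\n+\n#####\n"
)
by_id = index_fastq(example)
print(by_id["read-0002"][0])
```

With such an index, each raw read file can be matched to its sequence by read ID, which is what lets the basecalls live in compact FASTQs rather than in large basecalled FAST5 files.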

The documentation has also been updated, including more complete and concise API documentation.

Additional minor bug fixes:

  • Fixed a minor bug in the SAM parser for bwa mem mappings (thanks to @rasto2211 for the bug report)
  • Re-factored Tombo event access (thanks to @JohnUrban for the bug report)
  • Fixed statistics only wiggle output (thanks to @JohnUrban for the bug report)
  • Fixed bug in genome sequence from reads extraction code (thanks to @JohnUrban for the bug report)

Modified Base Estimation Upgrade

15 Dec 01:08

The primary update in this version of Tombo is a major shift in the alternative base model estimation method. This method still uses the same practical sample type, but produces a much better model. On an E. coli test dataset, AUC improved from 0.77 to 0.89. This model is now included as the default 5-methylcytosine model in Tombo.

Additional changes in this version:

  • Full package documentation
  • Better signal normalization
  • Filter reads to even read depth
  • Minor bug fixes

release-1.0

04 Dec 18:21

Initial public release of Tombo