
Implement Checkpointing, improve nested insertion removal, and more!

@oushujun oushujun released this 27 Mar 04:48
· 105 commits to master since this release

Checkpointing is implemented!

Users can now resume interrupted runs from any of several major checkpoints. This is particularly useful when running LTR_retriever on huge genomes (e.g., common wheat), where a job may be interrupted (for example, killed on reaching a walltime limit). Run LTR_retriever -h for further information.
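
The general idea of resuming from checkpoints can be sketched as follows. This is a minimal illustration only, not LTR_retriever's actual implementation: the function name run_with_checkpoints, the JSON state file, and the step names are all hypothetical.

```python
import json
from pathlib import Path


def run_with_checkpoints(steps, state_file):
    """Run named steps in order, skipping those already recorded as done.

    steps: list of (name, callable) pairs; the names are hypothetical stage labels.
    state_file: path to a JSON file that lists the completed step names.
    Returns the names of the steps executed during this call.
    """
    path = Path(state_file)
    done = json.loads(path.read_text()) if path.exists() else []
    executed = []
    for name, func in steps:
        if name in done:
            continue  # finished in an earlier run, so skip it
        func()  # if this raises, earlier progress is already saved on disk
        done.append(name)
        path.write_text(json.dumps(done))  # record progress after each step
        executed.append(name)
    return executed
```

If a run is killed partway through, calling the function again with the same state file re-executes only the steps that had not finished.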

Remove nesting of entire LTR elements in library

Previous versions removed nested insertions of solo LTRs. However, when a full element was nested in a library sequence, the internal region of the nested element was not removed, causing sequence mosaics and library redundancy. This update adds a new module that cleans up composite sequences caused by full-element nesting. It was inspired by a report from Mr. Robert Hubley.

The current version shows a slight decrease in accuracy with a marginal gain in sensitivity. This is likely because removing nested sequences slightly shifts the annotation dynamics of RepeatMasker. Nevertheless, no extra sequence is added in this process; instead, it removes up to 60% of library sequences (e.g., in common wheat) that are redundant due to nested full-element insertions.

Rice (MSUv7)   v1.x    v2.0    v2.5
Sensitivity    95.0%   95.3%   96.3%
Specificity    95.0%   94.6%   94.0%
Accuracy       95.0%   94.8%   94.5%
Precision      85.4%   84.5%   83.1%
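
For readers unfamiliar with the four metrics in the table, the sketch below computes them from confusion-matrix counts. It assumes the standard definitions; these notes do not state how the counts are obtained, and the function name and the numbers in the usage example are hypothetical, not the rice benchmark data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of real positives recovered
        "specificity": tn / (tn + fp),  # share of real negatives left alone
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # share of all calls that are correct
        "precision": tp / (tp + fp),  # share of positive calls that are real
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, fp=10, tn=890, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because accuracy and specificity both depend on the true-negative count, they can stay high while precision is noticeably lower, the same pattern the table shows.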

Other updates

  1. Update the README. MGEScan_LTR is no longer supported because it cannot be run on modern Linux platforms.
  2. Add an easy way (conda) to install dependencies.
  3. Fix a bug that occurred when chromosome names are purely numeric.
  4. Improve the estimation of LTR age. Previous versions included indels in divergence estimation, which led to overestimation of LTR age. This version uses only SNPs, not indels, to compute LTR divergence and age.
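
The effect of item 4 can be illustrated with a small sketch that computes divergence between two aligned LTRs while skipping gap columns, then converts it to an age. Several parts are assumptions not stated in these notes: the Jukes-Cantor correction, the formula T = K / (2 * rate), the default rate of 1.3e-8 substitutions per site per year, and all function names. It is not taken from LTR_retriever's code.

```python
import math


def ltr_divergence(ltr5, ltr3):
    """Proportion of mismatches among aligned, gap-free columns."""
    if len(ltr5) != len(ltr3):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue  # indel column: excluded from the divergence estimate
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return mismatches / compared


def jukes_cantor_distance(d):
    """Jukes-Cantor corrected distance (assumed substitution model)."""
    if not 0 <= d < 0.75:
        raise ValueError("divergence must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * d / 3)


def insertion_age(d, rate=1.3e-8):
    """Age in years, assuming T = K / (2 * rate); the rate is an assumed default."""
    return jukes_cantor_distance(d) / (2 * rate)


# Hypothetical aligned LTR pair: one gap column and one mismatch.
d = ltr_divergence("ACGT-ACGTA", "ACGTTACGAA")
print(f"divergence = {d:.4f}, age = {insertion_age(d):,.0f} years")
```

In this toy alignment, counting the gap column as a difference would give a divergence of 2/10 instead of 1/9, which is how including indels inflates the estimated age.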