Releases: a-ludi/dentist
v4.0.0
[4.0.0] - 2022-09-02
Added
- include GIT commit in logs
- script to generate report on closed and unclosed gaps
- script `mask2bed` that converts Dazzler masks to BED files
- JSON schema for config file
- debugging output to track down issue #31
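
To illustrate what converting mask intervals to BED involves, here is a minimal Python sketch. It is not the `mask2bed` script itself: it assumes the mask has already been decoded into `(sequence_name, begin, end)` tuples with 0-based, half-open coordinates, which is the convention BED uses. The function name `intervals_to_bed` and the sample data are hypothetical.

```python
def intervals_to_bed(intervals):
    """Render (name, begin, end) tuples as sorted, tab-separated BED lines.

    Assumes 0-based, half-open coordinates (begin inclusive, end exclusive),
    so no coordinate shift is needed for BED.
    """
    lines = []
    for name, begin, end in sorted(intervals):
        if not 0 <= begin < end:
            raise ValueError(f"invalid interval {name}:{begin}-{end}")
        lines.append(f"{name}\t{begin}\t{end}")
    return "\n".join(lines)


# Hypothetical example data, not taken from DENTIST.
example = [("scaffold_2", 10, 50), ("scaffold_1", 100, 250), ("scaffold_1", 0, 40)]
bed_text = intervals_to_bed(example)
print(bed_text)
```

Sorting by sequence name and start position keeps the output stable, which makes BED files from different runs easier to compare.
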
Changed
- preserve original scaffold headers in output FASTA
- ensure unique scaffold IDs in `output`
- output FASTA names/coords in AGP
- parallelized alignment filters
- fail-early measure against bug in `libmaus2`
Fixed
- added missing Python installation to Singularity image
- handle long FASTA lines gracefully
- fixed rule `validate_dentist_config`
- fixed install instructions for Snakemake profile
- fixed bug with newer versions of Snakemake
- include Python files in GIT repo
- close open LAS file asap
- fixed JSON conversion of `AlignmentChain`
- workaround for Phobos v2.099.0 bug
- fixed compiler error
- treat compiler warning
[3.0.0] - 2021-12-09
Added
- Conda packages `dentist` and `dentist-core`
- DENTIST's configuration may be in YAML format
- print summary of all commands with `dentist --commands`
- user may select the maximum alignment error rate
- note on a known bug that prohibits using `::` in FASTA headers
- online API documentation
- included the demo example into the main repo
- included JQ in the container for easy inspections
- minimal integration tests that cover the whole pipeline
Changed
- substantially extended code documentation
- improved documentation of `read-coverage` and friends
- improved error message if no pile ups have been found
- using a fixed version for Containers to avoid caching issues
- renamed workflow parameter `max_threads` → `threads_per_process`
- keep assertions in production code
- allow empty LAS files for masking
- improved pre-push hook to reduce accidental errors
Removed
- Docker container; the Singularity image is now built directly
- outdated integration tests
- deprecated and unused code
- obsolete testing command `translocate-gaps`
Fixed
- improved compatibility of pre-compiled binaries by using Conda package
- make alignments with more than 2^^32 local alignments work
- minor compatibility fixes in the container
- broken links in README
- replaced definition list by simple list in README
[2.0.0] - 2021-06-21
Added
- list of all commandline options
- example for a greedy DENTIST configuration
- guide on how to release a new version of DENTIST (work in progress)
Changed
- release v1.0.1 contained breaking changes so this release updates to v2.0.0: the changes to the workflow make it incompatible with old configuration files
- moved Docker image to Ubuntu and reduced size
- improved compatibility of pre-compiled binaries by compiling on Ubuntu 16.04
- sort read IDs in `insertions.db` to make AGP and BED files comparable
- allow `--min-*-coverage` in `dentist mask-repetitive-regions` to be zero
- avoid confusing message about pre-fetching the Singularity image if possible
- updated README
Removed
- unused argument for `process-pile-ups`
- replaced `Dockerfile.build-release` by regular `Dockerfile`
Fixed
- fixed `ProtectedOutputException` bug that was listed in the Troubleshooting section of the README
- sort LAS files for daccord without chaining
- buffer overflow in `propagate-mask`
- adjust `read-coverage` in example configuration to actual coverage in the example dataset
[1.0.2] - 2021-04-26
Added
- provide pre-built binaries of DENTIST and all dependencies in release tarball
- included unit tests in Docker build
- github.io page
Changed
- Improved README a lot
- Updated dependencies
- Removed `LAcheck` from the workflow because it is useless (see issue 14)
Fixed
- Compiler error and deprecation warnings
v1.0.1 - 2021-02-22
Added
- A wonderful logo :-)
Changed
- Updated README and other docs
- Some jobs in the workflow are grouped to reduce the number of cluster jobs
- Workflow requires a minimum Snakemake version
- Ignoring unused parameter in `process-pile-ups`; will be removed in next major release
- Disentangled workflow configuration for better usability and less build time for Snakemake's DAG
Removed
- Old documentation parts/details
Fixed
- Sporadically lost masked regions in mask homogenization
- Handling of cyclic scaffolds
- Overly strict handling of types in DENTIST's config file
- Several minor bugs
v1.0.0 - 2020-02-04
Added
- A Docker container! This means you can just `--use-singularity` with Snakemake.
- Workflow rule to just produce all the repeat masks (this is used in the paper to calculate the repeat content of the assemblies)
- Automatic validation of the closed gaps with an alignment of the reads against a preliminary gap-closed assembly:
- Added command `bed2mask`
- Optionally write a BED file of closed gaps
- Added command `validate-regions`
- Added interface for reading/writing Dazzler track extras which is utilized to communicate the contig and read IDs between `output` and `validate-regions`
- Added command
- Extensively documented the example workflow config `./snakemake/snakemake.yml`
- Local alignment chaining via command `chain-local-alignments` and internally
- Using chaining to filter/improve pile up alignments
- Added possibility to revert CLI options via `--revert`
- All multi-valued CLI options take their value from a comma-separated list and/or by giving the same option multiple times
- Added `full_validation` flag to workflow to keep the preliminary assembly and validation results
- Added `no_purge_output` flag to workflow to prevent the automatic skipping of invalid gaps; this also will not trigger the validation if not requested explicitly
- Possibility to lazily read local alignments from `.las` file
- Greatly improved performance of reading `.las` files by switching to binary interface
- Possibility to manually skip filling of gaps
- `DBdust` for improved sensitivity in alignments
- Homogenized masks implemented via new command `propagate-mask` which translates a given mask via an alignment from one DB/DAM to another. The masks are propagated from the assembly to the reads and back to gain sensitivity.
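
The convention for multi-valued CLI options listed in this release (a comma-separated list, the same option given several times, or both) can be sketched in Python as follows. This only illustrates the convention and is not DENTIST's own option parser; the program name `example` and the helper `multi_values` are hypothetical, and `--mask` is used purely as a sample option.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="example")
    # action="append" collects one list entry per occurrence of the option.
    parser.add_argument("--mask", action="append", default=None)
    return parser


def multi_values(raw_values):
    """Flatten repeated options and comma-separated lists into one list."""
    values = []
    for raw in raw_values or []:
        # Skip empty parts so that "a,,b" or a trailing comma is tolerated.
        values.extend(part for part in raw.split(",") if part)
    return values


args = build_parser().parse_args(["--mask", "a,b", "--mask", "c"])
masks = multi_values(args.mask)
print(masks)
```

Both spellings end up in the same flat list, so downstream code does not need to know how the user supplied the values.
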
v1.0.0-beta.3 - 2020-07-23
Added
- Always skip file locking with environment variable `SKIP_FILE_LOCKING=1`
v1.0.0-beta.2 - 2020-07-23 (public beta)
Added
- Allow use of environment variables in Snakemake workflow config
- Avoid appending to DBs by design
- Improved README:
- Advice on how to choose parameters
- Advice on how to run DENTIST with different read types
- Version information to dependencies
- Log level information to log messages
- More logging on failed gap closing
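
As a rough illustration of what using environment variables in a workflow config can look like, the Python sketch below expands `$VAR` and `${VAR}` references in every string of a nested config. The changelog does not state which syntax the DENTIST workflow accepts, so the syntax, the helper name `expand_env`, the variable `EXAMPLE_WORKDIR`, and the config keys are all assumptions made for this example.

```python
import os


def expand_env(value):
    """Recursively expand $VAR and ${VAR} in all strings of a config value."""
    if isinstance(value, str):
        return os.path.expandvars(value)
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item) for item in value]
    # Numbers, booleans and None are returned unchanged.
    return value


# Hypothetical variable and config keys, for illustration only.
os.environ["EXAMPLE_WORKDIR"] = "/tmp/example"
config = {
    "workdir": "$EXAMPLE_WORKDIR/run1",
    "threads": 4,
    "inputs": ["${EXAMPLE_WORKDIR}/reads.fasta"],
}
expanded = expand_env(config)
print(expanded)
```

Note that `os.path.expandvars` leaves references to undefined variables untouched, so a typo in a variable name shows up as a literal `$NAME` in the expanded config.
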
Changed
- Simplified usage of `--workdir`: no need to manually create the designated directory
- Improvements to close more gaps:
- Custom pre-consensus alignment filtering
- Add support sequence to cropped reads to ensure daligner finds alignments
- Allow cropping in masked region if necessary
- Selectively ignore repeat mask to allow post consensus alignments
- Increased sensitivity in pileup alignments by adding the bridging option of `daligner`
- Select reference read for consensus by intrinsic QVs → better consensus quality
- Moved flag `--max-insertion-error` from `process` to `output` stage so trying different values becomes much faster
- Automatically deduce trace point spacing in all places
- Faster check if `.las` files are empty → faster CLI options checking
- Naming of temporary files for easier inspection
- Use `DBdust` for post consensus alignment
- Produce `.db` for cropped pileups (temporary files) to make `DAScover` and `DASqv` work
- Removed `-I` option from `daligner` calls (avoid useless alignment)
Fixed
- Several bugs in Snakemake workflow
- Significantly improved number of closed gaps
- Coordinates in AGP output
- Bug in procedure that identifies a good cropping position
- Error that caused `--proper-alignment-allowance` to have no effect by default
v1.0.0-beta.1 - 2020-03-17 (public beta)
Added
- post-consensus alignment and validation with new parameter `--max-insertion-error`
- inserted sequences are highlighted by upper-case letters which can be turned off with `--no-highlight-insertions`
- batch ranges may end with a `$` indicating the end of the pileup DB
- some mechanisms for early error detection
- write duplicate contig IDs to contig alignment cache for easier debugging
- added support for complementary contig alignments in `check-results`
- allow `.db` databases as reference
- improved version reporting
- updated README with additional instructions
Changed
- integrated Snakemake workflow into a single file and removed "testing" workflow
- cropping and splicing of insertions:
- existing sequence is completely retained
- moved from `process-pile-ups` to `output`
- binary format of insertions DBs (breaking change) to gain more freedom in later steps
- splice sites are chosen based on the post-consensus alignments
- ambiguities in the alignment of reads are now detected globally
- weakly anchored alignments are discarded early in the filtering pipeline
- the self- and read-alignment-based masks are now computed separately
- coverage values may now be fractional
- improved README by adhering to Standard Readme
- better (error) reporting
- temporary files have more informative names
- many minor refactorings and extensions
Removed
- combined self- and read-alignment-based masking: old behaviour can be copied by using the `--mask` parameter and supplying both masks to all commands
Fixed
- trying all possible reference reads for consensus in order to find a non-failing reference
- corrected insertion splicing in case of reverse-complement alignment of the consensus
- bug that caused `check-results` to discard all alignments in certain loci
- added missing logic for cropped contigs in `getGapState` in `check-results`
v0.0.1 (thesis version)
Added
- work-around for `damapper` bug
- histograms generated by `check-results` include a column for `.999` sequence identity
- `check-results` optionally writes a detailed gap report
Changed
- simplified the coverage bounds interface of `mask-repetitive-regions`: only max-values and/or the read coverage are required
- improve consensus quality by using `lasfilteralignments` to remove deteriorating local alignments
- reduced value of `--min-reads-per-pile-up` to `--min-spanning-reads` (default: 3) to better work with extremely low coverage
- reduced the default minimum anchor length to 500
- fix & simplify score function for read alignments
- suppress generation of two las files for reads alignment in `generate-dazzler-options`
- insertion DBs do not include information about existing contigs anymore which makes validation of results easier
- renamed `debug-graph` → `debug-scaffold` for clarity
- added logging for discarded pile ups
- `check-results` counts complementary alignments (inversions) as errors
- remove leading/trailing gaps from all checks by `check-results`
- improved quality of documentation