Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No joined contigs with greedy configuration #37

Closed
YannDussert opened this issue Nov 25, 2022 · 2 comments
Closed

No joined contigs with greedy configuration #37

YannDussert opened this issue Nov 25, 2022 · 2 comments

Comments

@YannDussert
Copy link

Hi,

First, thanks for developing Dentist!

I have an issue very similar to @oushujun in #22 : basically I am trying to scaffold contigs with another assembly, and I am using the greedy configuration file for that.

After running Dentist, no contigs are joined, and the gap-closed.closed-gaps.bed file is empty. I ran the lost-gaps.py script, which gives the following report:

"In this run of DENTIST 4837 potentially closable gaps were not closed. More details:

Hint: use DBshow -n workdir/[REFERENCE].dam | cat -n to translate contig numbers to FASTA
coordinates.

  • lost 4 in collect phase
    • lost 0 gap(s) because of insufficient number of spanning reads (--min-spanning-reads=1)
    • lost 4 gap(s) because a scaffolding conflict was detected
      • conflicting gap closings: 1890-5809 (1 reads), 1890-28348 (1 reads)
      • conflicting gap closings: 2063-6912 (1 reads), 2063-15318 (1 reads)
      • conflicting gap closings: 5927-21838 (1 reads), 6196-21838 (1 reads)
      • conflicting gap closings: 4971-27615 (1 reads), 23980-27615 (1 reads)
  • lost 4833 in process phase
    • skipped 1389 read pile ups because of errors
      • consensus failed (1274 times)
      • other (115 times)
    • skipped 3444 read pile ups because of --only=spanning
  • lost 0 in output phase
    • skipped 0 insertion(s) because of --max-insertion-error=0.1
    • skipped 0 insertion(s) because of --join-policy=contigs
    • skipped 0 extension(s) because of --min-extension-length=100"

Looking into the process phase logs, I found 1273 errors reading "consensus alignment is invalid" and 230 errors "process DASqv returned with non-zero exit code 1: DASqv: Average coverage is too low (< 4X), cannot infer qv's\n".

Should I change some parameters in the configuration file?

Best regards
Yann

@a-ludi
Copy link
Owner

a-ludi commented Dec 5, 2022

Hi Yann,

Short answer: I am afraid that DENTIST in its current state is not able to properly close gaps with another assembly. It requires a number of changes to the way it creates a gap closing sequence from the "reads" that belong to a gap.


Long answer: (more for myself in case I decide to tackle this issue)

One of these changes is actually a bug associated with "--allow-single-reads". If given, DENTIST should not attempt to call a consensus if there are insufficient number of reads (<4X).

Well, actually there should be no consensus calling at all when using a second assembly because that should be high-quality sequence already.

Also, currently DENTIST does the "consensus alignment" test only once after it found a consensus (or single read) but if contigs (rather than reads) are given, it should try to align each candidate contig until a valid one is found.

There is also the question of how to select "the best" contig from a set of contigs that span than same gap. Maybe the right thing to do would be to keep all contigs that appear to be valid assuming they are alternative alleles of the same locus. This would require adjustments not only in the process stage but also in the output stage which strictly requires non-branching scaffolds.

@a-ludi a-ludi closed this as completed Dec 5, 2022
@YannDussert
Copy link
Author

Hi,
Thanks for the reply! I'll try another tool in that case.

Best regards
Yann

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants