Segmentation fault on compacted DBG #52
Comments
Definitely looks like a bug. Are you able to provide the input data so I can reproduce it?
bifrost_10_graph.gfa.gz
I've implemented a solution in the master branch, so if you rebuild you should be able to run without crashing. The solution is a bit of a hack, though. A better way to avoid this problem is probably to rebuild the DBG with an odd-length overlap. This is a common technique that prevents overlaps from being DNA palindromes (because the middle base always changes during reverse complementation). These palindromic overlaps were the fundamental cause of the bug.
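To illustrate the palindrome argument above, here is a minimal sketch (not code from GetBlunted; the function names are made up for this example) that enumerates every DNA string of a given length and counts how many equal their own reverse complement. It only assumes the alphabet A, C, G, T.

```python
from itertools import product

# Complement table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_dna_palindrome(seq: str) -> bool:
    """A DNA palindrome is a sequence equal to its own reverse complement."""
    return seq == reverse_complement(seq)


def count_palindromes(length: int) -> int:
    """Brute-force count of DNA palindromes among all strings of this length."""
    return sum(
        is_dna_palindrome("".join(bases))
        for bases in product("ACGT", repeat=length)
    )


if __name__ == "__main__":
    # Odd lengths should report zero: the middle base would have to be
    # its own complement, which no base is.
    for length in range(1, 7):
        print(length, count_palindromes(length))
```

For an odd length the middle position maps onto itself under reverse complementation, so the count is zero; for an even length, the first half determines the second half, so palindromes do exist.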
Thank you very much for your help. I spent some time resolving an error during the installation of GetBlunted, and I'm sorry to say this issue is not solved yet. @jeizenga
First, the compacted DBG (cDBG) built with 10 genomes now works, but the cDBG built with 100 genomes still reports this error, and I will probably run cDBGs built with thousands of bacterial genomes. The cDBG with 100 genomes (~130M) is here: https://drive.google.com/file/d/1Vg2Sq0Zja-xdrhscZCOYRxqRV5_-DiuR/view?usp=sharing. You can give it a try.
Second, I'm very confused about building a DBG with odd-length overlaps. Normally, a DBG is built with an odd k-mer length to avoid palindromes, so the overlap, which is the k-mer length minus 1, has an even length. I mean, in general, palindromes appear when the k-mer length is even. For a cDBG, compaction gives the nodes (unitigs) arbitrary lengths, so palindromic situations are inevitable. In this case, using an even-length k-mer (odd-length overlap) would be very complicated and could become a big problem.
Third, assuming you are right (although I do not agree), I built a cDBG with k-mer length 30 (that is, overlap length 29) using 100 genomes. It still reports an error, but not the same one as before.
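One way to check which links in a graph actually have palindromic overlaps is to scan the GFA directly. The sketch below is an illustration only, not part of GetBlunted or Bifrost; it assumes GFA1 `S` and `L` lines, overlaps written as a single match CIGAR such as `30M`, and sequences containing only A, C, G, T. The function name `palindromic_overlaps` is made up for this example.

```python
import re

# Complement table; assumes sequences contain only A, C, G, T (any case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def palindromic_overlaps(gfa_lines):
    """Yield (from, from_orient, to, to_orient, overlap) for every link
    whose overlap sequence equals its own reverse complement."""
    segments = {}
    links = []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            segments[fields[1]] = fields[2]
        elif fields[0] == "L" and len(fields) >= 6:
            links.append(tuple(fields[1:6]))

    for src, src_orient, dst, dst_orient, cigar in links:
        match = re.fullmatch(r"(\d+)M", cigar)
        if match is None or src not in segments:
            continue  # skip '*' or complex CIGARs and dangling links
        length = int(match.group(1))
        seq = segments[src]
        if src_orient == "-":
            seq = reverse_complement(seq)
        if length == 0 or length > len(seq):
            continue
        # The overlap is the suffix of the oriented source segment.
        overlap = seq[-length:].upper()
        if overlap == reverse_complement(overlap):
            yield (src, src_orient, dst, dst_orient, overlap)


if __name__ == "__main__":
    example = [
        "S\t1\tAAACGT",
        "S\t2\tACGTTT",
        "S\t3\tGGACG",
        "L\t1\t+\t2\t+\t4M",
        "L\t3\t+\t2\t+\t3M",
    ]
    for hit in palindromic_overlaps(example):
        print(*hit, sep="\t")
```

For a gzipped graph such as the one attached above, the lines could be read with `gzip.open(path, "rt")` and passed to the same function.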
Hi Wenhai, we are still looking into this. However, I am wondering what your intended usage of the bluntified graph is. GetBlunted works by duplicating sequence to ensure that no new adjacencies are produced. In a de Bruijn graph with 1000 samples, you are going to have a large amount of variation/error and densely overlapping nodes. Bluntification is still going to result in a very fragmented and sequence-redundant graph. Here is how your graph looks when attempting to do a 2D layout in Bandage: Methods that are intended for conventional sequence graphs are not exactly going to thrive in this environment. The bug you are seeing in the latest output is a result of attempting to stitch overlapping overlaps. However, I cannot reproduce your result because you haven't shared the k30 graph. If you share it, I can take a look at that error. My guess is that there are chains of overlapping overlaps, which are a bit outside of the scope I tested for.
Oh, thank you very much for sharing the layout in Bandage. Perhaps due to server configuration issues, I can't produce the layout in Bandage myself.
I'm going to map short reads to the bluntified graph with
Let me correct one thing: my graph is built with 100 genomes.
Do you mean that a compacted de Bruijn graph is not suitable for bluntification?
I'm more than happy to share it: https://drive.google.com/file/d/1Akb9te6dDzSWIb43LGrPareJisGuXEik/view?usp=sharing. I have read your paper, and I tried running the gimbricate/seqwish pipeline and stark. stark also reports a segmentation fault. The gimbricate/seqwish pipeline works, but it adds many P lines to the GFA file that are not actual genome paths, and these paths are probably not what I want.
We're working on resolving this bug. However, I should caution you that For a similar project, we found that we got better results converting the GFA nodes into a FASTA and mapping with
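The GFA-to-FASTA conversion mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: it treats each GFA1 `S` line as one FASTA record named after the segment ID, skips segments whose sequence is `*`, and ignores links and paths entirely. The function name and the 80-column wrapping are choices made for this example, not anything prescribed by the thread.

```python
import io


def gfa_segments_to_fasta(gfa_lines, out, width=80):
    """Write every GFA segment (S line) to `out` as a FASTA record.

    Returns the number of records written. Links (L) and paths (P)
    are ignored, so the overlap structure of the graph is not preserved.
    """
    written = 0
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "S" or len(fields) < 3 or fields[2] == "*":
            continue
        name, seq = fields[1], fields[2]
        out.write(f">{name}\n")
        # Wrap the sequence to a fixed line width.
        for start in range(0, len(seq), width):
            out.write(seq[start:start + width] + "\n")
        written += 1
    return written


if __name__ == "__main__":
    example = [
        "H\tVN:Z:1.0",
        "S\t1\tACGTACGT",
        "S\t2\tGGCC",
        "L\t1\t+\t2\t+\t3M",
    ]
    buffer = io.StringIO()
    gfa_segments_to_fasta(example, buffer)
    print(buffer.getvalue(), end="")
```

The resulting FASTA can then be indexed by whichever read mapper is chosen; because the nodes of an overlapped graph share sequence at their ends, reads falling in those overlaps may map to more than one node.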
Thank you very much for this good advice. I will try it as a new approach.
Hi, I am trying to use get_blunted (pre-compiled) on compacted de Bruijn graphs from Bifrost. My system is CentOS 7 with gcc 9.4.
I tested graphs constructed from 2, 10, and 100 bacterial genomes, and all of them report a segmentation fault. Here is the error from the run on the graph constructed with 10 genomes.
The coredump file detail
Could you please help me out?
Thank you in advance!