Can I submit a subset of kmers from the reads? #30

richarddurbin · 2018-02-23T15:54:41Z

Hello Rayan et al,

I want the unitigs induced by a subset of the kmers from a read set. Is there any way to create a file of kmers that you can read into bcalm2, and then compact into unitigs?

In fact I want to pick out kmers that I think are single copy in a diploid, based on depth. So a simple way to do that would be for you to allow me to give a maximum as well as a minimum copy number for each kmer. I would then set those by looking at a histogram. But ultimately I would like the freedom to correct the kmer counts by GC content (of their parent read pairs), so it would be more general to allow me to pass you a set of kmers.

I guess I could make a fasta file out of them, with one kmer per sequence...

Thanks, Richard

richarddurbin · 2018-02-23T15:56:06Z

By the way, thanks for the algorithm and tool. It is very elegant.

rchikhi · 2018-02-23T17:04:38Z

Hi Richard, and thanks for the kind words!

For now, you can specify a minimum/maximum kmer abundance threshold using the options -abundance-min A -abundance-max B, noting that it will keep only kmers of abundance A<=x<=B.

Otherwise, the input formats of Bcalm are fasta/fastq reads, or a custom HDF5 file that contains counted k-mers. That format isn't that easy to create from scratch, but not impossible for us to maybe write a custom program for that. In that format, Bcalm expects that kmers are partitioned according to their minimizer.

Giving as input a fasta file with one kmer per sequence would be an acceptable solution, bearing in mind that abundance would be lost (unless the kmer is repeated as many times as its abundance, which isn't very elegant).

What's the volume of kmers that you wish to give to Bcalm: 10's of millions, billions?
Also, in what format do you have the counted k-mers right now?

amatur mentioned this issue Oct 16, 2019

Using bcalm2 on a set of k-mers where k is a multiple of 4 #46

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can I submit a subset of kmers from the reads? #30

Can I submit a subset of kmers from the reads? #30

richarddurbin commented Feb 23, 2018

richarddurbin commented Feb 23, 2018

rchikhi commented Feb 23, 2018 •

edited

Loading

Can I submit a subset of kmers from the reads? #30

Can I submit a subset of kmers from the reads? #30

Comments

richarddurbin commented Feb 23, 2018

richarddurbin commented Feb 23, 2018

rchikhi commented Feb 23, 2018 • edited Loading

rchikhi commented Feb 23, 2018 •

edited

Loading