add conversion from SAM to BAM #63
This looks good to me on a first pass, and I really appreciate all the tests! I'm not so familiar with these record types, so I definitely want someone else to take a look, but a few other things to consider in addition to my in-line comments:
- Do any of these functions make sense to export? E.g., parsing and encoding CIGAR strings seem like they could have broader utility.
- Does this really need to go in its own module?
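On the question of exporting CIGAR helpers: BAM stores each CIGAR operation as a UInt32 whose upper 28 bits hold the operation length and whose lower 4 bits hold an opcode, the 0-based index of the operator in "MIDNSHP=X". A minimal sketch of such helpers follows. The names encode_cigar and decode_cigar are hypothetical and are not the functions in this PR.

```julia
# Hypothetical helpers (not the PR's functions). Each BAM CIGAR op is
# (length << 4) | opcode, where opcode is the 0-based index in CIGAR_OPS.
const CIGAR_OPS = "MIDNSHP=X"

function encode_cigar(cigar::AbstractString)::Vector{UInt32}
    cigar == "*" && return UInt32[]      # '*' means the CIGAR is unavailable
    ops = UInt32[]
    len = 0
    for c in cigar
        if isdigit(c)
            len = 10 * len + (c - '0')   # accumulate the decimal length
        else
            code = findfirst(==(c), CIGAR_OPS)
            code === nothing && error("Invalid CIGAR operation $(c)")
            len == 0 && error("CIGAR operation $(c) has no length")
            push!(ops, (UInt32(len) << 4) | UInt32(code - 1))
            len = 0
        end
    end
    len == 0 || error("CIGAR string ends with a length but no operation")
    return ops
end

# Inverse transformation, handy for round-trip tests.
function decode_cigar(ops::AbstractVector{UInt32})::String
    isempty(ops) && return "*"
    io = IOBuffer()
    for op in ops
        print(io, op >> 4, CIGAR_OPS[(op & 0x0f) + 1])
    end
    return String(take!(io))
end
```

For example, encode_cigar("10M2I5D") packs to UInt32[160, 33, 82], and decode_cigar recovers the original string.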
src/convert.jl (outdated)
return vcat(key, typ, val)
end

error("Unsupported tag type $(Char(typ))")
Can we add a test with an unsupported tag and @test_throws?
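A sketch of what such a test could look like. encode_tag is a hypothetical stand-in, not the PR's code: it follows the BAM tag layout (2-byte tag, 1-byte type, then the value), handles only the A, i, f and Z types, and always writes integers as int32 even though BAM also allows the smaller c/C/s/S types.

```julia
using Test

# Hypothetical sketch: encode one SAM optional field such as "NM:i:3"
# into BAM bytes. Unsupported types (e.g. B, H) raise an ErrorException.
function encode_tag(field::AbstractString)::Vector{UInt8}
    parts = split(field, ':'; limit = 3)
    length(parts) == 3 || error("Malformed tag field $(field)")
    key, typstr, val = parts
    (ncodeunits(key) == 2 && ncodeunits(typstr) == 1) ||
        error("Malformed tag field $(field)")
    typ = UInt8(typstr[1])
    head = vcat(Vector{UInt8}(codeunits(key)), typ)
    if typ == UInt8('A')
        length(val) == 1 || error("Tag type A needs exactly one character")
        return vcat(head, UInt8(val[1]))
    elseif typ == UInt8('i')
        # Little-endian int32; values outside Int32 range are not handled here.
        return vcat(head, reinterpret(UInt8, [htol(parse(Int32, val))]))
    elseif typ == UInt8('f')
        bits = reinterpret(UInt32, parse(Float32, val))
        return vcat(head, reinterpret(UInt8, [htol(bits)]))
    elseif typ == UInt8('Z')
        return vcat(head, Vector{UInt8}(codeunits(val)), 0x00)  # NUL-terminated
    else
        error("Unsupported tag type $(Char(typ))")
    end
end

@testset "tag encoding" begin
    @test encode_tag("NM:i:3") == [0x4e, 0x4d, 0x69, 0x03, 0x00, 0x00, 0x00]
    @test encode_tag("RG:Z:grp1") == [codeunits("RGZgrp1")..., 0x00]
    @test_throws ErrorException encode_tag("XB:B:c,1,2")
end
```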
Super cool, @jonathanBieler! I just used this functionality to go from SAM to BAM. See https://gist.github.com/jelber2/47129820373474a768dacabc0e686fdc
Great! It would be nice to get it merged so it's a bit more accessible. @jakobnissen or @CiaranOMara, have you had a chance to take a look?
Overall, looks good, with a few comments.
The code allocates a lot, and I think it needs an API for converting many records in bulk that share the same header. That requirement might constrain this particular implementation.
However, the API is the most important part. It can always be optimised later.
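One concrete piece of the shared-header point: BAM replaces RNAME and RNEXT with integer reference IDs that index into the header's @SQ list, and stores positions 0-based, so the name-to-ID lookup can be built once per header and reused for every record. A minimal sketch, assuming hypothetical helper names (reference_index, refid, bam_pos) that are not an API from this PR:

```julia
# Hypothetical helpers. Build the RNAME -> refID table once from the
# @SQ names, in header order; BAM refIDs are 0-based.
function reference_index(sq_names::AbstractVector{<:AbstractString})
    return Dict(name => Int32(i - 1) for (i, name) in enumerate(sq_names))
end

# '*' (no reference) maps to refID -1 in BAM.
refid(index::Dict, rname::AbstractString) = rname == "*" ? Int32(-1) : index[rname]

# SAM POS is 1-based with 0 meaning unavailable; BAM pos is 0-based with -1.
bam_pos(sam_pos::Integer) = Int32(sam_pos - 1)
```

With idx = reference_index(["chr1", "chr2"]), refid(idx, "chr2") is 1 and refid(idx, "*") is -1; converting a batch of records would only repeat the cheap Dict lookup.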
One more comment, @jonathanBieler: I believe it might be better if you implemented
Co-authored-by: Jakob Nybo Nissen <[email protected]>
…use encode_nucleotide instead of BioSequences.encode
…ed BioAlignment import
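For context on the nucleotide-encoding commit: BAM packs SEQ as 4-bit codes taken from the alphabet "=ACMGRSVTWYHKDBN", two bases per byte, with the first base in the high nibble. The standalone sketch below illustrates that packing; pack_sequence is a hypothetical name and does not call the package's encode_nucleotide.

```julia
# Hypothetical helper: pack a SAM SEQ string into BAM's 4-bit encoding.
const NT16 = "=ACMGRSVTWYHKDBN"  # a base's code is its 0-based index here

function pack_sequence(seq::AbstractString)::Vector{UInt8}
    seq == "*" && return UInt8[]             # '*' means SEQ is not stored
    out = zeros(UInt8, cld(length(seq), 2))  # two bases per byte
    for (i, c) in enumerate(seq)
        # The 4-bit code has no case, so lowercase bases are folded to uppercase.
        idx = findfirst(==(uppercase(c)), NT16)
        idx === nothing && error("Invalid base $(c)")
        nibble = UInt8(idx - 1)
        # Odd positions (1st, 3rd, ...) go in the high nibble.
        out[cld(i, 2)] |= isodd(i) ? nibble << 4 : nibble
    end
    return out
end
```

For instance, "ACGT" packs to [0x12, 0x48], and an odd-length read leaves the final low nibble as 0.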
I think I've addressed most of the issues. Regarding performance, I think the most common use case will be writing BAM records to a file, so it might be possible to add a write method to
I like that plan, but I agree with @jakobnissen that getting the API right is the most important part of this PR. Performance can be fixed later, I think. @jakobnissen I'm going to let you pull the trigger on merge,
Bump the version on a release branch.
Ah, my bad, I didn't see that's how this repo is organized. Makes sense to me!
Looks good from my PoV (once @CiaranOMara's comment has been addressed, whichever way you prefer).
The Indexes.reg2bins method is not relevant to SAM-to-BAM conversion.
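For contrast, what a BAM record does carry is a single bin field, which the SAM/BAM specification computes with reg2bin (singular) from the alignment's 0-based, end-exclusive span; reg2bins (plural) lists the bins overlapping a query region and belongs to index lookups. A Julia transcription of the specification's reference function, offered as a sketch rather than XAM's implementation:

```julia
# reg2bin from the SAM/BAM spec: beg is 0-based inclusive, stop is
# 0-based exclusive. Returns the smallest bin fully containing [beg, stop).
function reg2bin(beg::Integer, stop::Integer)::UInt16
    e = stop - 1  # make the end inclusive
    beg >> 14 == e >> 14 && return ((1 << 15) - 1) ÷ 7 + (beg >> 14)
    beg >> 17 == e >> 17 && return ((1 << 12) - 1) ÷ 7 + (beg >> 17)
    beg >> 20 == e >> 20 && return ((1 << 9) - 1) ÷ 7 + (beg >> 20)
    beg >> 23 == e >> 23 && return ((1 << 6) - 1) ÷ 7 + (beg >> 23)
    beg >> 26 == e >> 26 && return ((1 << 3) - 1) ÷ 7 + (beg >> 26)
    return 0
end
```

A read covering [0, 100) lands in bin 4681, one crossing the 16 kb boundary at 16384 moves up a level to bin 585, and reg2bin(-1, 0) evaluates to 4680.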
Types of changes
This PR implements the following changes:
📋 Additional detail
This adds a convert method that converts a SAM record to a BAM record. It will allow using BWA as a library, creating reads in silico, etc. I'm testing it extensively, so it should mostly be correct, but I think there are still some corner cases. I'll add some documentation later.
Fix #61

☑️ Checklist
docs/src/.
[UNRELEASED] section of the manually curated CHANGELOG.md file for this repository.