Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add conversion from SAM to BAM #63

Merged
merged 7 commits into from
Mar 23, 2024

Conversation

jonathanBieler
Copy link
Contributor

@jonathanBieler jonathanBieler commented Jul 3, 2023

Types of changes

This PR implements the following changes:
(Please tick any or all of the following that are applicable)

  • ✨ New feature (A non-breaking change which adds functionality).
  • 🐛 Bug fix (A non-breaking change, which fixes an issue).
  • 💥 Breaking change (fix or feature that would cause existing functionality to change).

📋 Additional detail

This adds a method to convert to convert a SAM record to a BAM record, it will allow to use BWA as a library, create reads in-silico, etc. I'm testing it extensively so it should mostly be correct, but there's still some corner cases I think. I'll add some documentation later. Fix #61

sam = "seq1\t81\tPhiX\t1051\t60\t70M\t=\t1821\t702\tTCTTGGCTTCCTTGCTGGTCAGATTGGTCGTCTTATTACCATTTCAACTACTCCGGTTATCGCTGGCGAC\t*\tNM:i:0\tMD:Z:70\tMC:Z:70M\tAS:i:70\tXS:i:0"
record = XAM.SAM.Record(sam)

bam_record = convert(BAM.Record, record)

☑️ Checklist

  • 🎨 The changes implemented is consistent with the julia style guide.
  • 📘 I have updated and added relevant docstrings, in a manner consistent with the documentation styleguide.
  • 📘 I have added or updated relevant user and developer manuals/documentation in docs/src/.
  • 🆗 There are unit tests that cover the code changes I have made.
  • 🆗 The unit tests cover my code changes AND they pass.
  • 📝 I have added an entry to the [UNRELEASED] section of the manually curated CHANGELOG.md file for this repository.
  • 🆗 All changes should be compatible with the latest stable version of Julia.
  • 💭 I have commented liberally for any complex pieces of internal code.

Copy link
Member

@kescobo kescobo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me on a first pass, and I really appreciate all the tests! I'm not so familiar with these record types, so I definitely want someone else to take a look, but a few other things to consider in addition to my in-line comments:

  1. Do any of these functions make sense to be exported? Eg parsing and encoding CIGAR seem like they could have broader utility.
  2. Does this really need to go in its own module?

src/convert.jl Show resolved Hide resolved
src/convert.jl Outdated Show resolved Hide resolved
src/convert.jl Outdated Show resolved Hide resolved
src/convert.jl Outdated
return vcat(key, typ, val)
end

error("Unsupported tag type $(Char(typ))")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add something with an unsupported tag and @test_throws?

@kescobo kescobo added the enhancement New feature or request label Jul 6, 2023
@jelber2
Copy link

jelber2 commented Mar 7, 2024

Super cool, I just used this used functionality @jonathanBieler to go from SAM to BAM! see https://gist.github.com/jelber2/47129820373474a768dacabc0e686fdc

@kescobo
Copy link
Member

kescobo commented Mar 8, 2024

Great! Would be nice to get it merged in so it's a bit more accessible. @jakobnissen or @CiaranOMara - have you had a chance to take a looks?

src/convert.jl Outdated Show resolved Hide resolved
Copy link
Member

@jakobnissen jakobnissen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good, with a few comments.
The code allocates a lot, and I think it needs an API to convert a lot of records in bulk, sharing the same header. This might limit this particular implementation.
However, the API is the most important part. It can always be optimised later.

src/convert.jl Outdated Show resolved Hide resolved
src/convert.jl Outdated Show resolved Hide resolved
src/convert.jl Outdated Show resolved Hide resolved
src/convert.jl Outdated Show resolved Hide resolved
src/convert.jl Outdated Show resolved Hide resolved
src/convert.jl Outdated Show resolved Hide resolved
src/convert.jl Outdated Show resolved Hide resolved
src/convert.jl Outdated Show resolved Hide resolved
src/convert.jl Outdated Show resolved Hide resolved
src/convert.jl Outdated Show resolved Hide resolved
@jakobnissen
Copy link
Member

One more comment, @jonathanBieler - I believe it might be better if you implemented SAM.Record(::BAM.Record) instead of a method of Base.convert. The reason is that convert can be called implicitly, without the programmer's knowledge. This can lead to some problems when conversions happen unexpectedly - implicit copies, degraded performance and more.

@jonathanBieler
Copy link
Contributor Author

jonathanBieler commented Mar 20, 2024

I think I've addressed most of the issues. About performance I think the most common use case will be to write BAM records to a file, so maybe it would be possible to add a write method to BAM.Writer to write directly SAM.Record's and avoid allocations.

@kescobo
Copy link
Member

kescobo commented Mar 20, 2024

I like that plan, but agree with @jakobnissen that getting the API right is the most important part of this PR. Performance can be fixed later, I think.

@jakobnissen I'm going to let you pull the trigger on merge, one other thing that could be included here is bumping to v0.4.1 so that we can tag a release as soon as it's in.

@CiaranOMara
Copy link
Member

Bump the version on a release branch.

@kescobo
Copy link
Member

kescobo commented Mar 20, 2024

Ah, my bad - I didn't see that's how this repo is organized. Makes sense to me!

src/convert.jl Show resolved Hide resolved
Copy link
Member

@jakobnissen jakobnissen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from my PoV (once @CiaranOMara's comment has been addressed - which ever way you prefer)

@CiaranOMara CiaranOMara dismissed their stale review March 23, 2024 16:00

The Indexes.reg2bins method is not relevant to SAM to BAM conversion.

@CiaranOMara CiaranOMara merged commit 2b15ada into BioJulia:develop Mar 23, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[WIP] convert SAM to BAM
5 participants