Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite BAM auxiliary data #70

Closed
wants to merge 1 commit into from

Conversation

jakobnissen
Copy link
Member

@jakobnissen jakobnissen commented Mar 10, 2024

This is the first step towards improving the API and performance of XAM.jl.

This PR is not breaking. It introduces a new data type representing BAM auxiliary data. When this PR is done, it will introduce a new function that allows users to manipulate auxiliary data using this new type.
The old type may then be removed in a future, breaking version of BAM, if one such is ever to happen. So, if this PR lands, and I never get around to make larger changes to XAM, or we disagree on what to do, this PR should still be a win :)

The advantage of this new AUXData type over the old AuxData is:

  • It allows both reading and writing, whereas the old only allowed reading
  • It's faster
  • It's more type stable
  • It conforms more closely to the AbstractDict interface.

The design of this currently relies on there never being any extra noncoding bytes in BAM record's data vector. However, there currently sometimes are.
Why the change? In older versions of Julia, resizing vectors were always slow. So, significant time could be saved by having the vector be overly long, and then storing the true length as an integer inside your mutable struct. However, from Julia 1.11, simply resizing the vectors is now even faster :)

TODO

The new AUXData type also supports writing, conforms closer to the `AbstractDict`
interface, and is more efficient and type stable.
@kescobo
Copy link
Member

kescobo commented Mar 11, 2024

introduces a new data type representing BAM auxiliary data. When this PR is done, it will introduce a new function that allows users to manipulate auxiliary data using this new type.
The old type may then be removed in a future, breaking version of BAM, if one such is ever to happen.

Based on this description, this modification is strictly additive. To me, having an AuxData and an AUXData is confusing, and since we're still pre-1.0, I'd kinda prefer just making another breaking release (even though the last such release was in January).

Aside from the name change, how much is different between the old and new types?

@CiaranOMara
Copy link
Member

Don't changes in the length of BAM.Record.data need to be reflected in BAM.Record.block_size?

@CiaranOMara
Copy link
Member

Naively, once the span is found, couldn't splice! be used to adjust BAM.Record.data?

function adjust(record::Record, indices, newdata::Vector{Uint8})
    splice!(record.data, indices, newdata)
    record.block_size = recalculate_block_size(record.data)
end

@jakobnissen
Copy link
Member Author

This has been implemented as https://github.com/BioJulia/XAMAuxData.jl.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants