I haven't lost track of the existing issues, and will get around to resolving them!
However, I got sidetracked by a more fundamental issue: the BGZFStreams package is slow and not well integrated with the rest of the Julia IO ecosystem. Kenta Sato, the author of BGZFStreams.jl, realized this and created CodecBGZF.jl three years ago, but that project looks abandoned, and Kenta is a busy guy who is hard to reach these days. See also this comment from Kenta Sato.
In fact, BGZFStreams is the major bottleneck of XAM.jl, to the extent that optimizing XAM makes no practical difference because it is so bogged down by BGZFStreams. For that reason, I created CodecBGZF myself. It's faster, safer, and more generic than BGZFStreams. I'd expect XAM's BAM module to be around 2x faster with CodecBGZF, perhaps more.
It'll be a bit of a project, since XAM depends on Indexes, which also depends on BGZFStreams.
I'll make this issue here, and then hopefully get around to it in the next few months. If you support this change and feel like giving it a crack, please be my guest! It'll also be useful to have other eyes on CodecBGZF to make sure I didn't bork the interface.
I had noticed what you're up to, and I think it's exciting.
I like the idea of using TranscodingStreams. I have started to update the lifted packages in the BioJulia ecosystem to use TranscodingStreams. It would be great to have API consistency and be able to switch and chain codecs easily.
Indexes.jl should be refactored such that its dependency on the codec is inverted. A package that uses Indexes should be able to declare which codec or codec chain to use. In hindsight, the Indexes package should have been called Tabix.
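To make the inversion concrete, here is a minimal sketch using only Base Julia. Every name in it (`IndexReader`, `open_stream`, `read_all`) is hypothetical and not part of Indexes.jl; the point is only that the caller hands in the function that wraps the raw IO, so the index code never names a codec itself.

```julia
# Hypothetical sketch: none of these names exist in Indexes.jl.
# The reader stores a caller-supplied function that wraps a raw IO
# in whatever decompressing stream the caller wants to use.
struct IndexReader{F}
    open_stream::F
end

# Read everything from `io` through the caller's chosen stream wrapper.
function read_all(reader::IndexReader, io::IO)
    stream = reader.open_stream(io)
    return read(stream)
end

# `identity` stands in for a real codec here, so the bytes pass through unchanged.
reader = IndexReader(identity)
data = read_all(reader, IOBuffer("chr1\t100\t200\n"))
```

A package using this would presumably pass a stream constructor from its chosen codec package in place of `identity`, or a composition of several constructors for a codec chain. Which constructor that is depends on the codec package, so it is left out here.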
I'd suggest setting up a pu/* (proposed updates) branch in Indexes and tracking the branch in a development registry (I do this locally with GenomicFeatures v3). With Julia 1.5, mixing registries now works well. We could start a BioJuliaDevRegistry that tracks the pu branches.
Thanks for putting this together! I see the feature/CodecBGZF branch and am excited about it as well. I wonder if there is a planned timeline for it to be merged to master and released?