You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Add new interfaces for manipulating auxilary BAM fields
I've been working a bit with BAM files using BioAlignments the past days and have been looking through the source code. During the work, there was a few function I found that I missed from BioAlignments which would have my work easier. They would all be quite easy to implement, and I'd love to pitch a PR with the new interfaces + tests, but being interfaces, they're hard to change once released and very annoying to get wrong. So, let's discuss them first!
Phase out AuxData, or make it view-based.
Currently, an AuxData object can be created from the auxiliary data of a BAM record. In the current implementation It necessitates a copy of the data from the record, meaning that the user can manipulate the AuxData and seemingly succeeding, but without the record actually changing. Furthermore, most interfaces (all except iterate, AFAIK) for aux fields is currently implemented to work on Vector{UInt8} or on the records directly, making the AuxData type mostly obsolete.
I suggest removing the object alltogether to avoid code duplication of functions acting on AuxData and Records and confusion about what the user is mutating. If this is not acceptable, then making the AuxData view-based.
Records work sort of like AbstractDict. In fact, the implemented AuxData type is a subtype of AbstractDict. But asking for a value with a default is very awkward for the user. I suggest adding this method, which is to completely mirror how it works for Dicts.
delete!(record::BAM.Record, key::AbstractString)
Actually changing the auxiliary fields of BAM.Record objects is difficult. Again, this one should simply mirror the similar method for Dicts. One could also think about adding pop!.
This would also be useful to have. The value would be encoded and appended to the end of the data array. Probably a few different methods would have to be written, since the type of Value should only be an UInt8, Int32, Float32, String, or Vector{T} where T can be UInt8, Int8, UInt16, Int16, UInt32, Int32 or Float32.
Some new interface for making headers more easy
People can make all kinds of custom headers, but usually they just want the @SQ headers updated plus whatever non-SQ headers the program created. It's worth noting that BioAlignments make no checks for the header and records matching together, gladly writing a healthy-looking but unparsable (by htslib) BAM.file, so making it easy to not mess up the headers is rather important. I suggest making it easy to update the SQ headers. I'd like any suggestion here.
I can imagine two new constructors for SAM.Header, namely SAM.Header(h::Sam.Header, records::Vector{BAM.Record}) and SAM.Header(records::Vector{BAM.Record}), the latter creating a header with the appropriate "SQ" headers, the former with the same, but keeping any non-SQ tags already present.
The text was updated successfully, but these errors were encountered:
I think if there's a reasonable case to be made, we don't need to go on supporting LTS with future versions.
If there's someone that comes saying "I absolutely need LTS", they can make the case, but I don't think it's worth accommodating that hypothetical user if there are concrete benefits to actual users right now.
Add new interfaces for manipulating auxilary BAM fields
I've been working a bit with BAM files using BioAlignments the past days and have been looking through the source code. During the work, there was a few function I found that I missed from BioAlignments which would have my work easier. They would all be quite easy to implement, and I'd love to pitch a PR with the new interfaces + tests, but being interfaces, they're hard to change once released and very annoying to get wrong. So, let's discuss them first!
Phase out AuxData, or make it view-based.
Currently, an
AuxData
object can be created from the auxiliary data of a BAM record. In the current implementation It necessitates a copy of the data from the record, meaning that the user can manipulate the AuxData and seemingly succeeding, but without the record actually changing. Furthermore, most interfaces (all exceptiterate
, AFAIK) for aux fields is currently implemented to work onVector{UInt8}
or on the records directly, making the AuxData type mostly obsolete.I suggest removing the object alltogether to avoid code duplication of functions acting on AuxData and Records and confusion about what the user is mutating. If this is not acceptable, then making the AuxData view-based.
get(record::BAM.Record, key::AbstractString, default)
Records work sort of like
AbstractDict
. In fact, the implementedAuxData
type is a subtype of AbstractDict. But asking for a value with a default is very awkward for the user. I suggest adding this method, which is to completely mirror how it works for Dicts.delete!(record::BAM.Record, key::AbstractString)
Actually changing the auxiliary fields of BAM.Record objects is difficult. Again, this one should simply mirror the similar method for
Dict
s. One could also think about addingpop!
.setindex!(record::BAM.Record, key::AbstractString, value)
This would also be useful to have. The value would be encoded and appended to the end of the data array. Probably a few different methods would have to be written, since the type of Value should only be an UInt8, Int32, Float32, String, or Vector{T} where T can be UInt8, Int8, UInt16, Int16, UInt32, Int32 or Float32.
Some new interface for making headers more easy
People can make all kinds of custom headers, but usually they just want the
@SQ
headers updated plus whatever non-SQ headers the program created. It's worth noting that BioAlignments make no checks for the header and records matching together, gladly writing a healthy-looking but unparsable (by htslib) BAM.file, so making it easy to not mess up the headers is rather important. I suggest making it easy to update the SQ headers. I'd like any suggestion here.I can imagine two new constructors for SAM.Header, namely
SAM.Header(h::Sam.Header, records::Vector{BAM.Record})
andSAM.Header(records::Vector{BAM.Record})
, the latter creating a header with the appropriate "SQ" headers, the former with the same, but keeping any non-SQ tags already present.The text was updated successfully, but these errors were encountered: