Error when opening a BAM file generated by HISAT2 version 2.2.0 #23

yuifu · 2020-07-17T10:51:45Z

I have used XAM.jl to treat BAM files. Today I came across an error when opening a BAM file generated by HISAT2 version 2.2.0. No error occurred for a BAM file generated by HISAT2 version 2.1.0 (from the same FASTQ file)

Expected Behavior

No errors are expected.

julia> using XAM
julia> reader = open(BAM.Reader, "SRR8452726_1.sort.bam")

Current Behavior

julia> using XAM
julia> reader = open(BAM.Reader, "SRR8452726_1.sort.bam")
ERROR: ArgumentError: malformed metainfo at line 68
Stacktrace:
 [1] readheader!(::TranscodingStreams.TranscodingStream{TranscodingStreams.Noop,Base.GenericIOBuffer{Array{UInt8,1}}}, ::XAM.SAM.Header, ::Tuple{Int64,Int64}) at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/sam/readrecord.jl:318
 [2] Reader at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/sam/reader.jl:13 [inlined]
 [3] Reader at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/sam/reader.jl:38 [inlined]
 [4] init_bam_reader(::BGZFStreams.BGZFStream{IOStream}) at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/bam/reader.jl:96
 [5] init_bam_reader at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/bam/reader.jl:125 [inlined]
 [6] #Reader#2 at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/bam/reader.jl:36 [inlined]
 [7] Reader at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/bam/reader.jl:31 [inlined]
 [8] #open#1 at /Users/ozakiharuka/.julia/packages/BioGenerics/cCuGr/src/IO.jl:42 [inlined]
 [9] open(::Type{XAM.BAM.Reader}, ::String) at /Users/ozakiharuka/.julia/packages/BioGenerics/cCuGr/src/IO.jl:42
 [10] top-level scope at REPL[2]:1
(v1.3) pkg> status
....
  [d759349c] XAM v0.2.3
....
julia> versioninfo()
Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

Possible Solution / Implementation

I have no idea but the release note of HISAT2 might help.

HISAT 2.2.0 release 2/6/2020
This major version update includes a new feature to handle “repeat” reads. Based on sets of 100-bp simulated and 101-bp real reads that we tested, we found that 2.6-3.4% and 1.4-1.8% of the reads were mapped to >5 locations and >100 locations, respectively. Attempting to report all alignments would likely consume a prohibitive amount of disk space. In order to address this issue, our repeat indexing and alignment approach directly aligns reads to repeat sequences, resulting in one repeat alignment per read. HISAT2 provides application programming interfaces (API) for C++, Python, and JAVA that rapidly retrieve genomic locations from repeat alignments for use in downstream analyses.
Other minor bug fixes are also included as follows:

Fixed occasional sign (+ or -) issues of template lengths in SAM file
Fixed duplicate read alignments in SAM file
Skip a splice site if exon’s last base or first base is ambiguous (N)

Steps to Reproduce (for bugs)

Download the BAM file from here.
Run the following codes:

julia> using XAM
julia> reader = open(BAM.Reader, "SRR8452726_1.sort.bam")

Context

Your Environment

Julia Version 1.3.1
XAM v0.2.3
OS: macOS (x86_64-apple-darwin18.6.0)
CPU: Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

(v1.3) pkg> status
    Status `~/.julia/environments/v1.3/Project.toml`
  [c7e460c6] ArgParse v1.1.0
  [8e4a8c10] BED v0.1.0
  [00701ae9] BioAlignments v2.0.0
  [7e6ae17a] BioSequences v2.0.5
  [336ed68f] CSV v0.6.1
  [9961bab8] Cbc v0.6.7
  [aaaa29a8] Clustering v0.14.0
  [8f4d0f93] Conda v1.4.1
  [a93c6f00] DataFrames v0.20.2
  [968ba79b] DocOpt v0.4.0
  [8f5d6c58] EzXML v1.1.0
  [c2308a5c] FASTX v1.1.2
  [652a1917] Fire v0.1.0
  [587475ba] Flux v0.11.0
  [899a7d2d] GenomicFeatures v2.0.0
  [c27321d9] Glob v1.3.0
  [a2cc645c] GraphPlot v0.4.2
  [cd3eb016] HTTP v0.8.16
  [7073ff75] IJulia v1.21.2
  [682c06a0] JSON v0.21.0
  [4076af6c] JuMP v0.21.2
  [093fc24a] LightGraphs v1.3.1
  [9b87118b] PackageCompiler v1.1.1
  [91a5bcdd] Plots v1.0.10
  [d330b81b] PyPlot v2.9.0
  [777a009e] ReadCoverage v0.1.1 #master (https://github.com/bioinfo-tsukuba/ReadCoverage.jl)
  [3cdcf5f2] RecipesBase v1.0.0
  [01d81517] RecipesPipeline v0.1.3
  [2913bbd2] StatsBase v0.33.0
  [70df011a] TableReader v0.4.0
  [9c690861] TensorToolbox v1.0.1
  [d759349c] XAM v0.2.3
  [ddb6d928] YAML v0.4.0
  [8bb1440f] DelimitedFiles
  [de0858da] Printf

The text was updated successfully, but these errors were encountered:

CiaranOMara · 2020-07-20T01:37:21Z

Below is line 68.

@PG	ID:hisat2	PN:hisat2	VN:	CL:"/opt/conda/envs/nf-core-ramdaq-1.0dev/bin/hisat2-align-s --wrapper basic-0 --no-softclip -p 1 -x /home/username/annotations/mouse/hisat2_index_GRCm38_primary/hisat2_index_GRCm38.primary_assembly --summary-file SRR8452726_1.hisat2_summary.txt -U /tmp/28.unp"

XAM is able to parse the meta info when the empty VN tag is removed.

using XAM

line68_modified = """
@PG	ID:hisat2	PN:hisat2	CL:"/opt/conda/envs/nf-core-ramdaq-1.0dev/bin/hisat2-align-s --wrapper basic-0 --no-softclip -p 1 -x /home/username/annotations/mouse/hisat2_index_GRCm38_primary/hisat2_index_GRCm38.primary_assembly --summary-file SRR8452726_1.hisat2_summary.txt -U /tmp/28.unp"
"""

SAM.MetaInfo(chomp(line2)) #Note: chomping to remove trailing new line character.

I think the PG line generated by HISAT2 is erroneous. However, I also think that the inclusion of a tag without a value shouldn't prevent XAM from processing the rest of the file.

CiaranOMara · 2020-07-20T03:35:00Z

@yuifu, I've put together a workaround, which I hope is sufficient to keep things moving for you.

pkg> add XAM#workaround_23

I intend to implement a more robust solution that is, in general, able to recover from line parsing errors.

yuifu · 2020-07-21T13:02:21Z

@CiaranOMara It works! Thanks a lot!!
I also thank you for adding a solution to future release.

CiaranOMara added a commit that referenced this issue Jul 20, 2020

Workaround for #23

da0c188

jakobnissen mentioned this issue Jul 31, 2020

Various spec violations #24

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error when opening a BAM file generated by HISAT2 version 2.2.0 #23

Error when opening a BAM file generated by HISAT2 version 2.2.0 #23

yuifu commented Jul 17, 2020

CiaranOMara commented Jul 20, 2020

CiaranOMara commented Jul 20, 2020

yuifu commented Jul 21, 2020

Error when opening a BAM file generated by HISAT2 version 2.2.0 #23

Error when opening a BAM file generated by HISAT2 version 2.2.0 #23

Comments

yuifu commented Jul 17, 2020

Expected Behavior

Current Behavior

Possible Solution / Implementation

Steps to Reproduce (for bugs)

Context

Your Environment

CiaranOMara commented Jul 20, 2020

CiaranOMara commented Jul 20, 2020

yuifu commented Jul 21, 2020