Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when opening a BAM file generated by HISAT2 version 2.2.0 #23

Open
yuifu opened this issue Jul 17, 2020 · 3 comments
Open

Error when opening a BAM file generated by HISAT2 version 2.2.0 #23

yuifu opened this issue Jul 17, 2020 · 3 comments

Comments

@yuifu
Copy link

yuifu commented Jul 17, 2020

I have used XAM.jl to treat BAM files. Today I came across an error when opening a BAM file generated by HISAT2 version 2.2.0. No error occurred for a BAM file generated by HISAT2 version 2.1.0 (from the same FASTQ file)

Expected Behavior

No errors are expected.

julia> using XAM
julia> reader = open(BAM.Reader, "SRR8452726_1.sort.bam")

Current Behavior

julia> using XAM
julia> reader = open(BAM.Reader, "SRR8452726_1.sort.bam")
ERROR: ArgumentError: malformed metainfo at line 68
Stacktrace:
 [1] readheader!(::TranscodingStreams.TranscodingStream{TranscodingStreams.Noop,Base.GenericIOBuffer{Array{UInt8,1}}}, ::XAM.SAM.Header, ::Tuple{Int64,Int64}) at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/sam/readrecord.jl:318
 [2] Reader at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/sam/reader.jl:13 [inlined]
 [3] Reader at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/sam/reader.jl:38 [inlined]
 [4] init_bam_reader(::BGZFStreams.BGZFStream{IOStream}) at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/bam/reader.jl:96
 [5] init_bam_reader at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/bam/reader.jl:125 [inlined]
 [6] #Reader#2 at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/bam/reader.jl:36 [inlined]
 [7] Reader at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/bam/reader.jl:31 [inlined]
 [8] #open#1 at /Users/ozakiharuka/.julia/packages/BioGenerics/cCuGr/src/IO.jl:42 [inlined]
 [9] open(::Type{XAM.BAM.Reader}, ::String) at /Users/ozakiharuka/.julia/packages/BioGenerics/cCuGr/src/IO.jl:42
 [10] top-level scope at REPL[2]:1
(v1.3) pkg> status
....
  [d759349c] XAM v0.2.3
....
julia> versioninfo()
Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

Possible Solution / Implementation

I have no idea but the release note of HISAT2 might help.

HISAT 2.2.0 release 2/6/2020
This major version update includes a new feature to handle “repeat” reads. Based on sets of 100-bp simulated and 101-bp real reads that we tested, we found that 2.6-3.4% and 1.4-1.8% of the reads were mapped to >5 locations and >100 locations, respectively. Attempting to report all alignments would likely consume a prohibitive amount of disk space. In order to address this issue, our repeat indexing and alignment approach directly aligns reads to repeat sequences, resulting in one repeat alignment per read. HISAT2 provides application programming interfaces (API) for C++, Python, and JAVA that rapidly retrieve genomic locations from repeat alignments for use in downstream analyses.
Other minor bug fixes are also included as follows:

Fixed occasional sign (+ or -) issues of template lengths in SAM file
Fixed duplicate read alignments in SAM file
Skip a splice site if exon’s last base or first base is ambiguous (N)

Steps to Reproduce (for bugs)

  1. Download the BAM file from here.
  2. Run the following codes:
julia> using XAM
julia> reader = open(BAM.Reader, "SRR8452726_1.sort.bam")

Context

Your Environment

  • Julia Version 1.3.1
  • XAM v0.2.3
  • OS: macOS (x86_64-apple-darwin18.6.0)
  • CPU: Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz
  • WORD_SIZE: 64
  • LIBM: libopenlibm
  • LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
(v1.3) pkg> status
    Status `~/.julia/environments/v1.3/Project.toml`
  [c7e460c6] ArgParse v1.1.0
  [8e4a8c10] BED v0.1.0
  [00701ae9] BioAlignments v2.0.0
  [7e6ae17a] BioSequences v2.0.5
  [336ed68f] CSV v0.6.1
  [9961bab8] Cbc v0.6.7
  [aaaa29a8] Clustering v0.14.0
  [8f4d0f93] Conda v1.4.1
  [a93c6f00] DataFrames v0.20.2
  [968ba79b] DocOpt v0.4.0
  [8f5d6c58] EzXML v1.1.0
  [c2308a5c] FASTX v1.1.2
  [652a1917] Fire v0.1.0
  [587475ba] Flux v0.11.0
  [899a7d2d] GenomicFeatures v2.0.0
  [c27321d9] Glob v1.3.0
  [a2cc645c] GraphPlot v0.4.2
  [cd3eb016] HTTP v0.8.16
  [7073ff75] IJulia v1.21.2
  [682c06a0] JSON v0.21.0
  [4076af6c] JuMP v0.21.2
  [093fc24a] LightGraphs v1.3.1
  [9b87118b] PackageCompiler v1.1.1
  [91a5bcdd] Plots v1.0.10
  [d330b81b] PyPlot v2.9.0
  [777a009e] ReadCoverage v0.1.1 #master (https://github.com/bioinfo-tsukuba/ReadCoverage.jl)
  [3cdcf5f2] RecipesBase v1.0.0
  [01d81517] RecipesPipeline v0.1.3
  [2913bbd2] StatsBase v0.33.0
  [70df011a] TableReader v0.4.0
  [9c690861] TensorToolbox v1.0.1
  [d759349c] XAM v0.2.3
  [ddb6d928] YAML v0.4.0
  [8bb1440f] DelimitedFiles
  [de0858da] Printf
@CiaranOMara
Copy link
Member

Below is line 68.

@PG	ID:hisat2	PN:hisat2	VN:	CL:"/opt/conda/envs/nf-core-ramdaq-1.0dev/bin/hisat2-align-s --wrapper basic-0 --no-softclip -p 1 -x /home/username/annotations/mouse/hisat2_index_GRCm38_primary/hisat2_index_GRCm38.primary_assembly --summary-file SRR8452726_1.hisat2_summary.txt -U /tmp/28.unp"

XAM is able to parse the meta info when the empty VN tag is removed.

using XAM

line68_modified = """
@PG	ID:hisat2	PN:hisat2	CL:"/opt/conda/envs/nf-core-ramdaq-1.0dev/bin/hisat2-align-s --wrapper basic-0 --no-softclip -p 1 -x /home/username/annotations/mouse/hisat2_index_GRCm38_primary/hisat2_index_GRCm38.primary_assembly --summary-file SRR8452726_1.hisat2_summary.txt -U /tmp/28.unp"
"""

SAM.MetaInfo(chomp(line2)) #Note: chomping to remove trailing new line character.

I think the PG line generated by HISAT2 is erroneous. However, I also think that the inclusion of a tag without a value shouldn't prevent XAM from processing the rest of the file.

CiaranOMara added a commit that referenced this issue Jul 20, 2020
@CiaranOMara
Copy link
Member

@yuifu, I've put together a workaround, which I hope is sufficient to keep things moving for you.

pkg> add XAM#workaround_23

I intend to implement a more robust solution that is, in general, able to recover from line parsing errors.

@yuifu
Copy link
Author

yuifu commented Jul 21, 2020

@CiaranOMara It works! Thanks a lot!!
I also thank you for adding a solution to future release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants