
Use Hadoop FileSystem concat to efficiently merge sharded BAM #170

Open
tomwhite opened this issue Nov 23, 2017 · 1 comment
@tomwhite (Member)

Hadoop FileSystem supports a concat operation for some implementations (notably HDFS) that can efficiently merge files. (In HDFS it takes advantage of variable block sizes, which avoids the need to rewrite the file.)

SAMFileMerger#mergeParts() could take advantage of this operation if the target filesystem supports it.
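As a sketch of what this could look like (class and method names here are hypothetical; `FileSystem#concat`, `create`, `open`, `delete`, and `IOUtils.copyBytes` are real Hadoop APIs):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Hypothetical sketch of a concat-aware merge. FileSystem#concat is a
// metadata-only operation on HDFS; most other implementations throw
// UnsupportedOperationException, so a byte-copy fallback is still needed.
public final class ShardMerger {
  public static void mergeShards(Configuration conf, Path target, Path[] shards)
      throws IOException {
    FileSystem fs = target.getFileSystem(conf);
    try {
      // On HDFS the shards are stitched onto `target` without rewriting
      // any data blocks (the target must already exist).
      fs.concat(target, shards);
    } catch (UnsupportedOperationException e) {
      copyShards(fs, target, shards);
    }
  }

  // Slow path: stream every shard into the target and delete it afterwards.
  private static void copyShards(FileSystem fs, Path target, Path[] shards)
      throws IOException {
    try (FSDataOutputStream out = fs.create(target, true)) {
      for (Path shard : shards) {
        try (FSDataInputStream in = fs.open(shard)) {
          IOUtils.copyBytes(in, out, 4096, false);
        }
        fs.delete(shard, false);
      }
    }
  }
}
```

Note that HDFS `concat` has its own preconditions (for example, the target must already exist, and the sources must satisfy block-size constraints), so the fallback path would remain important even on HDFS.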

@tomwhite tomwhite added this to the 8.0.0 milestone Nov 23, 2017
@heuermh (Contributor) commented Mar 14, 2018

See https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/util/FileMerger.scala

The disableFastConcat flag is necessary because the fast file merger can fail when the underlying file system is encrypted, or when any of a number of undocumented invariants are not met.
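A guard along the lines of ADAM's disableFastConcat might look like the following (the configuration key and helper method are hypothetical; only `FileSystem#concat` is a real Hadoop API):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch: attempt the fast concat path only when it is not
// explicitly disabled, and fall back to a slow stream copy if concat fails
// for any reason (encrypted file systems, unmet block-size invariants, ...).
public final class GuardedMerge {
  public static void merge(Configuration conf, Path target, Path[] shards)
      throws IOException {
    FileSystem fs = target.getFileSystem(conf);
    // Hypothetical configuration key mirroring ADAM's disableFastConcat flag.
    boolean disableFastConcat =
        conf.getBoolean("merge.disable.fast.concat", false);
    if (!disableFastConcat) {
      try {
        fs.concat(target, shards);
        return; // fast path succeeded
      } catch (IOException | UnsupportedOperationException e) {
        // Fall through to the slow path rather than failing the job.
      }
    }
    slowMerge(fs, target, shards);
  }

  private static void slowMerge(FileSystem fs, Path target, Path[] shards)
      throws IOException {
    // Placeholder for a byte-copy merge of the shards into `target`.
    throw new UnsupportedOperationException("slow merge not shown");
  }
}
```

Catching broadly and falling back, rather than pre-checking support, matches the observation above: the failure modes are not fully documented, so the slow path has to be the safety net.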
