
Use Hadoop FileSystem concat to efficiently merge sharded BAM #170

Open
tomwhite opened this issue Nov 23, 2017 · 1 comment
@tomwhite (Member)

Hadoop FileSystem supports a concat operation for some implementations (notably HDFS) that can efficiently merge files. (In HDFS it takes advantage of variable block sizes, which avoids the need to rewrite the file.)

SAMFileMerger#mergeParts() could take advantage of this operation if the target filesystem supports it.
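As a sketch of what this could look like (class and method names here are hypothetical; `FileSystem#concat`, `create`, `open`, `delete`, and `IOUtils.copyBytes` are real Hadoop APIs):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Hypothetical sketch of a concat-aware merge. FileSystem#concat is a
// metadata-only operation on HDFS; most other implementations throw
// UnsupportedOperationException, so a byte-copy fallback is still needed.
public final class ShardMerger {
  public static void mergeShards(Configuration conf, Path target, Path[] shards)
      throws IOException {
    FileSystem fs = target.getFileSystem(conf);
    try {
      // On HDFS the shards are stitched onto `target` without rewriting
      // any data blocks (the target must already exist).
      fs.concat(target, shards);
    } catch (UnsupportedOperationException e) {
      copyShards(fs, target, shards);
    }
  }

  // Slow path: stream every shard into the target and delete it afterwards.
  private static void copyShards(FileSystem fs, Path target, Path[] shards)
      throws IOException {
    try (FSDataOutputStream out = fs.create(target, true)) {
      for (Path shard : shards) {
        try (FSDataInputStream in = fs.open(shard)) {
          IOUtils.copyBytes(in, out, 4096, false);
        }
        fs.delete(shard, false);
      }
    }
  }
}
```

Note that HDFS `concat` has its own preconditions (for example, the target must already exist, and the sources must satisfy block-size constraints), so the fallback path would remain important even on HDFS.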

@tomwhite tomwhite added this to the 8.0.0 milestone Nov 23, 2017
@heuermh (Contributor) commented Mar 14, 2018

See https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/util/FileMerger.scala

The disableFastConcat flag is necessary because the fast file merger can fail when the underlying file system is encrypted, or when any of a number of undocumented invariants are not met.
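A guard along the lines of ADAM's disableFastConcat might look like the following (the configuration key and helper method are hypothetical; only `FileSystem#concat` is a real Hadoop API):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch: attempt the fast concat path only when it is not
// explicitly disabled, and fall back to a slow stream copy if concat fails
// for any reason (encrypted file systems, unmet block-size invariants, ...).
public final class GuardedMerge {
  public static void merge(Configuration conf, Path target, Path[] shards)
      throws IOException {
    FileSystem fs = target.getFileSystem(conf);
    // Hypothetical configuration key mirroring ADAM's disableFastConcat flag.
    boolean disableFastConcat =
        conf.getBoolean("merge.disable.fast.concat", false);
    if (!disableFastConcat) {
      try {
        fs.concat(target, shards);
        return; // fast path succeeded
      } catch (IOException | UnsupportedOperationException e) {
        // Fall through to the slow path rather than failing the job.
      }
    }
    slowMerge(fs, target, shards);
  }

  private static void slowMerge(FileSystem fs, Path target, Path[] shards)
      throws IOException {
    // Placeholder for a byte-copy merge of the shards into `target`.
    throw new UnsupportedOperationException("slow merge not shown");
  }
}
```

Catching broadly and falling back, rather than pre-checking support, matches the observation above: the failure modes are not fully documented, so the slow path has to be the safety net.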
