LUMC/fastqsplitter

fastqsplitter

A simple application to split FASTQ files.

Fastqsplitter splits a FASTQ file evenly over the specified output files. It is similar to the GNU Coreutils split program, except that it is aware of the FASTQ format of four lines per record. (Split works with one line per record.) It supports compressed FASTQ files and can compress the split FASTQ files on the fly.

Fastqsplitter uses a round-robin method to distribute the FASTQ records evenly across the output files. Alternatively, it can distribute records sequentially, which is useful when reading from STDIN with an input of unknown size. Fastqsplitter can split such input into N files with a given maximum size.
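
The two distribution modes can be illustrated with a minimal Python sketch. This is not fastqsplitter's actual implementation: the function names and the max_records parameter are hypothetical, and the maximum size is counted here in records purely for illustration. It assumes strict four-line records, in line with the limitation on multiline sequences.

```python
from itertools import cycle, islice


def read_records(handle):
    """Yield FASTQ records as four-line chunks (no multiline sequences)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield "".join(record)


def split_round_robin(handle, outputs):
    """Hypothetical helper: record i goes to outputs[i % len(outputs)]."""
    for record, out in zip(read_records(handle), cycle(outputs)):
        out.write(record)


def split_sequential(handle, outputs, max_records):
    """Hypothetical helper: fill each output in turn with up to max_records."""
    records = read_records(handle)
    for out in outputs:
        for record in islice(records, max_records):
            out.write(record)
    # Assumption for this sketch: input that does not fit is an error.
    if next(records, None) is not None:
        raise ValueError("input exceeds len(outputs) * max_records records")
```

Round-robin needs no knowledge of the input size to balance the outputs, while the sequential variant keeps the original record order within each file and only needs an upper bound per file.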

This application does not work with multiline fastq sequences.

Quickstart

Install fastqsplitter with pip: pip install fastqsplitter

Or with conda and an activated bioconda channel: conda install fastqsplitter

The conda install is recommended because it will install dependencies which make decompression and compression faster for .gz files.

To split an input file input.fastq.gz into 3 different files, run: fastqsplitter input.fastq.gz -n 3 --prefix split. This will create split.0.fastq.gz, split.1.fastq.gz and split.2.fastq.gz.

fastqsplitter uses the excellent xopen library by @marcelm. Therefore, the compression of the input and output files is determined by their extension. Use .gz if the output files should be gzip compressed.
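
As a rough illustration of extension-based compression, the sketch below picks an opener from the file name using only the Python standard library. It is a stand-in for the idea, not xopen's API or fastqsplitter's code; open_by_extension is a hypothetical helper and only handles .gz.

```python
import gzip


def open_by_extension(path, mode="rt"):
    """Hypothetical helper: use gzip when the name ends in .gz, else plain open."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)
```

With such a helper, writing to split.0.fastq.gz produces gzip-compressed output, while writing to split.0.fastq produces plain text.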

Documentation

More information on fastqsplitter can be found on our readthedocs page.