Single-cell/nucleus datasets typically contain matched FASTQ files, in groups of 2 for RNA-seq and some ATAC-seq assays, and 3 for other ATAC-seq data types. (RNA-seq contains 2 more files per group, with prefix I, which are not currently used in the analysis).
Processing these data types requires the files in each group to match read for read: e.g., the first read in R1 is the barcode + UMI and the first read in R2 is the matched transcript sequence. The reads in each file are then consumed with "zipped" iteration.
This crucially requires the number of reads (and therefore lines) in each of the grouped FASTQ files to match. We should check this during dataset ingest. We already check for valid gzip compression, so the check proposed here should be implemented as part of that same pass, so that we don't waste CPU and I/O time decompressing the same file twice.
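A minimal sketch of how the combined check could look, assuming gzip-compressed FASTQ files in the common four-lines-per-read layout. The function names `count_fastq_lines` and `check_fastq_group` are hypothetical and are not part of the existing ingest code. Each file is decompressed exactly once: a corrupt gzip stream raises an error during the read, and the line count is collected in the same pass.

```python
import gzip
from pathlib import Path
from typing import Dict, Iterable


def count_fastq_lines(path: Path, chunk_size: int = 1 << 20) -> int:
    """Count lines in a gzip-compressed file in a single decompression pass.

    Invalid gzip data makes the gzip module raise while reading, so this
    doubles as the compression validity check.
    """
    lines = 0
    last_byte = b""
    with gzip.open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            lines += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # Count a final line that lacks a trailing newline.
    if last_byte and last_byte != b"\n":
        lines += 1
    return lines


def check_fastq_group(paths: Iterable[Path]) -> Dict[Path, int]:
    """Verify that all FASTQ files in one group have the same line count.

    Assumption: each read occupies exactly four lines.
    """
    counts = {Path(p): count_fastq_lines(Path(p)) for p in paths}
    for path, n in counts.items():
        if n % 4 != 0:
            raise ValueError(f"{path}: {n} lines is not a multiple of 4")
    if len(set(counts.values())) > 1:
        details = ", ".join(f"{p.name}: {n // 4} reads" for p, n in counts.items())
        raise ValueError(f"Read counts differ within group ({details})")
    return counts
```

How files are grouped (e.g., by shared prefix with `R1`/`R2`/`R3` suffixes) is left to the caller, since that depends on the assay's naming convention.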