diff --git a/README.md b/README.md
index 3d9443d..2d62b43 100644
--- a/README.md
+++ b/README.md
@@ -20,11 +20,18 @@ for information on Installation, Support, License, Copyright, and Disclaimer.
 
 ## Specific Version Documentation
 
+ * [Version 3.2, SMRT Link 8.0](README_v3.2.md)
  * [Version 3.1, SMRT Link 7.0](README_v3.1.md)
  * [Version 3.0, SMRT Link 6.0](README_v3.0.md)
 
 ## Changelog
- * **3.1.2**
+ * **3.2.0**
+   * Add `collapse` step for aligned transcript BAM input
+   * Enable CCS-only workflow `cluster --use-qvs`
+   * Add `refine --min-polya-length`
+   * Add `cluster --singletons` to output unclustered FLNCs; potential sample prep artifacts!
+   * Fix minimap2 bugs. Outputs might change slightly.
+ * 3.1.2
    * Reduce `polish` memory footprint
  * 3.1.1
    * Edge case fix where `polish` would not finish and stale
diff --git a/README_v3.0.md b/README_v3.0.md
index 621f0d2..1992f58 100644
--- a/README_v3.0.md
+++ b/README_v3.0.md
@@ -37,10 +37,14 @@ Each sequencing run is processed by [*ccs*](https://github.com/PacificBioscience
 to generate one representative circular consensus sequence (CCS) for each ZMW. Only ZMWs with
 at least one full pass (at least once subread with SMRT adapter on both ends) are
 used for the subsequent analysis. Polishing is not necessary
-in this step and is by default deactivated through `.
+in this step and is by default deactivated through.
 
     ccs movie.subreads.bam ccs.bam --noPolish --minPasses 1
 
+For **CCS version ≥ 4.0.0** use this call:
+
+    $ ccs movie.subreads.bam ccs.bam --skip-polish --min-passes 1 --draft-mode winpoa --disable-heuristics
+
 ### Primer removal and demultiplexing
 Removal of cDNA primers and identification of barcodes (if given) is performed using [*lima*](https://github.com/pacificbiosciences/barcoding),
 which offers a specialized `--isoseq` mode.
diff --git a/README_v3.1.md b/README_v3.1.md
index c4ee074..39c2c37 100644
--- a/README_v3.1.md
+++ b/README_v3.1.md
@@ -59,6 +59,10 @@ used per ZMW; this can decrease run-time (only available in ccs version ≥ 3.1.
 
     $ ccs movieX.subreads.bam movieX.ccs.bam --noPolish --minPasses 1 --maxPoaCoverage 10
 
+For **CCS version ≥ 4.0.0** use this call:
+
+    $ ccs movieX.subreads.bam movieX.ccs.bam --skip-polish --min-passes 1 --draft-mode winpoa --disable-heuristics
+
 ### Step 2 - Primer removal and demultiplexing
 Removal of primers and identification of barcodes is performed using [*lima*](https://github.com/pacificbiosciences/barcoding),
 which offers a specialized `--isoseq` mode.
diff --git a/README_v3.2.md b/README_v3.2.md
new file mode 100644
index 0000000..57031cf
--- /dev/null
+++ b/README_v3.2.md
@@ -0,0 +1,229 @@
+
+Scalable De Novo Isoform Discovery
+
+***
+
+*IsoSeq3* contains the newest tools to identify transcripts in
+PacBio single-molecule sequencing data.
+Starting in SMRT Link v6.0.0, those tools power the
+*IsoSeq3 GUI-based analysis* application.
+A composable workflow of existing tools and algorithms, combined with
+a new clustering technique, allows to process the ever-increasing yield of PacBio
+machines with similar performance to *IsoSeq1* and *IsoSeq2*.
+
+Focus of version 3.2 documentation is processing of polished CCS reads,
+the latest feature of *IsoSeq3*. Processing of unpolished CCS reads with final
+transcript polishing is still supported, please refer to the
+[documentation of version 3.1](README_v3.1.md).
+
+## Availability
+Latest version can be installed via bioconda package `isoseq3`.
+
+Please refer to our [official pbbioconda page](https://github.com/PacificBiosciences/pbbioconda)
+for information on Installation, Support, License, Copyright, and Disclaimer.
+
+## Overview
+ - Workflow Overview: [high](README_v3.1.md#high-level-workflow) / [mid](README_v3.1.md#mid-level-workflow) / [low](README_v3.1.md#low-level-workflow) level
+ - [Real-World Example](README_v3.1.md#real-world-example)
+ - [FAQ](README_v3.1.md#faq)
+ - [SMRTbell Designs](README_v3.1.md#what-smrtbell-designs-are-possible)
+
+## High-level workflow
+
+The high-level workflow depicts files and processes:
+
+
+
+## Mid-level workflow
+
+The mid-level workflow schematically explains what happens at each stage:
+
+
+
+## Low-level workflow
+
+The low-level workflow explained via CLI calls. All necessary dependencies are
+installed via bioconda.
+
+### Step 0 - Input
+For each SMRT cell a `movieX.subreads.bam` is needed for processing.
+
+### Step 1 - Circular Consensus Sequence calling
+Each sequencing run is processed by [*ccs*](https://github.com/PacificBiosciences/unanimity)
+to generate one representative circular consensus sequence (CCS) for each ZMW. Only ZMWs with
+at least one full pass (at least one subread with SMRT adapter on both ends) are
+used for the subsequent analysis. In contrast to older IsoSeq versions,
+CCS polishing is required to enable skipping of the transcript polishing.
+It is advised to use the latest CCS version 4.0.0 or newer.
+
+    $ ccs movieX.subreads.bam movieX.ccs.bam --min-rq 0.9
+
+More info how to [easily chunk ccs](https://github.com/PacificBiosciences/ccs#how-can-I-parallelize-on-multiple-servers).
+
+### Step 2 - Primer removal and demultiplexing
+Removal of primers and identification of barcodes is performed using [*lima*](https://github.com/pacificbiosciences/barcoding),
+which offers a specialized `--isoseq` mode.
+Even in the case that your sample is not barcoded, primer removal is performed
+by *lima*.
+If there are more than two sequences in your `primer.fasta` file or better said
+more than one pair of 5' and 3' primers, please use *lima* with `--peek-guess`
+to remove spurious false positive signal.
+More information about how to name input primer(+barcode)
+sequences in this [FAQ](https://github.com/pacificbiosciences/barcoding#how-can-i-demultiplex-isoseq-data).
+
+    $ lima movieX.ccs.bam barcoded_primers.fasta movieX.fl.bam --isoseq --no-pbi --peek-guess
+
+**Example 1:**
+Following is the `primer.fasta` for the Clontech SMARTer and NEB cDNA library
+prep, which are the officially recommended protocols:
+
+    >NEB_5p
+    GCAATGAAGTCGCAGGGTTGGG
+    >Clontech_5p
+    AAGCAGTGGTATCAACGCAGAGTACATGGGG
+    >NEB_Clontech_3p
+    GTACTCTGCGTTGATACCACTGCTT
+
+**Example 2:**
+Following are examples for barcoded primers using a 16bp barcode followed by
+Clontech primer:
+
+    >primer_5p
+    AAGCAGTGGTATCAACGCAGAGTACATGGGG
+    >brain_3p
+    CGCACTCTGATATGTGGTACTCTGCGTTGATACCACTGCTT
+    >liver_3p
+    CTCACAGTCTGTGTGTGTACTCTGCGTTGATACCACTGCTT
+
+*Lima* will remove unwanted combinations and orient sequences to 5' → 3' orientation.
+
+Output files will be called according to their primer pair. Example for
+single sample libraries:
+
+    movieX.fl.NEB_5p--NEB_Clontech_3p.bam
+
+If your library contains multiple samples, execute the following workflow
+for each primer pair:
+
+    movieX.fl.primer_5p--brain_3p.bam
+    movieX.fl.primer_5p--liver_3p.bam
+
+### Step 3 - Refine
+Your data now contains full-length reads, but still needs to be refined by:
+ - [Trimming](https://github.com/PacificBiosciences/trim_isoseq_polyA) of poly(A) tails
+ - Rapid concatmer [identification](https://github.com/jeffdaily/parasail) and removal
+
+**Input**
+The input file for *refine* is one demultiplexed CCS file with full-length reads
+and the primer fasta file:
+ - `