Distinguish between versions

PacificBiosciences · Nov 20, 2018 · 1ee7df2 · 1ee7df2
1 parent 3cba653
commit 1ee7df2
Show file tree

Hide file tree

Showing 2 changed files with 234 additions and 203 deletions.
diff --git a/README.md b/README.md
@@ -18,214 +18,15 @@ Latest version can be installed via bioconda package `isoseq3`.
 Please refer to our [official pbbioconda page](https://github.com/PacificBiosciences/pbbioconda)
 for information on Installation, Support, License, Copyright, and Disclaimer.
 
-## Overview
- - [Changelog](README.md#changelog)
- - [SMRTbell Designs](README.md#smrtbell-designs)
- - [Workflow Overview](README.md#workflow)
- - [Real-World Example](README.md#real-world-example)
- - [FAQ](README.md#faq)
+## Specific Version Documentation
+
+ * [Version 3.1](README_v3.1.md)
+ * [Version 3.0, SMRT Link 6.0](README_v3.0.md)
 
 ## Changelog
 
  * 3.0.0: Initial release, included in SMRT Link 6.0.0
 
-## SMRTbell designs
-
-PacBio supports three different SMRTbell designs for IsoSeq libraries.
-In all designs, transcripts are labelled with asymmetric primers,
-whereas a polyA tail is optional. Barcodes may be optionally added.
-
-<img width="600px" src="doc/img/isoseq3-barcoding.png"/>
-
-## Workflow
-
-<img width="1000px" src="doc/img/isoseq3-workflow.png"/>
-
-### Input
-For each cell, the `<movie>.subreads.bam` and `<movie>.subreads.bam.pbi`
-are needed for processing.
-
-### Circular Consensus Sequence calling
-Each sequencing run is processed by [*ccs*](https://github.com/PacificBiosciences/unanimity)
-to generate one representative circular consensus sequence (CCS) for each ZMW. Only ZMWs with
-at least one full pass (at least once subread with SMRT adapter on both ends) are
-used for the subsequent analysis. Polishing is not necessary
-in this step and is by default deactivated through `.
-
-    ccs movie.subreads.bam ccs.bam --noPolish --minPasses 1
-
-### Primer removal and demultiplexing
-Removal of cDNA primers and identification of barcodes (if given) is performed using [*lima*](https://github.com/pacificbiosciences/barcoding),
-which offers a specialized `--isoseq` mode.
-
-More information about how to name input primer(+barcode)
-sequences in this [FAQ](https://github.com/pacificbiosciences/barcoding#how-can-i-demultiplex-isoseq-data).
-
-    lima --isoseq --dump-clips --no-pbi ccs.bam primers.fasta demux.bam
-
-The following is the `primer.fasta` for the Clontech SMARTer cDNA library prep, which is the officially recommended protocol:
-
-    >primer_5p
-    AAGCAGTGGTATCAACGCAGAGTACATGGG
-    >primer_3p
-    GTACTCTGCGTTGATACCACTGCTT
-
-The following are examples for barcoded samples using a 16bp barcode followed by Clontech primer:
-
-    >primer_5p
-    AAGCAGTGGTATCAACGCAGAGTACATGGGG
-    >brain_3p
-    CGCACTCTGATATGTGGTACTCTGCGTTGATACCACTGCTT
-    >liver_3p
-    CTCACAGTCTGTGTGTGTACTCTGCGTTGATACCACTGCTT
-
-*lima* will remove unwanted combinations and orient sequences to 5' -> 3' orientation.
-
-From here on, execute the following steps for each output BAM file.
-
-### Clustering and polishing
-*IsoSeq3* wraps all tools into one fat binary.
-
-    $ isoseq3
-    isoseq3 - De Novo Transcript Reconstruction
-
-    Tools:
-        cluster   - Cluster CCS reads to transcripts
-        polish    - Polish the clustering output
-        summarize - Create a barcode overview CSV file
-
-    Examples:
-        isoseq3 cluster movie.consensusreadset.xml unpolished.bam
-        isoseq3 polish unpolished.bam movie.subreadset.xml polished.bam
-        isoseq3 summarize polished.bam summary.csv
-
-#### Clustering and transcript clean up
-Compared to previous IsoSeq approaches, *IsoSeq3* performs a single clustering
-technique.
-Due to the nature of the algorithm, it can't be efficiently parallelized. It is advised to give this step as many cores
-as possible. The individual steps of *cluster* are as following:
- - [Trimming](https://github.com/PacificBiosciences/trim_isoseq_polyA) of polyA tails `--require-polya`
- - Rapid concatmer [identification](https://github.com/jeffdaily/parasail) and removal
- - Clustering using hierarchical n*log(n) [alignment](https://github.com/lh3/minimap2) and iterative cluster merging
- - Unpolished [POA](https://github.com/rvaser/spoa) sequence generation
-
-##### Input
-The input file for *cluster* is one demultiplexed CCS file:
- - `<demux.ccs.bam>` or `<demux.ccs.consensusreadset.xml>`
-
-##### Output
-The following output files of *cluster* contain unpolished isoforms:
- - `<prefix>.bam`
- - `<prefix>.flnc.bam`
- - `<prefix>.fasta`
- - `<prefix>.bam.pbi` <- Only generated with `--pbi`
- - `<prefix>.transcriptset.xml` <- Only relevant for pbsmrtpipe
- - `<prefix>.consensusreadset.xml` <- Only relevant for pbsmrtpipe
-
-Example invocation:
-
-    isoseq3 cluster demux.P5--P3.bam unpolished.bam -j 32 [--split-bam 24]
-
-#### Polishing
-The algorithm behind *polish* is the *arrow* model that also used for CCS
-generation and polishing of de-novo assemblies. This step can be massively
-parallelized by splitting the `unpolished.bam` file. Split BAM files can be
-generated by *cluster*.
-
-##### Input
-The input files for *polish* are:
- - `<unpolished>.bam` or `<unpolished>.transcriptset.xml`
- - `<movie_name>.subreads.bam` or `<movie_name>.subreadset.xml`
-
-##### Output
-The following output files of *polish* contain polished isoforms:
- - `<prefix>.bam`
- - `<prefix>.bam.pbi` <- Only generated with `--pbi`
- - `<prefix>.transcriptset.xml`
- - `<prefix>.hq.fasta.gz` with predicted accuracy ≥ 0.99
- - `<prefix>.lq.fasta.gz` with predicted accuracy < 0.99
- - `<prefix>.hq.fastq.gz` with predicted accuracy ≥ 0.99
- - `<prefix>.lq.fastq.gz` with predicted accuracy < 0.99
-
-Example invocation:
-
-    isoseq3 polish unpolished.bam m54020_171110_2301211.subreads.bam polished.bam
-
-
-
-## Real-world example
-This is an example of an end-to-end cmd-line-only workflow to get from
-subreads to polished isoforms.
-
-    $ wget https://downloads.pacbcloud.com/public/dataset/RC0_1cell_2017/m54086_170204_081430.subreads.bam
-    $ wget https://downloads.pacbcloud.com/public/dataset/RC0_1cell_2017/m54086_170204_081430.subreads.bam.pbi
-
-    $ ccs --version
-    ccs 3.1.0 (commit v3.1.0)
-
-    $ time ccs m54086_170204_081430.subreads.bam m54086_170204_081430.ccs.bam \
-               --noPolish --minPasses 1
-
-    real    50m43.090s
-    user    3531m35.620s
-    sys     24m36.884s
-
-    $ cat primers.fasta
-    >primer_5p
-    AAGCAGTGGTATCAACGCAGAGTACATGGGG
-    >primer_3p
-    AAGCAGTGGTATCAACGCAGAGTAC
-
-    $ lima --version
-    lima 1.7.1 (commit v1.7.1)
-
-    $ time lima m54086_170204_081430.ccs.bam primers.fasta demux.bam \
-                --isoseq --no-pbi --dump-clips
-
-    real    0m6.543s
-    user    0m51.170s
-
-    $ ls demux*
-    demux.json  demux.lima.counts  demux.lima.report  demux.lima.summary  demux.primer_5p--primer_3p.bam  demux.primer_5p--primer_3p.subreadset.xml
-
-    $ time isoseq3 cluster demux.primer_5p--primer_3p.bam unpolished.bam --verbose
-    Read BAM                 : (200740) 8s 313ms
-    India                    : (197869) 9s 204ms
-    Save flnc file           : 35s 366ms
-    Convert to reads         : 36s 967ms
-    Sort Reads               : 69ms 756us
-    Aligning Linear          : 42s 620ms
-    Read to clusters         : 7s 506ms
-    Aligning Linear          : 37s 595ms
-    Merge by mapping         : 37s 645ms
-    Consensus                : 1m 47s
-    Merge by mapping         : 8s 861ms
-    Consensus                : 12s 633ms
-    Write output             : 3s 265ms
-    Complete run time        : 5m 12s
-
-    real    5m12.888s
-    user    58m35.243s
-
-    $ ls unpolished*
-    unpolished.bam  unpolished.bam.pbi  unpolished.cluster  unpolished.fasta  unpolished.flnc.bam  unpolished.flnc.bam.pbi  unpolished.flnc.consensusreadset.xml  unpolished.transcriptset.xml
-
-    $ time isoseq3 polish unpolished.bam m54086_170204_081430.subreads.bam polished.bam --verbose
-    14561
-
-    real    60m37.564s
-    user    2832m8.382s
-    $ ls polished*
-    polished.bam  polished.bam.pbi  polished.hq.fasta.gz  polished.hq.fastq.gz  polished.lq.fasta.gz  polished.lq.fastq.gz  polished.transcriptset.xml
-
-If you have multiple cells, you should run `--split-bam` in the cluster step which will produce chunked cluster results. Each chunked cluster result can be run as a parallel polish job and merged at the end. The following example splits into 24 chunks. `sample.subreadset.xml` is the dataset containing all the input cells. The `isoseq3 polish` jobs can be run in parallel.
-
-    $ isoseq3 cluster demux.primer_5p--primer_3p.bam unpolished.bam --split-bam 24
-    $ isoseq3 polish unpolished.0.bam sample.subreadset.xml polished.0.bam
-    $ isoseq3 polish unpolished.1.bam sample.subreadset.xml polished.1.bam
-    $ ...
-
-
 ## FAQ
 ### Why IsoSeq3 and not the established IsoSeq1 or IsoSeq2?
 The ever-increasing throughput of the Sequel system gave rise to the need for a