Commit

Version 3.3.0
armintoepfer committed Feb 18, 2020
1 parent 5d5c0f2 commit 7a35d68
Showing 16 changed files with 96 additions and 586 deletions.
34 changes: 20 additions & 14 deletions README.md
@@ -1,4 +1,4 @@
<h1 align="center"><img width="300px" src="doc/img/isoseq3.png"/></h1>
<h1 align="center"><img width="300px" src="doc/img/isoseq.png"/></h1>
<h1 align="center">IsoSeq v3</h1>
<p align="center">Scalable De Novo Isoform Discovery</p>

@@ -18,14 +18,15 @@ Latest version can be installed via bioconda package `isoseq3`.
Please refer to our [official pbbioconda page](https://github.com/PacificBiosciences/pbbioconda)
for information on Installation, Support, License, Copyright, and Disclaimer.

## Specific Version Documentation
## Workflow Documentation

* [Version 3.2, SMRT Link 8.0](README_v3.2.md)
* [Version 3.1, SMRT Link 7.0](README_v3.1.md)
* [Version 3.0, SMRT Link 6.0](README_v3.0.md)
* [Iso-Seq Clustering](isoseq-clustering.md)
* Iso-Seq Deduplication (UMIs and cell barcodes) [Future release]

## Changelog
* **3.2.2**
* **3.3.0**
* SMRT Link release 9.0.0
* 3.2.2
* Fix `polish` not generating fasta/q output. This bug was introduced in v3.2.0
* 3.2.1
* Fix a gff index 1-off bug in `collapse`
@@ -49,6 +50,12 @@ called `refine`. Your custom `primers.fasta` is used in this step to detect
concatemers.

## FAQ
### Where is the workflow starting from unpolished CCS reads?
To simplify, unify, and future-proof Iso-Seq, we decided to remove documentation
starting from unpolished CCS reads. With ever-increasing polymerase read
lengths and improvements to CCS, it is recommended going forward to generate
polished CCS reads first, which makes final transcript polishing optional.

### Why IsoSeq v3 and not the established versions 1 or 2?
The ever-increasing throughput of the Sequel system gave rise to the need for a
scalable software solution that can handle millions of CCS reads, while
@@ -57,11 +64,11 @@ maintaining sensitivity and accuracy. Internal benchmarks have shown that
[SQANTI](https://bitbucket.org/ConesaLab/sqanti) attributes *IsoSeq v3* a higher
number of perfectly annotated isoforms:

<img width="1000px" src="doc/img/isoseq3-performance.png"/>
<img width="1000px" src="doc/img/isoseq-performance.png"/>

An additional benefit: it is a single Linux binary that requires no dependencies.

### Why is the number of transcripts much lower with IsoSeq3?
### Why is the number of transcripts much lower with IsoSeq v3?
Even though we also observe fewer polished transcripts with *IsoSeq v3*, the
overall quality is much higher. Most of the low-quality transcripts are lost in the
demultiplexing step. *Isoseq v1/2 classify* is too relaxed and is not filtering
@@ -70,18 +77,17 @@ effectively removes most molecules that are wrongly tagged, as in two 5' or two
3' primers. Only a proper 5' and 3' primer pair makes it possible to identify a
full-length transcript and its orientation.
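
The pairing rule above can be illustrated with a minimal Python sketch. It is not the *lima* implementation; the function name `classify_primer_pair` and the labels `"5p"` and `"3p"` are hypothetical, and the sketch assumes primer detection at each read end has already happened.

```python
from typing import Optional


def classify_primer_pair(left: str, right: str) -> Optional[str]:
    """Return the read orientation for a proper primer pair, else None.

    `left` and `right` are hypothetical labels ("5p" or "3p") for the
    primer found at each end of a read.
    """
    if (left, right) == ("5p", "3p"):
        return "forward"
    if (left, right) == ("3p", "5p"):
        return "reverse"
    # Two 5' primers, two 3' primers, or unknown labels: wrongly tagged.
    return None
```

A read tagged with two identical primers yields `None` and would be discarded; only mixed pairs carry orientation information.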


### I can't find the *classify* step
Starting with version 3.1, *classify* functionality has been split into two tools.
Removal of (barcoded) primers is performed with PacBio's standard demultiplexing
tool *lima*. *Lima* neither removes poly(A) tails nor detects concatemers.
For this, `isoseq3 refine` generates FLNC reads.
For this, `isoseq refine` generates FLNC reads.

For version 3.0, poly(A) tail removal and concatemer detection are performed in
`isoseq3 cluster`
`isoseq cluster`

### My sample has poly(A) tails, how can I remove them?
Use `--require-polya` for `isoseq3 refine`.
Use `--require-polya` for `isoseq refine`.
This filters for FL reads that have a poly(A) tail
of at least 20 base pairs and removes the identified tail.
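
As a rough illustration of the filter described above, here is a minimal Python sketch. It is not how `refine` is implemented: the function name `trim_polya` is hypothetical, and the sketch assumes an upper-case sequence and an exact run of `A` bases with no mismatches, which a real tool would likely tolerate.

```python
from typing import Optional


def trim_polya(seq: str, min_len: int = 20) -> Optional[str]:
    """Return `seq` without its trailing poly(A) run.

    Returns None when the trailing run of 'A' is shorter than `min_len`,
    meaning the read would be filtered out.
    """
    stripped = seq.rstrip("A")
    tail_len = len(seq) - len(stripped)
    if tail_len < min_len:
        return None
    return stripped
```

Reads without a sufficiently long tail return `None`; all others come back with the tail removed.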

@@ -107,7 +113,7 @@ feasible.
*IsoSeq v3* deems two reads to stem from the same transcript if they meet
the following criteria:

<img width="1000px" src="doc/img/isoseq3-similar-transcripts.png"/>
<img width="1000px" src="doc/img/isoseq-similar-transcripts.png"/>

There is no upper limit on the number of gaps.

@@ -128,7 +134,7 @@ PacBio supports three different SMRTbell designs for IsoSeq libraries.
In all designs, transcripts are labelled with asymmetric primers,
whereas a poly(A) tail is optional. Barcodes may be optionally added.

<img width="600px" src="doc/img/isoseq3-barcoding.png"/>
<img width="600px" src="doc/img/isoseq-barcoding.png"/>

### The binary does not work on my linux system!
Binaries require **SSE4.1 CPU support**; CPUs after 2008 (Penryn) include it.
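
One way to check for this on Linux is to look for the `sse4_1` flag in `/proc/cpuinfo`. The sketch below is not part of IsoSeq; the helper name `has_sse41` is hypothetical, and it assumes the x86 `/proc/cpuinfo` layout with a `flags` line.

```python
def has_sse41(cpuinfo_text: str) -> bool:
    """Return True if the first 'flags' line of cpuinfo text lists sse4_1."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Flags are whitespace-separated tokens such as "sse4_1".
            return "sse4_1" in value.split()
    # No flags line found (for example, a non-x86 system).
    return False
```

On a Linux machine, pass in the contents of `/proc/cpuinfo`, for example `has_sse41(open("/proc/cpuinfo").read())`.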
222 changes: 0 additions & 222 deletions README_v3.0.md

This file was deleted.
