Skip to content

kepbod/rampage_alu

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Pipeline to identify expressed Alu elements using RAMPAGE

A schematic flow shows the pipeline

workflow

Prerequisites

Softwares

Python libraries

Usage

Step 1:

Fetch proper read pairs and remove PCR dunplicates.

Usage: rm_pcr.py [options] <rampage>...

Options:
    -h --help                      Show help message.
    --version                      Show version.
    -p THREAD --thread=THREAD      Threads. [default: 5]
    -o OUTPUT --output=OUTPUT      Output directory. [default: rampage_peak]
    --min=MIN                      Minimum read counts. [default: 1]
  • Inputs: BAM files of RAMPAGE (<rampage>...)
  • Output: A output folder containing relevant files (-o OUTPUT)
    • rampage_plus_5end.bed: BED file of the 5' end of plus strand read pairs
    • rampage_plus_3read.bed: BED file of the 3' end of plus strand read pairs
    • rampage_minus_5end.bed: BED file of the 5' end of minus strand read pairs
    • rampage_minus_3read.bed: BED file of the 3' end of minus strand read pairs
    • rampage_link.bed: BED file linking the 5' and 3' ends of read pairs

Note:

  1. If there are multiple RAMPAGE BAM files (different replicates) derived from the same samples, you could simply list them afterwards.
  2. You could run with multiple threads using -p THREAD.
  3. You could set the minimum read filter for read pairs using --min=MIN.

Example:

rm_pcr.py -o rampage_peak rampage_rep1.bam rampage_rep2.bam

Step 2:

Call peaks using 5' end of RAMPAGE read pairs

Usage: call_peak.py [options] <rampagedir>

Options:
    -h --help                      Show help message.
    -v --version                   Show version.
    -l LENGTH                      Feature length for F-seq. [default: 30]
    --wig                          Create Wig files.
    -p PERCENT                     Retained percent of reads in resized peaks.
                                   [default: 0.95]
  • Input: the output folder created by rm_pcr.py
  • Output: rampage_peaks.txt under the input folder

Format of rampage_peaks.txt:

Field Description
Chrom Chromosome
Start Start of peak region
End End of peak region
Name peak
Score 0
Strand Strand of peak
Peak peak site
Height Height of peak site
Peak reads Reads of (peak site ± 2 bp)
Total Total reads of peak region
Start_Fseq Start of F-seq peak region
End_Fseq End of F-seq peak region
RPM RPM of peak region

Note:

  1. You could set feature length for F-seq peak calling using -l LENGTH.
  2. You could create wig files by setting --wig.
  3. You could run with multiple threads using -p THREAD.

Example:

call_peak.py rampage_peak

Step 3:

Calculate entropy for RAMPAGE peaks

Usage: entropy.py [options] <rampagedir>

Options:
    -h --help                      Show help message.
    --version                      Show version.
    -p THREAD --thread=THREAD      Threads. [default: 5]
  • Input: the output folder created by rm_pcr.py
  • Output: rampage_entropy.txt under the input folder

Format of rampage_entropy.txt:

The first thirteen columns of rampage_entropy.txt are the same as rampage_peaks.txt.

The additional two columns are listed below:

Field Description
Entropy Entropy of RAMPAGE peak
3' end 3' end of read pairs in peak

Note:

  1. You could run with multiple threads using -p THREAD.

Example:

entropy.py rampage_peak

Step 4:

Annotate expressed Alu elements

Usage: annotate_alu.py [options] -f ref (-a alu | -r rep) <rampagedir>

Options:
    -h --help                      Show help message.
    --version                      Show version.
    -f ref                         Gene annotations.
    -t type                        File type of gene annotations.
                                   [default: ref]
    --promoter region              Promoter region. [default: 250]
    -a alu                         Alu annotations (BED format).
    -r rep                         Repeatmasker annotations (RMSK format).
    --extend length                Alu extended length. [default: 50]
    --entropy entropy              Entropy cutoff. [default: 2.5]
    --span span                    Span cutoff. [default: 1000]
    --coverage coverage            Coverage cutoff. [default: 0.5]
    -o out                         Output file. [default: alu_peak.txt]
  • Input:
    • gene annotation file (-f ref)
    • Alu annotation file (-a alu or -r rep)
    • the output folder created by rm_pcr.py
  • Output: expressed Alu file (-o out)

Format of expressed Alu file:

The first fifteen columns are the same as rampage_entropy.txt.

The additional six columns are listed below:

Field Description
Chrom Chromosome of Alu
Start Start of Alu
End End of Alu
Name Name of Alu
Score 0
Strand Strand of Alu

Note:

  1. The the default format of gene annotation file is Gene Predictions and RefSeq Genes with Gene Names format. You could use gene annotation file in BED format by setting -t bed.
  2. If using Repeatmask annotation file, you could download them from UCSC.
  3. You could set the promoter region length using --promoter region.
  4. You could set the Alu annotation extension length using --extend length.
  5. You could set entropy cutoff using --entropy entropy.
  6. You could set RAMPAGE effective length cutoff using --span span.
  7. You could set Alu coverage cutoff using --coverage coverage.

Example:

annotate_alu.py -f ref.txt -a alu.bed -o alu_peak.txt rampage_peak

Citation

Zhang XO, Gingeras TR, Weng Z#. Genome-wide analysis of polymerase III-transcribed Alu elements suggests cell type-specific enhancer function. Genome Res. 2019, 29:1402-1414

License

Copyright (C) 2018-2019 Xiao-Ou Zhang. See the LICENSE file for license rights and limitations (MIT).

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages