forked from mbleda/ngscourse.github.io
-
Notifications
You must be signed in to change notification settings - Fork 0
/
030_example.html
58 lines (57 loc) · 3.92 KB
/
030_example.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta name="generator" content="pandoc" />
<meta name="author" content="Variant calling" />
<title>NGS data analysis course</title>
<style type="text/css">code{white-space: pre;}</style>
<link rel="stylesheet" href="../../../course_commons/css_template_for_examples.css" type="text/css" />
</head>
<body>
<div id="header">
<h1 class="title"><a href="http://ngscourse.github.io/">NGS data analysis course</a></h1>
<h2 class="author"><strong>Variant calling</strong></h2>
<h3 class="date"><em>(updated 28-02-2014)</em></h3>
</div>
<!-- COMMON LINKS HERE -->
<h1 id="preliminaries">Preliminaries</h1>
<h2 id="software-used-in-this-practical">Software used in this practical:</h2>
<ul>
<li><a href="http://bio-bwa.sourceforge.net/" title="BWA">BWA</a> : BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome.</li>
<li><a href="http://samtools.sourceforge.net/" title="samtools">SAMTools</a> : SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.</li>
<li><a href="http://picard.sourceforge.net/" title="Picard">Picard</a> : Picard comprises Java-based command-line utilities that manipulate SAM files, and a Java API (SAM-JDK) for creating new programs that read and write SAM files.</li>
<li><a href="http://www.broadinstitute.org/cancer/cga/mutect_download" title="MuTect">MuTect</a> : method developed at the Broad Institute for the reliable and accurate identification of somatic point mutations in next generation sequencing data of cancer genomes.</li>
</ul>
<h2 id="file-formats-explored">File formats explored:</h2>
<ul>
<li><a href="http://samtools.sourceforge.net/SAMv1.pdf">SAM</a></li>
<li><a href="http://www.broadinstitute.org/igv/bam">BAM</a></li>
<li>VCF Variant Call Format: see [1000 Genomes][vcf-format-1000ge] and [Wikipedia][vcf-format-wikipedia] specifications.</li>
</ul>
<h1 id="some-previous-configurations">Some previous configurations</h1>
<p>Download MuTect software from http://www.broadinstitute.org/cancer/cga/mutect_download</p>
<p>You need to register before downloading. Extract the file in your desktop.</p>
<p>Just to run MuTect easier:</p>
<pre><code>echo "alias MuTect.jar='java -jar /home/Desktop/muTect-1.1.4-bin/muTect-1.1.4.jar'" >> ~/.bashrc
source ~/.bashrc</code></pre>
<h1 id="exercise-3-somatic-calling">Exercise 3: Somatic calling</h1>
<h2 id="prepare-bam-file">1. Prepare BAM file</h2>
<p>Go to the example1 folder:</p>
<pre><code>cd example3</code></pre>
<p>Add read groups:</p>
<pre><code>AddOrReplaceReadGroups.jar I=f000-normal.bam O=f010-normal_fixRG.bam RGID=group1 RGLB=lib1 RGPL=illumina RGSM=sample1 RGPU=unit1
AddOrReplaceReadGroups.jar I=f000-tumor.bam O=f010-tumor_fixRG.bam RGID=group2 RGLB=lib1 RGPL=illumina RGSM=sample1 RGPU=unit2</code></pre>
<p>Sort:</p>
<pre><code>samtools sort f010-normal_fixRG.bam f020-normal_fixRG_sorted
samtools sort f010-tumor_fixRG.bam f020-tumor_fixRG_sorted</code></pre>
<p>Index the BAM file:</p>
<pre><code>samtools index f020-normal_fixRG_sorted.bam
samtools index f020-tumor_fixRG_sorted.bam</code></pre>
<h2 id="somatic-calling">2. Somatic calling</h2>
<p>For brevity, we are not including BAM preprocessing steps. However, in real analysis it is recommended to include them.</p>
<pre><code>MuTect.jar --analysis_type MuTect --reference_sequence ../f000-reference.fa --dbsnp ../f000-dbSNP_chr21.vcf --input_file:normal f020-normal_fixRG_sorted.bam \
--input_file:tumor f020-tumor_fixRG_sorted.bam --out f030-call_stats.out --coverage_file f030-coverage.wig.txt</code></pre>
</body>
</html>