Added appendix section about mapping RNA-Seq reads using STAR

mkempenaar · May 8, 2024 · 6a624fb
1 parent 573c6cb
commit 6a624fb

Showing 13 changed files with 1,161 additions and 282 deletions.
diff --git a/docs/404.html b/docs/404.html
@@ -73,6 +73,7 @@ <h1>
 <li class="book-part">Appendix</li>
 <li><a class="" href="a1-data_loading.html"><span class="header-section-number">A</span> Loading Expression Data in R</a></li>
 <li><a class="" href="a2-annotation.html"><span class="header-section-number">B</span> Annotating an RNA-Seq Experiment</a></li>
+<li><a class="" href="a3-read-mapping.html"><span class="header-section-number">C</span> From Reads to Counts</a></li>
 </ul>
 
         <div class="book-extra">
@@ -99,7 +100,7 @@ <h1>Page not found<a class="anchor" aria-label="anchor" href="#page-not-found"><
 <footer class="bg-primary text-light mt-5"><div class="container"><div class="row">
 
   <div class="col-12 col-md-6 mt-3">
-    <p>"<strong>Capstone Project - Gene Expression Analysis</strong>" was written by Marcel Kempenaar. It was last built on 2024-04-23.</p>
+    <p>"<strong>Capstone Project - Gene Expression Analysis</strong>" was written by Marcel Kempenaar. It was last built on 2024-05-08.</p>
   </div>
 
   <div class="col-12 col-md-6 mt-3">

diff --git a/docs/EDA.html b/docs/EDA.html
@@ -73,6 +73,7 @@ <h1>
 <li class="book-part">Appendix</li>
 <li><a class="" href="a1-data_loading.html"><span class="header-section-number">A</span> Loading Expression Data in R</a></li>
 <li><a class="" href="a2-annotation.html"><span class="header-section-number">B</span> Annotating an RNA-Seq Experiment</a></li>
+<li><a class="" href="a3-read-mapping.html"><span class="header-section-number">C</span> From Reads to Counts</a></li>
 </ul>
 
         <div class="book-extra">
@@ -107,7 +108,7 @@ <h2>
 <strong>Note</strong>: if the data set consists of separate files (i.e. one per sample) or for general tips on reading in data, see the <a href="a1-data_loading.html#a1-batch_data_loading">Appendix A: <em>Batch Loading Expression Data</em></a> chapter.</li>
 <li>For the remainder of the document, try to show either the contents, structure or - in this case - dimensions of relevant R objects
 <ul>
-<li>Show the first five lines of the loaded data set. Including tables in a markdown document can be done using the <code>pander</code> function from the <code>pander</code><span class="citation">(<a href="a2-annotation.html#ref-R-pander">Daróczi and Tsegelskyi 2017</a>)</span> R-library.</li>
+<li>Show the first five lines of the loaded data set. Including tables in a markdown document can be done using the <code>pander</code> function from the <code>pander</code><span class="citation">(<a href="a3-read-mapping.html#ref-R-pander">Daróczi and Tsegelskyi 2017</a>)</span> R-library.</li>
 <li>Give the dimensions (with <code><a href="https://rdrr.io/r/base/dim.html">dim()</a></code> and the structure (with <code><a href="https://rdrr.io/r/utils/str.html">str()</a></code>) of the loaded data set.</li>
 <li>Check the output of the <code>str</code> function to see if all columns are of the expected R data type (e.g. <code>values</code>, <code>factors</code>, <code>character</code>, etc.)</li>
 </ul>
@@ -308,7 +309,7 @@ <h2>
 <span class="header-section-number">3.4</span> Visualizing using <code>heatmap</code> and <code>MDS</code><a class="anchor" aria-label="anchor" href="#EDA_part2"><i class="fas fa-link"></i></a>
 </h2>
 <p>This section adds a few Exploratory Data Analysis techniques where we will measure and look at <em>distances</em> between samples based on <em>normalized</em> data. Measuring distances between two data objects (samples in our case) is a common task in cluster analysis to compare similarity (low distance indicates similar data). In this case we will calculate the distances between our samples and visualize them in a <code>heatmap</code> and using a <code>multidimensional scaling</code> (MDS) technique.</p>
-<p>In the previous section we used the raw count data. You might have one or more samples that have different values (i.e. shifted) compared to other samples. While we need the raw count data to use R packages such as <code>edgeR</code> <span class="citation">(<a href="a2-annotation.html#ref-R-edgeR">Chen et al. 2018</a>)</span> and <code>DESeq2</code> <span class="citation">(<a href="a2-annotation.html#ref-R-DESeq2">Love, Anders, and Huber 2017</a>)</span>, calculating sample distances (used in the visualizations in this section) should be done on some form of normalized data. This data can either be RPKM/FPKM/TPM/CPM or vst-transformed (raw-)read counts. A proper method of transforming raw read count data is using the <code>vst</code> method from the <code>DESeq2</code> R Bioconductor library which is shown below. This ‘<em>variance stabilizing transformation</em>’ normalized data will only be used in this chapter, in chapter 4 we will again normalize using a different technique.</p>
+<p>In the previous section we used the raw count data. You might have one or more samples that have different values (i.e. shifted) compared to other samples. While we need the raw count data to use R packages such as <code>edgeR</code> <span class="citation">(<a href="a3-read-mapping.html#ref-R-edgeR">Chen et al. 2018</a>)</span> and <code>DESeq2</code> <span class="citation">(<a href="a3-read-mapping.html#ref-R-DESeq2">Love, Anders, and Huber 2017</a>)</span>, calculating sample distances (used in the visualizations in this section) should be done on some form of normalized data. This data can either be RPKM/FPKM/TPM/CPM or vst-transformed (raw-)read counts. A proper method of transforming raw read count data is using the <code>vst</code> method from the <code>DESeq2</code> R Bioconductor library which is shown below. This ‘<em>variance stabilizing transformation</em>’ normalized data will only be used in this chapter, in chapter 4 we will again normalize using a different technique.</p>
 <p>The following code examples shows how to use this library to normalize the count data from the <strong>GSE101942</strong> experiment to vst-normalized data before we calculate a distance metric.</p>
 <ul>
 <li>new(“standardGeneric”, .Data = function (object, …)</li>
@@ -714,7 +715,7 @@ <h2>
 <footer class="bg-primary text-light mt-5"><div class="container"><div class="row">
 
   <div class="col-12 col-md-6 mt-3">
-    <p>"<strong>Capstone Project - Gene Expression Analysis</strong>" was written by Marcel Kempenaar. It was last built on 2024-04-23.</p>
+    <p>"<strong>Capstone Project - Gene Expression Analysis</strong>" was written by Marcel Kempenaar. It was last built on 2024-05-08.</p>
   </div>
 
   <div class="col-12 col-md-6 mt-3">

diff --git a/docs/a1-data_loading.html b/docs/a1-data_loading.html
@@ -73,6 +73,7 @@ <h1>
 <li class="book-part">Appendix</li>
 <li><a class="active" href="a1-data_loading.html"><span class="header-section-number">A</span> Loading Expression Data in R</a></li>
 <li><a class="" href="a2-annotation.html"><span class="header-section-number">B</span> Annotating an RNA-Seq Experiment</a></li>
+<li><a class="" href="a3-read-mapping.html"><span class="header-section-number">C</span> From Reads to Counts</a></li>
 </ul>
 
         <div class="book-extra">
@@ -594,7 +595,7 @@ <h3>
 <footer class="bg-primary text-light mt-5"><div class="container"><div class="row">
 
   <div class="col-12 col-md-6 mt-3">
-    <p>"<strong>Capstone Project - Gene Expression Analysis</strong>" was written by Marcel Kempenaar. It was last built on 2024-04-23.</p>
+    <p>"<strong>Capstone Project - Gene Expression Analysis</strong>" was written by Marcel Kempenaar. It was last built on 2024-05-08.</p>
   </div>
 
   <div class="col-12 col-md-6 mt-3">

diff --git a/docs/a2-annotation.html b/docs/a2-annotation.html
@@ -73,6 +73,7 @@ <h1>
 <li class="book-part">Appendix</li>
 <li><a class="" href="a1-data_loading.html"><span class="header-section-number">A</span> Loading Expression Data in R</a></li>
 <li><a class="active" href="a2-annotation.html"><span class="header-section-number">B</span> Annotating an RNA-Seq Experiment</a></li>
+<li><a class="" href="a3-read-mapping.html"><span class="header-section-number">C</span> From Reads to Counts</a></li>
 </ul>
 
         <div class="book-extra">
@@ -848,32 +849,13 @@ <h3>
 <span><span class="va">results</span><span class="op">$</span><span class="va">gene_length</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/MathFun.html">abs</a></span><span class="op">(</span><span class="va">results</span><span class="op">$</span><span class="va">end_position</span> <span class="op">-</span> <span class="va">results</span><span class="op">$</span><span class="va">start_position</span><span class="op">)</span></span></code></pre></div>
 <p>The <code>results</code> object is a <code>data.frame</code> with 5 columns that we can merge with our data set giving us the following annotation columns (combined from the <code>AnnotationDBI</code> and <code>biomaRt</code> libraries).</p>
 <p>This was just an example on how to use the <code>biomaRt</code> library and it comes down to selecting the correct filter and looking for interesting attributes to retrieve. Further information can be found in the documentation avaialble with `vignette(‘biomaRt’)</p>
-</div>
-</div>
-<div id="references" class="section level2" number="7.3">
-<h2>
-<span class="header-section-number">B.3</span> References<a class="anchor" aria-label="anchor" href="#references"><i class="fas fa-link"></i></a>
-</h2>
 
-<div id="refs" class="references csl-bib-body hanging-indent">
-<div id="ref-deelen15" class="csl-entry">
-al., Patrick Deelen et. 2015. <span>“Calling Genotypes from Public RNA-Sequencing Data Enables Identification of Genetic Variants That Affect Gene-Expression Levels.”</span> <em>Genome Medicine</em> 7 (30).
-</div>
-<div id="ref-R-edgeR" class="csl-entry">
-Chen, Yunshun, Aaron Lun, Davis McCarthy, Xiaobei Zhou, Mark Robinson, and Gordon Smyth. 2018. <em>edgeR: Empirical Analysis of Digital Gene Expression Data in r</em>. <a href="http://bioinf.wehi.edu.au/edgeR">http://bioinf.wehi.edu.au/edgeR</a>.
-</div>
-<div id="ref-R-pander" class="csl-entry">
-Daróczi, Gergely, and Roman Tsegelskyi. 2017. <em>Pander: An r ’Pandoc’ Writer</em>. <a href="https://CRAN.R-project.org/package=pander">https://CRAN.R-project.org/package=pander</a>.
-</div>
-<div id="ref-R-DESeq2" class="csl-entry">
-Love, Michael, Simon Anders, and Wolfgang Huber. 2017. <em>DESeq2: Differential Gene Expression Analysis Based on the Negative Binomial Distribution</em>. <a href="https://github.com/mikelove/DESeq2">https://github.com/mikelove/DESeq2</a>.
-</div>
 </div>
 </div>
 </div>
   <div class="chapter-nav">
 <div class="prev"><a href="a1-data_loading.html"><span class="header-section-number">A</span> Loading Expression Data in R</a></div>
-<div class="empty"></div>
+<div class="next"><a href="a3-read-mapping.html"><span class="header-section-number">C</span> From Reads to Counts</a></div>
 </div></main><div class="col-md-3 col-lg-2 d-none d-md-block sidebar sidebar-chapter">
     <nav id="toc" data-toggle="toc" aria-label="On this page"><h2>On this page</h2>
       <ul class="nav navbar-nav">
@@ -885,7 +867,6 @@ <h2>
 <li><a class="nav-link" href="#using-biomart"><span class="header-section-number">B.2.2</span> Using biomaRt</a></li>
 </ul>
 </li>
-<li><a class="nav-link" href="#references"><span class="header-section-number">B.3</span> References</a></li>
 </ul>
 
       <div class="book-extra">
@@ -903,7 +884,7 @@ <h2>
 <footer class="bg-primary text-light mt-5"><div class="container"><div class="row">
 
   <div class="col-12 col-md-6 mt-3">
-    <p>"<strong>Capstone Project - Gene Expression Analysis</strong>" was written by Marcel Kempenaar. It was last built on 2024-04-23.</p>
+    <p>"<strong>Capstone Project - Gene Expression Analysis</strong>" was written by Marcel Kempenaar. It was last built on 2024-05-08.</p>
   </div>
 
   <div class="col-12 col-md-6 mt-3">