
# Prerequisites {#prerequisites}

The analysis presented in this book requires a basic understanding of the 
`R` programming language. An introduction to `R` can be found [here](https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf) and
in the book [R for Data Science](https://r4ds.hadley.nz/).

Furthermore, it is beneficial to be familiar with single-cell data analysis
using the [Bioconductor](https://www.bioconductor.org/) framework. The 
[Orchestrating Single-Cell Analysis with Bioconductor](https://bioconductor.org/books/release/OSCA/) book
gives an excellent overview of the data containers and basic analyses that are
used here.

An overview of IMC as a technology and of the necessary image processing steps
can be found on the [IMC workflow website](https://bodenmillergroup.github.io/IMCWorkflow/). 

Before we get started on IMC data analysis, we will need to make sure that
software dependencies are installed and the example data is downloaded.

## Obtain the code

This book provides R code to perform single-cell and spatial data analysis.
You can copy the individual code chunks into your R scripts or you can obtain
the full code of the book via:

```
git clone https://github.com/BodenmillerGroup/IMCDataAnalysis.git
```

## Software requirements

The R packages needed to execute the presented workflow can either be installed
manually (see section \@ref(manual-install)) or used from within the provided
Docker container (see section \@ref(docker)). The Docker option is useful if you
want to exactly reproduce the presented analysis across operating systems;
however, the manual install gives you more flexibility for exploratory data
analysis.

### Using Docker {#docker}

For reproducibility purposes, we provide a Docker container [here](https://github.com/BodenmillerGroup/IMCDataAnalysis/pkgs/container/imcdataanalysis).

1. After installing [Docker](https://docs.docker.com/get-docker/) you can first pull the container via:

```
docker pull ghcr.io/bodenmillergroup/imcdataanalysis:latest
```

and then run the container:

```
docker run -v /path/to/IMCDataAnalysis:/home/rstudio/IMCDataAnalysis \
	-e PASSWORD=bioc -p 8787:8787  \
	ghcr.io/bodenmillergroup/imcdataanalysis:latest
```

Here, `/path/to/` needs to be replaced with the location where you keep the
code and data of the book.

**Of note: it is recommended to use a date-tagged version of the container to ensure reproducibility**. 
This can be done via:

```
docker pull ghcr.io/bodenmillergroup/imcdataanalysis:<year-month-date>
```

2. An RStudio server session can be accessed via a browser at `localhost:8787` using `Username: rstudio` and `Password: bioc`.  
3. Navigate to `IMCDataAnalysis` and open the `IMCDataAnalysis.Rproj` file.  
4. Code in the individual files can now be executed or the whole workflow can be built by entering `bookdown::render_book()`.

### Manual installation {#manual-install}

The following section describes how to manually install all required R packages
when not using the provided Docker container.
To install them, please run:

```{r install-packages, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install(c("rmarkdown", "bookdown", "pheatmap", "viridis", "zoo", 
                       "devtools", "testthat", "tiff", "distill", "ggrepel", 
                       "patchwork", "mclust", "RColorBrewer", "uwot", "Rtsne", 
                       "harmony", "Seurat", "SeuratObject", "cowplot", "kohonen", 
                       "caret", "randomForest", "ggridges", "gridGraphics", 
                       "scales", "Matrix", 
                       "CATALYST", "scuttle", "scater", "dittoSeq", 
                       "tidyverse", "BiocStyle", "batchelor", "bluster", "scran", 
                       "lisaClust", "spicyR", "iSEE", "imcRtools", "cytomapper",
                       "imcdatasets", "cytoviewer"))

# Github dependencies
devtools::install_github("i-cyto/Rphenograph")
```
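
Before running the installation, it can be useful to check which packages are still missing. The following is a minimal sketch rather than part of the book's workflow: `missing_packages()` is a hypothetical helper, and the selection in `core_pkgs` is only an illustrative subset of the list above.

```r
# Hypothetical helper (not part of the book): return the entries of `pkgs`
# that cannot be found in any library on the current library path.
missing_packages <- function(pkgs) {
  # system.file() returns "" when a package is not installed;
  # unlike library(), it does not load the package.
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1), USE.NAMES = FALSE)
  pkgs[!installed]
}

# Illustrative subset of the packages listed above
core_pkgs <- c("CATALYST", "imcRtools", "cytomapper", "scater")
to_install <- missing_packages(core_pkgs)
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
}
```

The resulting vector could then be passed to `BiocManager::install()` so that only the missing packages are installed.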

```{r load-libraries, echo = FALSE, message = FALSE}
options(timeout=10000)
library(CATALYST)
library(SpatialExperiment)
library(SingleCellExperiment)
library(scuttle)
library(scater)
library(imcRtools)
library(cytomapper)
library(dittoSeq)
library(tidyverse)
library(bluster)
library(scran)
library(lisaClust)
library(caret)
library(cytoviewer)
```

### Major package versions

Throughout the analysis, we rely on different R software packages.
This section lists the most commonly used packages in this workflow.

Data containers:

* [SpatialExperiment](https://bioconductor.org/packages/release/bioc/html/SpatialExperiment.html) version `r packageVersion("SpatialExperiment")`
* [SingleCellExperiment](https://bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html) version `r packageVersion("SingleCellExperiment")`

Data analysis:

* [CATALYST](https://bioconductor.org/packages/release/bioc/html/CATALYST.html) version `r packageVersion("CATALYST")`
* [imcRtools](https://bioconductor.org/packages/release/bioc/html/imcRtools.html) version `r packageVersion("imcRtools")`
* [scuttle](https://bioconductor.org/packages/release/bioc/html/scuttle.html) version `r packageVersion("scuttle")`
* [scater](https://bioconductor.org/packages/release/bioc/html/scater.html) version `r packageVersion("scater")`
* [batchelor](https://www.bioconductor.org/packages/release/bioc/html/batchelor.html) version `r packageVersion("batchelor")`
* [bluster](https://www.bioconductor.org/packages/release/bioc/html/bluster.html) version `r packageVersion("bluster")`
* [scran](https://www.bioconductor.org/packages/release/bioc/html/scran.html) version `r packageVersion("scran")`
* [harmony](https://github.com/immunogenomics/harmony) version `r packageVersion("harmony")`
* [Seurat](https://satijalab.org/seurat/index.html) version `r packageVersion("Seurat")`
* [lisaClust](https://www.bioconductor.org/packages/release/bioc/html/lisaClust.html) version `r packageVersion("lisaClust")`
* [caret](https://topepo.github.io/caret/) version `r packageVersion("caret")`

Data visualization:

* [cytomapper](https://bioconductor.org/packages/release/bioc/html/cytomapper.html) version `r packageVersion("cytomapper")`
* [cytoviewer](https://bioconductor.org/packages/release/bioc/html/cytoviewer.html) version `r packageVersion("cytoviewer")`
* [dittoSeq](https://bioconductor.org/packages/release/bioc/html/dittoSeq.html) version `r packageVersion("dittoSeq")`

Tidy R:

* [tidyverse](https://www.tidyverse.org/) version `r packageVersion("tidyverse")`
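
The versions listed above are filled in with `packageVersion()` when the book is built. To compare them against a local installation, they can be collected in a single table. This is a sketch under the assumption that all queried packages are installed; `package_versions()` is a hypothetical helper name.

```r
# Hypothetical helper (not part of the book): collect the installed versions
# of several packages in one data frame.
# Note: utils::packageVersion() raises an error for packages that are not installed.
package_versions <- function(pkgs) {
  data.frame(
    package = pkgs,
    version = vapply(pkgs, function(p) as.character(utils::packageVersion(p)),
                     character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Example with packages that ship with every R installation
package_versions(c("stats", "utils"))
```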

## Image processing {#image-processing}

The analysis presented here relies entirely on packages written in the
programming language `R` and primarily focuses on analysis approaches downstream
of image processing. The example data available at
[https://zenodo.org/record/7575859](https://zenodo.org/record/7575859) were
processed (file type conversion, image segmentation and feature extraction, as
explained in Section \@ref(processing)) using the
[steinbock](https://bodenmillergroup.github.io/steinbock/latest/) toolkit. The
exact command line interface calls to process the raw data are shown below:

```{r, echo = FALSE, message = FALSE}
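# recursive = TRUE also creates the parent "data" folder if it does not exist yet
if (!dir.exists("data/steinbock")) dir.create("data/steinbock", recursive = TRUE)
if (!dir.exists("data/ImcSegmentationPipeline")) dir.create("data/ImcSegmentationPipeline", recursive = TRUE)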
# Pre-download steinbock file
download.file("https://zenodo.org/record/7624451/files/steinbock.sh", 
              "data/steinbock/steinbock.sh")
```

```{bash, file="data/steinbock/steinbock.sh", eval=FALSE}

```

## Download example data {#download-data}

Throughout this tutorial, we will access a number of different data types. 
To declutter the analysis scripts, we download all required data here up front.

To highlight the basic steps of IMC data analysis, we provide example data that
were acquired as part of the **I**ntegrated i**MMU**noprofiling of large adaptive
**CAN**cer patient cohorts project ([immucan.eu](https://immucan.eu/)). The
raw data of four patients can be accessed online at 
[zenodo.org/record/7575859](https://zenodo.org/record/7575859). We will only
download the sample/patient metadata information here:

```{r download-sample-data}
download.file("https://zenodo.org/record/7575859/files/sample_metadata.csv", 
         destfile = "data/sample_metadata.csv")
```
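
When the code is run repeatedly, it may be preferable to skip files that are already present locally. The following is one possible sketch and not how the book itself handles downloads; `download_if_missing()` is a hypothetical wrapper around `download.file()`.

```r
# Hypothetical helper (not part of the book): download a file only if it is
# not already present. Returns TRUE (invisibly) if a download was attempted
# and FALSE if it was skipped.
download_if_missing <- function(url, destfile, ...) {
  if (file.exists(destfile)) {
    message("Skipping download, file already present: ", destfile)
    return(invisible(FALSE))
  }
  # Make sure the target folder exists before downloading
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  utils::download.file(url, destfile, ...)
  invisible(TRUE)
}
```

For the metadata file above, the call would be `download_if_missing("https://zenodo.org/record/7575859/files/sample_metadata.csv", "data/sample_metadata.csv")`.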

### Processed multiplexed imaging data

The IMC raw data were processed using either the 
[steinbock](https://github.com/BodenmillerGroup/steinbock) toolkit or the
[IMC Segmentation Pipeline](https://github.com/BodenmillerGroup/ImcSegmentationPipeline).
Image processing included file type conversion, cell segmentation and feature
extraction. 

**steinbock output**

This book uses the output of the `steinbock` framework when applied to process
the example data. The processed data include the single-cell mean intensity
files, the single-cell morphological features and spatial locations, spatial
object graphs in the form of edge lists indicating cells in close proximity,
hot-pixel-filtered multi-channel images, segmentation masks, image metadata and
channel metadata. All these files will be downloaded here for later use. The
commands that were used to generate these data can be found in the shell script
above.

```{r steinbock-results}
# download intensities
url <- "https://zenodo.org/record/7624451/files/intensities.zip"
destfile <- "data/steinbock/intensities.zip"
download.file(url, destfile)
unzip(destfile, exdir="data/steinbock", overwrite=TRUE)
unlink(destfile)

# download regionprops
url <- "https://zenodo.org/record/7624451/files/regionprops.zip"
destfile <- "data/steinbock/regionprops.zip"
download.file(url, destfile)
unzip(destfile, exdir="data/steinbock", overwrite=TRUE)
unlink(destfile)

# download neighbors
url <- "https://zenodo.org/record/7624451/files/neighbors.zip"
destfile <- "data/steinbock/neighbors.zip"
download.file(url, destfile)
unzip(destfile, exdir="data/steinbock", overwrite=TRUE)
unlink(destfile)

# download images
url <- "https://zenodo.org/record/7624451/files/img.zip"
destfile <- "data/steinbock/img.zip"
download.file(url, destfile)
unzip(destfile, exdir="data/steinbock", overwrite=TRUE)
unlink(destfile)

# download masks
url <- "https://zenodo.org/record/7624451/files/masks_deepcell.zip"
destfile <- "data/steinbock/masks_deepcell.zip"
download.file(url, destfile)
unzip(destfile, exdir="data/steinbock", overwrite=TRUE)
unlink(destfile)

# download individual files
download.file("https://zenodo.org/record/7624451/files/panel.csv", 
              "data/steinbock/panel.csv")
download.file("https://zenodo.org/record/7624451/files/images.csv", 
              "data/steinbock/images.csv")
download.file("https://zenodo.org/record/7624451/files/steinbock.sh", 
              "data/steinbock/steinbock.sh")
```
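
The download, unzip and clean-up steps in the chunk above follow the same pattern for every archive. As a sketch of how this pattern could be wrapped in a function (the book itself spells out each step), consider the hypothetical helper `fetch_zip()`. The URLs are assembled from the Zenodo record used above, and the loop that triggers the downloads is commented out.

```r
# Hypothetical helper (not part of the book): download a zip archive into
# `exdir`, extract it there and delete the archive afterwards.
fetch_zip <- function(url, exdir) {
  destfile <- file.path(exdir, basename(url))
  dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
  utils::download.file(url, destfile)
  utils::unzip(destfile, exdir = exdir, overwrite = TRUE)
  unlink(destfile)
  invisible(destfile)
}

# Build the archive URLs used in the chunk above
base_url <- "https://zenodo.org/record/7624451/files"
archives <- c("intensities", "regionprops", "neighbors", "img", "masks_deepcell")
zip_urls <- sprintf("%s/%s.zip", base_url, archives)

# Uncomment to download and extract all archives:
# for (u in zip_urls) fetch_zip(u, exdir = "data/steinbock")
```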

**IMC Segmentation Pipeline output**

The example data were also processed using the 
[IMC Segmentation Pipeline](https://github.com/BodenmillerGroup/ImcSegmentationPipeline) (version 3). 
To highlight the use of the reader function for this type of output, we will need
to download the `cpout` folder which is part of the `analysis` folder. The `cpout`
folder stores all relevant output files of the pipeline. For a full description
of the pipeline, please refer to the [docs](https://bodenmillergroup.github.io/ImcSegmentationPipeline/).

```{r imcsegpipe-results}
# download analysis folder
url <- "https://zenodo.org/record/7997296/files/analysis.zip"
destfile <- "data/ImcSegmentationPipeline/analysis.zip"
download.file(url, destfile)
unzip(destfile, exdir="data/ImcSegmentationPipeline", overwrite=TRUE)
unlink(destfile)

unlink("data/ImcSegmentationPipeline/analysis/cpinp/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/crops/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/histocat/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/ilastik/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/ometiff/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/cpout/images/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/cpout/probabilities/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/cpout/masks/", recursive=TRUE)
```
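
Because `unlink()` accepts a vector of paths, the clean-up above could also be written more compactly. The following sketch uses a hypothetical helper, `remove_subfolders()`, and demonstrates it on a temporary folder so that no real data are touched.

```r
# Hypothetical helper (not part of the book): delete several subfolders of
# `root` in one call and return the targeted paths invisibly.
remove_subfolders <- function(root, subfolders) {
  targets <- file.path(root, subfolders)
  unlink(targets, recursive = TRUE)
  invisible(targets)
}

# Demonstration on a temporary folder that mimics part of the analysis layout
demo_root <- tempfile("analysis")
dir.create(file.path(demo_root, "cpout", "masks"), recursive = TRUE)
dir.create(file.path(demo_root, "crops"))
remove_subfolders(demo_root, c("crops", "cpout/masks"))
list.files(demo_root)  # only "cpout" remains
```

For the data above, the call would use `"data/ImcSegmentationPipeline/analysis"` as `root` and the folder names from the chunk as `subfolders`.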

### Files for spillover matrix estimation

To highlight the estimation and correction of channel spillover as described by
[@Chevrier2017], we can access an example spillover acquisition from:

```{r download-spillover-data}
download.file("https://zenodo.org/record/7575859/files/compensation.zip",
              "data/compensation.zip")
unzip("data/compensation.zip", exdir="data", overwrite=TRUE)
unlink("data/compensation.zip")
```

### Gated cells

In Section \@ref(classification), we present a cell type classification approach
that relies on previously gated cells. This ground truth data is available
online at [zenodo.org/record/8095133](https://zenodo.org/record/8095133) and
will be downloaded here for later use:

```{r download-gated-cells}
download.file("https://zenodo.org/record/8095133/files/gated_cells.zip",
              "data/gated_cells.zip")
unzip("data/gated_cells.zip", exdir="data", overwrite=TRUE)
unlink("data/gated_cells.zip")
```

## Software versions {#sessionInfo}

<details>
   <summary>SessionInfo</summary>
   
```{r, echo = FALSE, message = FALSE}
sessionInfo()
```
</details>
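
To keep a record of the software versions alongside your own results, the session information can also be written to a text file. This is a minimal sketch; `save_session_info()` and the default file name are hypothetical and not part of the book.

```r
# Hypothetical helper (not part of the book): write the printed output of
# sessionInfo() to a text file and return the file path invisibly.
save_session_info <- function(file = "sessionInfo.txt") {
  writeLines(utils::capture.output(utils::sessionInfo()), file)
  invisible(file)
}

# Example: write to a temporary file
info_file <- save_session_info(tempfile(fileext = ".txt"))
```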