tidytacos


Overview

Tidytacos (tidy TAxonomic COmpositionS) is an R package for the exploration of microbial community data. Such community data consists of read counts generated by amplicon sequencing (e.g. a region of the 16S rRNA gene) or metagenome (shotgun) sequencing. Each read count represents a number of sequencing reads identified for some taxon (an ASV, OTU, species, or higher-level taxon) in a sample.

Tidytacos builds on the tidyverse created by Hadley Wickham: the data are stored in tidy tables where each row is an observation and each column a variable. In addition, the package supplies a set of "verbs": functions that take a tidytacos object as first argument and also return a tidytacos object. This makes it easy to construct "pipe chains" of code that represent series of operations performed on the tidytacos object.
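The verb pattern described above can be illustrated with a toy stand-in that needs nothing beyond base R (version 4.1 or later for the native `|>` pipe). This is only a sketch: the list layout, the column names, and the two verbs `keep_samples()` and `add_totals()` are hypothetical and are not part of the tidytacos API. The point is that each verb takes the object as its first argument and returns an object of the same shape, so calls can be chained.

```r
# Toy stand-in for a community data object: a list of three tidy tables.
# Layout and column names are assumptions made for this illustration.
toy <- list(
  counts = data.frame(
    sample_id = c("s1", "s1", "s2"),
    taxon_id  = c("t1", "t2", "t1"),
    count     = c(10, 5, 7)
  ),
  samples = data.frame(
    sample_id = c("s1", "s2"),
    site      = c("nose", "throat")
  ),
  taxa = data.frame(
    taxon_id = c("t1", "t2"),
    genus    = c("Lactobacillus", "Staphylococcus")
  )
)

# Hypothetical verb: keep the samples selected by `keep` (a function that
# returns a logical vector) and drop the counts of all other samples.
keep_samples <- function(obj, keep) {
  obj$samples <- obj$samples[keep(obj$samples), , drop = FALSE]
  obj$counts  <- obj$counts[obj$counts$sample_id %in% obj$samples$sample_id, , drop = FALSE]
  obj
}

# Hypothetical verb: add the total read count of each sample to the sample table.
add_totals <- function(obj) {
  totals <- tapply(obj$counts$count, obj$counts$sample_id, sum)
  obj$samples$total_count <- as.numeric(totals[obj$samples$sample_id])
  obj
}

# Object in, object out: the verbs compose into a pipe chain.
result <- toy |>
  keep_samples(function(s) s$site == "nose") |>
  add_totals()

print(result$samples)
```

Real tidytacos verbs such as filter_samples follow the same object-in, object-out convention; see their help pages for the actual arguments.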

Prerequisites

Tidytacos is an R package. You can find instructions to download and install R here.

Tidytacos relies on the tidyverse R package (or, more accurately, set of R packages). You can install the tidyverse by running the following R code:

install.packages("tidyverse")

Finally, RStudio is a convenient IDE for working with R code (as well as code in other scripting languages). It offers more than the default R GUI: beyond creating and saving scripts, it displays your figures and lets you navigate files, inspect tables, and so on. You can download RStudio here.

Installation

Run the following R code to install the latest version of tidytacos:

install.packages("devtools")
devtools::install_github("LebeerLab/tidytacos")
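If you prefer not to reinstall packages that are already present, a small check like the one below can be run first. It is a sketch using only base R; `missing_packages()` is a hypothetical helper written for this example, not something tidytacos or devtools provides.

```r
# Return the subset of `pkgs` that is not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("tidyverse", "devtools"))
if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```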

Getting started

If your ASVs are counted and annotated using dada2, you can use the following function to convert the results to a tidytacos object:

library(tidytacos)

# Load the example dada2 output shipped with the package
seqtab <- readRDS(system.file("extdata", "dada2", "seqtab.rds", package = "tidytacos"))
taxa <- readRDS(system.file("extdata", "dada2", "taxa.rds", package = "tidytacos"))

taco <- from_dada(seqtab, taxa)

Here, seqtab and taxa are the R objects produced in the dada2 tutorial; example copies are available in the extdata folder of this package.

If you have data in the form of a phyloseq object you could convert it using:

phylo_obj <- readRDS(system.file("extdata", "phyloseq.rds", package = "tidytacos"))
taco <- from_phyloseq(phylo_obj)

You may wish to create a tidytacos object from your counts matrix, for example an OTU table where the row names are samples and the column names are taxa. Once that is done, you can add your taxonomy table and sample data. The variable 'taxon' of the taxonomy table should match the taxon names of the OTU table (its column names in this orientation). The taxonomy table may also include all taxonomic levels from 'kingdom' or 'domain' down to 'species', and a 'sequence' variable (the nucleotide sequence). The variable 'sample' of the sample data should match the sample names of the OTU table (its row names in this orientation).

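library(tidyverse)  # provides as_tibble() and the %>% pipe
library(tidytacos)

seqtab <- readRDS(system.file("extdata", "dada2", "seqtab.rds", package = "tidytacos"))
taxa <- readRDS(system.file("extdata", "dada2", "taxa.rds", package = "tidytacos"))

# Store the taxon identifiers (row names of the taxonomy matrix) in a 'taxon' column
taxon <- rownames(taxa)
taxa <- cbind(taxon, as_tibble(taxa))

# Build the object from the counts matrix, then attach the taxonomy table
taco <- create_tidytacos(seqtab, taxa_are_columns = TRUE)
taco <- taco %>%
  add_metadata(taxa, table_type = "taxa")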
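Before building the object, it can help to confirm that the identifiers in your tables line up with the dimension names of the counts matrix. The sketch below uses toy data and base R only, and assumes the orientation used in the code above (samples as rows, taxa as columns). `check_alignment()` is a hypothetical helper written for this example; it is not a tidytacos function.

```r
# Toy counts matrix: rows are samples, columns are taxa.
counts <- matrix(
  c(10, 0, 3, 7, 0, 2),
  nrow = 2,
  dimnames = list(c("s1", "s2"), c("t1", "t2", "t3"))
)

taxa_tbl <- data.frame(
  taxon = c("t1", "t2", "t3"),
  genus = c("Lactobacillus", "Staphylococcus", "Corynebacterium")
)
samples_tbl <- data.frame(
  sample = c("s1", "s2"),
  site   = c("nose", "throat")
)

# Hypothetical helper: list identifiers that occur in the counts matrix
# but are absent from the taxonomy table or the sample table.
check_alignment <- function(counts, taxa_tbl, samples_tbl, taxa_are_columns = TRUE) {
  taxon_ids  <- if (taxa_are_columns) colnames(counts) else rownames(counts)
  sample_ids <- if (taxa_are_columns) rownames(counts) else colnames(counts)
  list(
    missing_taxa    = setdiff(taxon_ids, taxa_tbl$taxon),
    missing_samples = setdiff(sample_ids, samples_tbl$sample)
  )
}

alignment <- check_alignment(counts, taxa_tbl, samples_tbl)
print(alignment)
```

Both elements of the returned list are empty when every taxon and sample in the matrix has a matching row in its table. For a matrix with taxa as rows, pass `taxa_are_columns = FALSE`.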

A tidytacos object is read and stored as three sparse tables (counts.csv, taxa.csv and samples.csv). To read in existing data from a folder, for example the one called ‘leaf’ in the ‘extdata/tidytacos’ folder of this package, you would run:

my_path <- system.file("extdata", "tidytacos", "leaf", package = "tidytacos")
taco <- read_tidytacos(my_path)

To read your own tidytacos data, replace the path with a local path.
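To give an idea of what a sparse counts table can look like, the sketch below converts a toy counts matrix into a long table that lists only the nonzero counts. It uses base R only. The column names (`sample_id`, `taxon_id`, `count`) are assumptions made for this illustration; inspect the counts.csv file of the example data for the exact layout tidytacos uses.

```r
# Toy counts matrix: rows are samples, columns are taxa.
counts <- matrix(
  c(10, 0, 3, 7, 0, 2),
  nrow = 2,
  dimnames = list(sample_id = c("s1", "s2"), taxon_id = c("t1", "t2", "t3"))
)

# Reshape to long format: one row per (sample, taxon) combination.
long <- as.data.frame(as.table(counts), responseName = "count", stringsAsFactors = FALSE)

# Keep only the combinations that were actually observed.
sparse_counts <- long[long$count > 0, ]
print(sparse_counts)
```

Dropping the zero rows loses no information, because any sample and taxon combination that is absent from the table has a count of zero.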

Documentation

A documentation page (help page) is available for all functions in the browser or in R. You can view it in R by running e.g. ?filter_samples. Some useful tutorials can be found on the wiki.

Need support?

Post on GitHub issues if you have questions, requests, or if you run into an issue.

Feel like contributing?

Please read the GitHub Developer Guide. Fork the dev branch, make your changes, and open a pull request. Your suggestions will be reviewed and, if approved, included in the next release.