Data Analysis - Pathogen Characterisation page (#308)
* add metadata

* adding tools

* add intro

* add general considerations

* remove empty ones

* extra tools

* moved more content from doc

* typo fix in metadata

* typo in tool entry

* typo fix

* more content and tools

* add more content from doc

* reformat tools

* typo fix

* add news item

* add to sidebar

* remove placeholders

* add contributor

---------

Co-authored-by: bedroesb <[email protected]>
rabuono and bedroesb authored Sep 19, 2024
1 parent d22eced commit 42cfba6
Showing 6 changed files with 373 additions and 30 deletions.
5 changes: 5 additions & 0 deletions _data/CONTRIBUTORS.yaml
@@ -289,5 +289,10 @@ Reagon Karki:
  email: [email protected]
  orcid: https://orcid.org/0000-0002-1815-0037
  affiliation: Fraunhofer ITMP/EU-OpenScreen
Francesco Messina:
  orcid: https://orcid.org/0000-0001-8076-7217
  git: INMIbioinfo
  affiliation: IRCCS (INMI)
  email: [email protected]


4 changes: 4 additions & 0 deletions _data/news.yml
@@ -138,3 +138,7 @@
  date: 2024-09-05
  linked_pr: 339
  description: A showcase page was added about an open source workflow, integrating biological databases for FAIR data compliant Knowledge Graphs, in the Showcase section. [Discover the page here](/showcase/knowledge-graph-generator)
- name: "New page: Data Analysis of Pathogen Characterisation data"
  date: 2024-09-19
  linked_pr: 308
  description: Content was added to the Pathogen Characterisation page on Data Analysis. [Discover the page here](/data-analysis/pathogen-characterisation)
2 changes: 2 additions & 0 deletions _data/sidebars/main.yml
@@ -14,6 +14,8 @@ subitems:
subitems:
- title: Human biomolecular data
url: /data-analysis/human-biomolecular-data
- title: Pathogen characterisation
url: /data-analysis/pathogen-characterisation

- title: Data communication
url: /data-communication/
204 changes: 190 additions & 14 deletions _data/tool_and_resource_list.yml
@@ -205,8 +205,7 @@
  id: dragen-gatk
  name: Dragen-GATK
  url: https://gatk.broadinstitute.org/hc/en-us/articles/360045944831
- description: 'Dryad is an open-source, community-led data curation, publishing, and preservation platform for CC0 publicly available research data. Dryad has a long-term data preservation strategy, and is a Core Trust Seal Certified Merritt repository with storage in US and EU at the San Diego Supercomputing Center, DANS, and Zenodo. While data is undergoing peer review, it is embargoed if the related journal requires / allows this. Dryad is an independent non-profit that works directly with: researchers to publish datasets utilising best practices for discovery and reuse; publishers to support the integration of data availability statements and data citations into their workflows; and institutions to enable scalable campus support for research data management best practices at low cost. Costs are covered by institutional, publisher, and funder members, otherwise a one-time fee of $120 for authors to cover cost of curation and preservation. Dryad also receives direct funder support through
grants.'
- description: Dryad is an open-source, community-led data curation, publishing, and preservation platform for CC0 publicly available research data.
  id: dryad
  name: Dryad
  registry:
@@ -233,7 +232,7 @@
    fairsharing: mya1ff
    tess: European Genome-phenome Archive (EGA)
  url: https://ega-archive.org/
- description: 'The European Language Social Science Thesaurus (ELSST) is a broad-based, multilingual thesaurus for the social sciences. It is owned and published by the Consortium of European Social Science Data Archives (CESSDA) and its national Service Providers. The thesaurus consists of over 3,300 concepts and covers the core social science disciplines: politics, sociology, economics, education, law, crime, demography, health, employment, information, communication technology, and environmental science. ELSST is used for data discovery within CESSDA and facilitates access to data resources across Europe, independent of domain, resource, language, or vocabulary. ELSST is currently available in 16 languages: Danish, Dutch, Czech, English, Finnish, French, German, Greek, Hungarian, Icelandic, Lithuanian, Norwegian, Romanian, Slovenian, Spanish, and Swedish'
- description: The European Language Social Science Thesaurus (ELSST) is a broad-based, multilingual thesaurus for the social sciences. It is owned and published by the Consortium of European Social Science Data Archives (CESSDA) and its national Service Providers.
  id: european-language-social-science-thesaurus
  name: European Language Social Science Thesaurus (ELSST)
  registry:
@@ -273,14 +272,14 @@
    fairsharing: dj8nt8
    tess: European Nucleotide Archive (ENA)
  url: https://www.ebi.ac.uk/ena/browser/home
- description: FAIRsharing is a FAIR-supporting resource that provides an informative and educational registry on data standards, databases, repositories and policy, alongside search and visualization tools and services that interoperate with other FAIR-enabling resources. fairsharing guides consumers to discover, select and use standards, databases, repositories and policy with confidence, and producers to make their resources more discoverable, more widely adopted and cited. Each record in fairsharing is curated in collaboration with the maintainers of the resource themselves, ensuring that the metadata in the fairsharing registry is accurate and timely. Every record is manually reviewed at least once a year. Records can be collated into collections, based on a project, society or organisation, or Recommendations, where they are collated around a policy, such as a journal or funder data policy.
- description: FAIRsharing is a FAIR-supporting resource that provides an informative and educational registry on data standards, databases, repositories and policy, alongside search and visualization tools and services that interoperate with other FAIR-enabling resources. FAIRsharing guides consumers to discover, select and use standards, databases, repositories and policy with confidence, and producers to make their resources more discoverable, more widely adopted and cited. Each record in FAIRsharing is curated in collaboration with the maintainers of the resource themselves, ensuring that the metadata in the FAIRsharing registry is accurate and timely.
  id: fairsharing
  name: FAIRsharing
  registry:
    fairsharing: 2abjs5
    tess: FAIRsharing
  url: https://fairsharing.org/
- description: Figshare is a generalist, subject-agnostic repository for many different types of digital objects that can be used without cost to researchers. Data can be submitted to the central figshare repository (described here), or institutional repositories using the figshare software can be installed locally, e.g. by universities and publishers. Metadata in figshare is licenced under is CC0. figshare has also partnered with DuraSpace and Chronopolis to offer further assurances that public data will be archived under the stewardship of Chronopolis. figshare is supported through Institutional, Funder, and Governmental service subscriptions.
- description: Figshare is a generalist, subject-agnostic repository for many different types of digital objects that can be used without cost to researchers. Data can be submitted to the central figshare repository (described here), or institutional repositories using the figshare software can be installed locally, e.g. by universities and publishers.
  id: figshare
  name: Figshare
  registry:
@@ -294,12 +293,12 @@
    biotools: Flye
    tess: Flye
  url: https://github.com/fenderglass/Flye
- description: FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs, indels, MNPs, and complex events smaller than the length of a short-read sequencing alignment.
- description: freebayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs, indels, MNPs, and complex events smaller than the length of a short-read sequencing alignment.
  id: freebayes
  name: FreeBayes
  name: freebayes
  registry:
    biotools: freebayes
    tess: FreeBayes
    tess: freebayes
  url: https://github.com/freebayes/freebayes
- description: The metadata model for GA4GH, an international coalition of both public and private interested parties, formed to enable the sharing of genomic and clinical data.
  id: ga4gh
@@ -672,14 +671,13 @@
  registry:
    biotools: wtdbg2
  url: https://github.com/ruanjue/wtdbg2
- description: Metabolomic and lipidomic platform
  id: xcms
  name: XCMS
- description: A systems biology tool for analyzing metabolomic data. It automatically superimposes raw metabolomic data onto metabolic pathways and integrates it with transcriptomic and proteomic data.
  id: xcms-online
  name: XCMS Online
  registry:
    biotools: xcms
    tess: XCMS
    biotools: xcms_online
  url: https://xcmsonline.scripps.edu/landing_page.php?pgcontent=mainPage
- description: Zenodo is a generalist research data repository built and developed by OpenAIRE and CERN. It was developed to aid Open Science and is built on open source code. Zenodo helps researchers receive credit by making the research results citable and through OpenAIRE integrates them into existing reporting lines to funding agencies like the European Commission. Citation information is also passed to DataCite and onto the scholarly aggregators. Content is available publicly under any one of 400 open licences (from opendefinition.org and spdx.org). Restricted and Closed content is also supported. Free for researchers below 50 GB/dataset. Content is both online on disk and offline on tape as part of a long-term preservation policy. Zenodo supports managed access (with an access request workflow) as well as embargoing generally and during peer review. The base infrastructure of Zenodo is provided by CERN, a non-profit IGO. Projects are funded through grants.
- description: Zenodo is a generalist research data repository built and developed by OpenAIRE and CERN.
  id: zenodo
  name: Zenodo
  registry:
@@ -1020,3 +1018,181 @@
  id: openbel
  name: OpenBEL
  url: https://github.com/OpenBEL/openbel-framework
- description: Velvet is a de novo genome assembler designed for short-read sequencing data.
  id: velvet
  name: Velvet
  url: https://github.com/dzerbino/velvet
- description: A tool for phylogenetic analysis and post-analysis of large phylogenies.
  id: raxml
  name: RAxML
  url: https://github.com/stamatak/standard-RAxML
- description: IQ-TREE is designed to efficiently handle large phylogenomic datasets, utilize multicore and distributed parallel computing for faster analysis, and automatically resume interrupted analyses through checkpointing.
  id: iqtree
  name: IQ-TREE
  url: https://github.com/iqtree/iqtree2
- description: MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.
  id: mrbayes
  name: MrBayes
  url: https://nbisweden.github.io/MrBayes/
- description: BEAST is a cross-platform program for Bayesian phylogenetic analysis, estimating rooted, time-measured phylogenies using strict or relaxed molecular clock models. It uses Markov chain Monte Carlo (MCMC) to average over tree space and includes a graphical user interface for setting up analyses and tools for analysing the results.
  id: beast
  name: BEAST
  url: https://www.beast2.org/
- description: Rapid haploid variant calling and core genome alignment.
  id: snippy
  name: Snippy
  url: https://github.com/tseemann/snippy
- description: Converts Thermo Finnigan RAW mass spectrometry files to the mzXML format.
  id: readw
  name: ReAdW
  url: https://github.com/PedrioliLab/ReAdW
- description: X! Tandem open source is software that can match tandem mass spectra with peptide sequences, in a process that has come to be known as protein identification.
  id: x-tandem
  name: X! Tandem
  url: https://www.thegpm.org/TANDEM/
- description: OMSSA (Open Mass Spectrometry Search Algorithm) is a tool to identify peptides in tandem mass spectrometry (MS/MS) data. The OMSSA algorithm uses a classic probability score to compute specificity.
  id: omssa
  name: OMSSA
  url: https://ftp.ncbi.nlm.nih.gov/pub/lewisg/omssa/
- description: MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. It is specifically aimed at high-resolution MS data.
  id: maxquant
  name: MaxQuant
  url: https://www.maxquant.org/
- description: The Absolute Protein Expression (APEX) Quantitative Proteomics Tool is a free and open-source Java implementation of the APEX technique for the quantitation of proteins based on standard LC-MS/MS proteomics data.
  id: apex
  name: APEX
  url: http://sourceforge.net/projects/apexqpt/
  registry:
    biotools: apex
- description: Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data.
  id: xcms
  name: xcms
  url: http://bioconductor.org/packages/release/bioc/html/xcms.html
  registry:
    biotools: xcms
- description: A meta-search peptide identification platform for tandem mass spectra.
  id: peparml
  name: PepArML
  url: https://peparml.sourceforge.net/
  registry:
    biotools: peparml
- description: A commercial software package for NMR spectral processing that offers a semi-automated tool for spectral deconvolution, enabling interactive fitting of metabolite peaks to reference spectra and quantifying their concentrations.
  id: chenomx
  name: Chenomx
  url: https://www.chenomx.com/
- description: ResFinder identifies acquired genes and/or finds chromosomal mutations mediating antimicrobial resistance in total or partial DNA sequences of bacteria.
  id: resfinder
  name: ResFinder
  url: http://genepi.food.dtu.dk/resfinder
  registry:
    biotools: resfinder
- description: Pathogenwatch provides species and taxonomy prediction for over 60,000 variants of bacteria, viruses, and fungi.
  id: pathogenwatch
  name: Pathogenwatch
  url: https://pathogen.watch/
- description: CellDesigner is a structured diagram editor for drawing gene-regulatory and biochemical networks.
  id: celldesigner
  name: CellDesigner
  url: https://www.celldesigner.org/
- description: "A curated database containing nearly all published HIV RT and protease sequences: a resource designed for researchers studying evolutionary and drug-related variation in the molecular targets of anti-HIV therapy."
  id: hivdb-stanford
  name: Stanford HIV Drug Resistance Database (HIVDB)
  url: https://hivdb.stanford.edu/
- description: Nextstrain is an open-source project to harness the scientific and public health potential of pathogen genome data.
  id: nextstrain
  name: Nextstrain
  url: http://nextstrain.org
  registry:
    biotools: nextstrain.org
- description: g:GOSt performs functional enrichment analysis, also known as over-representation analysis (ORA) or gene set enrichment analysis, on an input gene list.
  id: g-profiler
  name: g:Profiler
  url: https://biit.cs.ut.ee/gprofiler/gost
  registry:
    biotools: gprofiler
- description: The EuroHPC Joint Undertaking is a joint initiative between the EU, European countries and private partners to develop a world-class supercomputing ecosystem in Europe.
  id: eurohpc
  name: EuroHPC
  url: https://eurohpc-ju.europa.eu/
  registry:
- description: BEAUti is a graphical user interface (GUI) application for generating BEAST XML files.
  id: beauti
  name: BEAUti
  url: https://beast.community/beauti.html
  registry:
- description: QIIME 2 is a powerful, extensible, and decentralized microbiome analysis package with a focus on data and analysis transparency.
  id: qiime2
  name: QIIME 2
  url: https://docs.qiime2.org/
  registry:
- description: MEGAHIT is an ultra-fast and memory-efficient NGS assembler optimized for metagenomes.
  id: megahit
  name: MEGAHIT
  url: https://github.com/voutcn/megahit
  registry:
    biotools: megahit
- description: A taxonomic classification system using exact k-mer matches to achieve high accuracy and fast classification speeds.
  id: kraken2
  name: Kraken 2
  url: https://ccb.jhu.edu/software/kraken2/
  registry:
    biotools: kraken2
- description: The COVID-19 Disease Map is an assembly of molecular interaction diagrams, established based on literature evidence.
  id: covid19map
  name: COVID-19 Disease Map
  url: https://covid19map.elixir-luxembourg.org/
  registry:
- description: Freyja is a tool to recover relative lineage abundances from mixed SARS-CoV-2 samples from a sequencing dataset (BAM aligned to the Hu-1 reference).
  id: freyja
  name: Freyja
  url: https://github.com/andersen-lab/Freyja
  registry:
    biotools: freyja
- description: The cojac package comprises a set of command-line tools to analyse co-occurrence of mutations on amplicons.
  id: cojac
  name: COJAC
  url: https://github.com/cbg-ethz/cojac
  registry:
    biotools: cojac
- description: Lineagespot is a framework written in R that identifies SARS-CoV-2-related mutations from a single variant file or a list of variant files.
  id: lineagespot
  name: Lineagespot
  url: https://github.com/BiodataAnalysisGroup/lineagespot
  registry:
    biotools: lineagespot
- description: Kallisto is a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
  id: kallisto
  name: Kallisto
  url: https://pachterlab.github.io/kallisto/about.html
  registry:
    biotools: kallisto
- description: PiGx SARS-CoV-2 is a pipeline for analysing data from sequenced wastewater samples and identifying given lineages of SARS-CoV-2.
  id: pigxs
  name: PiGx SARS-CoV-2 Wastewater Sequencing Pipeline
  url: https://github.com/BIMSBbioinfo/pigx_sars-cov-2
  registry:
- description: A GitHub repository from the CBG-ETHZ group offering tools for detecting SARS-CoV-2 variants in Switzerland.
  id: cowwid
  name: COWWID
  url: https://github.com/cbg-ethz/cowwid
  registry:
- description: A SARS-CoV-2 Contextual Data Specification from PHA4GE.
  id: sars-pha4ge
  name: SARS-CoV-2 Contextual Data Specification
  url: https://github.com/pha4ge/SARS-CoV-2-Contextual-Data-Specification
  registry:
- description: A data model to improve wastewater surveillance through interoperable data.
  id: phes-odm
  name: PHES-ODM
  url: https://github.com/Big-Life-Lab/PHES-ODM
  registry:
- description: A pipeline for lineage abundance estimation from wastewater sequencing data.
  id: vlq
  name: VLQ
  url: https://github.com/baymlab/wastewater_analysis
  registry:
- description: CFSAN Wastewater Analysis Pipeline to estimate the percentage of SARS-CoV-2 variants in a sample.
  id: c-wap
  name: C-WAP
  url: https://github.com/CFSAN-Biostatistics/C-WAP
  registry:
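A minimal Python sketch of a pre-commit check for tool entries like those above: it flags unexpected keys (for example a misspelled `regsitry`), missing required keys and duplicate `id` values, since pages reference tools by their `id`. It assumes the YAML has already been loaded into a list of dictionaries (e.g. with PyYAML's `yaml.safe_load`). `check_entries` is a hypothetical helper, and the allowed key sets are assumptions based only on the keys visible in this diff, not the site's official schema.

```python
from collections import Counter

# Assumption: key sets inferred from the entries in this diff, not an official schema.
ALLOWED_KEYS = {"description", "id", "name", "url", "registry"}
REQUIRED_KEYS = {"description", "id", "name", "url"}
ALLOWED_REGISTRY_KEYS = {"biotools", "fairsharing", "tess"}


def check_entries(entries):
    """Return human-readable problems found in a list of tool entries (hypothetical helper)."""
    problems = []
    for index, entry in enumerate(entries):
        label = entry.get("id") or f"entry #{index}"
        # Flag misspelled or unexpected top-level keys.
        for key in sorted(set(entry) - ALLOWED_KEYS):
            problems.append(f"{label}: unknown key '{key}'")
        for key in sorted(REQUIRED_KEYS - set(entry)):
            problems.append(f"{label}: missing key '{key}'")
        # An empty 'registry:' line loads as None and is treated as absent.
        registry = entry.get("registry")
        if registry is not None:
            if not isinstance(registry, dict):
                problems.append(f"{label}: 'registry' should be a mapping")
            else:
                for key in sorted(set(registry) - ALLOWED_REGISTRY_KEYS):
                    problems.append(f"{label}: unknown registry key '{key}'")
    # Tools are looked up by id, so each id should appear only once.
    id_counts = Counter(entry["id"] for entry in entries if entry.get("id"))
    for tool_id, count in sorted(id_counts.items()):
        if count > 1:
            problems.append(f"duplicate id '{tool_id}' ({count} entries)")
    return problems


if __name__ == "__main__":
    sample = [
        {"description": "Rapid haploid variant calling and core genome alignment.",
         "id": "snippy", "name": "Snippy", "url": "https://github.com/tseemann/snippy"},
        {"description": "Framework for processing mass spectral data.",
         "id": "xcms", "name": "xcms",
         "url": "http://bioconductor.org/packages/release/bioc/html/xcms.html",
         "regsitry": {"biotools": "xcms"}},
    ]
    for problem in check_entries(sample):
        print(problem)  # prints: xcms: unknown key 'regsitry'
```

Against the real file, the entries could be loaded with PyYAML first, e.g. `check_entries(yaml.safe_load(open("_data/tool_and_resource_list.yml")))`.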
2 changes: 1 addition & 1 deletion data-analysis/human-biomolecular-data.md
@@ -112,7 +112,7 @@ There are several types of analysis that can be performed on human biomolecular
- *Interaction databases*: {% tool "biogrid" %} and {% tool "intact" %}
- *Network analysis*: {% tool "cytoscape" %} and {% tool "genemania" %}
- **Metabolomics analysis**: This involves measuring the levels of small molecules (metabolites) in biological samples and comparing them across different conditions or groups of samples. This can help to identify biomarkers of disease or drug response.
- *Data processing*: {% tool "xcms" %}, {% tool "mzmine" %} and {% tool "openms" %}
- *Data processing*: {% tool "xcms-online" %}, {% tool "mzmine" %} and {% tool "openms" %}
- *Statistical analysis*: {% tool "metaboanalyst" %} and {% tool "metsign" %}

## Postprocessing