Data Analysis - Pathogen Characterisation page (#308)

* add metadata
* adding tools
* add intro
* add general considerations
* remove empty ones
* extra tools
* moved more content from doc
* typo in fix in metadata
* typo in tool entry
* typo fix
* more content and tools
* add more content from doc
* reformat tools
* typo fix
* add news item
* add to sidebar
* remove placeholders
* add contributor

---------

Co-authored-by: bedroesb <[email protected]>
elixir-europe · Sep 19, 2024 · 42cfba6
1 parent d22eced
commit 42cfba6

Showing 6 changed files with 373 additions and 30 deletions.
diff --git a/_data/CONTRIBUTORS.yaml b/_data/CONTRIBUTORS.yaml
@@ -289,5 +289,10 @@ Reagon Karki:
   email: [email protected]
   orcid: https://orcid.org/0000-0002-1815-0037
   affiliation: Fraunhofer ITMP/EU-OpenScreen
+Francesco Messina:
+  orcid: 0000-0001-8076-7217
+  git: INMIbioinfo
+  affiliation: IRCCS (INMI)
+  email: [email protected]
 
 
diff --git a/_data/news.yml b/_data/news.yml
@@ -138,3 +138,7 @@
   date: 2024-09-05
   linked_pr: 339
   description: A showcase page was added about an open source workflow, integrating biological databases for FAIR data compliant Knowledge Graphs, in the Showcase section. [Discover the page here](/showcase/knowledge-graph-generator)
+- name: "New page: Data Analysis of Pathogen Characterisation data"
+  date: 2024-09-19
+  linked_pr: 308
+  description: Content was added to the Pathogen Characterisation page on Data Analysis. [Discover the page here](/data-analysis/pathogen-characterisation)
diff --git a/_data/sidebars/main.yml b/_data/sidebars/main.yml
@@ -14,6 +14,8 @@ subitems:
     subitems:
       - title: Human biomolecular data
         url: /data-analysis/human-biomolecular-data
+      - title: Pathogen characterisation
+        url: /data-analysis/pathogen-characterisation
 
   - title: Data communication
     url: /data-communication/

diff --git a/_data/tool_and_resource_list.yml b/_data/tool_and_resource_list.yml
@@ -205,8 +205,7 @@
   id: dragen-gatk
   name: Dragen-GATK
   url: https://gatk.broadinstitute.org/hc/en-us/articles/360045944831
-- description: 'Dryad is an open-source, community-led data curation, publishing, and preservation platform for CC0 publicly available research data. Dryad has a long-term data preservation strategy, and is a Core Trust Seal Certified Merritt repository with storage in US and EU at the San Diego Supercomputing Center, DANS, and Zenodo. While data is undergoing peer review, it is embargoed if the related journal requires / allows this. Dryad is an independent non-profit that works directly with: researchers to publish datasets utilising best practices for discovery and reuse; publishers to support the integration of data availability statements and data citations into their workflows; and institutions to enable scalable campus support for research data management best practices at low cost. Costs are covered by institutional, publisher, and funder members, otherwise a one-time fee of $120 for authors to cover cost of curation and preservation. Dryad also receives direct funder support through
-    grants.'
+- description: Dryad is an open-source, community-led data curation, publishing, and preservation platform for CC0 publicly available research data.
   id: dryad
   name: Dryad
   registry:
@@ -233,7 +232,7 @@
     fairsharing: mya1ff
     tess: European Genome-phenome Archive (EGA)
   url: https://ega-archive.org/
-- description: 'The European Language Social Science Thesaurus (ELSST) is a broad-based, multilingual thesaurus for the social sciences. It is owned and published by the Consortium of European Social Science Data Archives (CESSDA) and its national Service Providers. The thesaurus consists of over 3,300 concepts and covers the core social science disciplines: politics, sociology, economics, education, law, crime, demography, health, employment, information, communication technology, and environmental science. ELSST is used for data discovery within CESSDA and facilitates access to data resources across Europe, independent of domain, resource, language, or vocabulary. ELSST is currently available in 16 languages: Danish, Dutch, Czech, English, Finnish, French, German, Greek, Hungarian, Icelandic, Lithuanian, Norwegian, Romanian, Slovenian, Spanish, and Swedish'
+- description: The European Language Social Science Thesaurus (ELSST) is a broad-based, multilingual thesaurus for the social sciences. It is owned and published by the Consortium of European Social Science Data Archives (CESSDA) and its national Service Providers.
   id: european-language-social-science-thesaurus
   name: European Language Social Science Thesaurus (ELSST)
   registry:
@@ -273,14 +272,14 @@
     fairsharing: dj8nt8
     tess: European Nucleotide Archive (ENA)
   url: https://www.ebi.ac.uk/ena/browser/home
-- description: FAIRsharing is a FAIR-supporting resource that provides an informative and educational registry on data standards, databases, repositories and policy, alongside search and visualization tools and services that interoperate with other FAIR-enabling resources. fairsharing guides consumers to discover, select and use standards, databases, repositories and policy with confidence, and producers to make their resources more discoverable, more widely adopted and cited. Each record in fairsharing is curated in collaboration with the maintainers of the resource themselves, ensuring that the metadata in the fairsharing registry is accurate and timely. Every record is manually reviewed at least once a year. Records can be collated into collections, based on a project, society or organisation, or Recommendations, where they are collated around a policy, such as a journal or funder data policy.
+- description: FAIRsharing is a FAIR-supporting resource that provides an informative and educational registry on data standards, databases, repositories and policy, alongside search and visualization tools and services that interoperate with other FAIR-enabling resources. FAIRsharing guides consumers to discover, select and use standards, databases, repositories and policy with confidence, and producers to make their resources more discoverable, more widely adopted and cited. Each record in fairsharing is curated in collaboration with the maintainers of the resource themselves, ensuring that the metadata in the fairsharing registry is accurate and timely.
   id: fairsharing
   name: FAIRsharing
   registry:
     fairsharing: 2abjs5
     tess: FAIRsharing
   url: https://fairsharing.org/
-- description: Figshare is a generalist, subject-agnostic repository for many different types of digital objects that can be used without cost to researchers. Data can be submitted to the central figshare repository (described here), or institutional repositories using the figshare software can be installed locally, e.g. by universities and publishers. Metadata in figshare is licenced under is CC0. figshare has also partnered with DuraSpace and Chronopolis to offer further assurances that public data will be archived under the stewardship of Chronopolis. figshare is supported through Institutional, Funder, and Governmental service subscriptions.
+- description: Figshare is a generalist, subject-agnostic repository for many different types of digital objects that can be used without cost to researchers. Data can be submitted to the central figshare repository (described here), or institutional repositories using the figshare software can be installed locally, e.g. by universities and publishers.
   id: figshare
   name: Figshare
   registry:
@@ -294,12 +293,12 @@
     biotools: Flye
     tess: Flye
   url: https://github.com/fenderglass/Flye
-- description: FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs, indels, MNPs, and complex events smaller than the length of a short-read sequencing alignment.
+- description: freebayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs, indels, MNPs, and complex events smaller than the length of a short-read sequencing alignment.
   id: freebayes
-  name: FreeBayes
+  name: freebayes
   registry:
     biotools: freebayes
-    tess: FreeBayes
+    tess: freebayes
   url: https://github.com/freebayes/freebayes
 - description: The metadata model for GA4GH, an international coalition of both public and private interested parties, formed to enable the sharing of genomic and clinical data.
   id: ga4gh
@@ -672,14 +671,13 @@
   registry:
     biotools: wtdbg2
   url: https://github.com/ruanjue/wtdbg2
-- description: Metabolomic and lipidomic platform
-  id: xcms
-  name: XCMS
+- description: A systems biology tool for analyzing metabolomic data. It automatically superimposes raw metabolomic data onto metabolic pathways and integrates it with transcriptomic and proteomic data.
+  id: xcms-online
+  name: XCMS Online
   registry:
-    biotools: xcms
-    tess: XCMS
+    biotools: xcms_online
   url: https://xcmsonline.scripps.edu/landing_page.php?pgcontent=mainPage
-- description: Zenodo is a generalist research data repository built and developed by OpenAIRE and CERN. It was developed to aid Open Science and is built on open source code. Zenodo helps researchers receive credit by making the research results citable and through OpenAIRE integrates them into existing reporting lines to funding agencies like the European Commission. Citation information is also passed to DataCite and onto the scholarly aggregators. Content is available publicly under any one of 400 open licences (from opendefinition.org and spdx.org). Restricted and Closed content is also supported. Free for researchers below 50 GB/dataset. Content is both online on disk and offline on tape as part of a long-term preservation policy. Zenodo supports managed access (with an access request workflow) as well as embargoing generally and during peer review. The base infrastructure of Zenodo is provided by CERN, a non-profit IGO. Projects are funded through grants.
+- description: Zenodo is a generalist research data repository built and developed by OpenAIRE and CERN. 
   id: zenodo
   name: Zenodo
   registry:
@@ -1020,3 +1018,181 @@
   id: openbel
   name: OpenBEL
   url: https://github.com/OpenBEL/openbel-framework
+- description: Velvet is an algorithm package that has been designed to deal with de novo genome assembly and short read sequencing alignments.  
+  id: velvet
+  name: Velvet
+  url: https://github.com/dzerbino/velvet
+- description: A tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies
+  id: raxml
+  name: RAxML
+  url: https://github.com/stamatak/standard-RAxML
+- description: IQ-TREE is designed to efficiently handle large phylogenomic datasets, utilize multicore and distributed parallel computing for faster analysis, and automatically resume interrupted analyses through checkpointing.
+  id: iqtree
+  name: IQ-TREE
+  url: https://github.com/iqtree/iqtree2
+- description: MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.
+  id: mrbayes
+  name: MrBayes
+  url: https://nbisweden.github.io/MrBayes/
+- description: BEAST is a cross-platform program for Bayesian phylogenetic analysis, estimating rooted, time-measured phylogenies using strict or relaxed molecular clock models. It uses Markov chain Monte Carlo (MCMC) to average over tree space and includes a graphical user interface for setting up analyses and tools for result analysis.
+  id: beast
+  name: BEAST
+  url: https://www.beast2.org/
+- description: Rapid haploid variant calling and core genome alignment.
+  id: snippy
+  name: Snippy
+  url: https://github.com/tseemann/snippy
+- description: Convert ThermoFinnigan RAW mass spectrometry files to the mzXML format.
+  id: readw
+  name: ReAdW
+  url: https://github.com/PedrioliLab/ReAdW
+- description: X! Tandem open source is software that can match tandem mass spectra with peptide sequences, in a process that has come to be known as protein identification.
+  id: x-tandem
+  name: X! Tandem
+  url: https://www.thegpm.org/TANDEM/
+- description: OMSSA (Open Mass Spectrometry Search Algorithm) is a tool to identify peptides in tandem mass spectrometry (MS/MS) data. The OMSSA algorithm uses a classic probability score to compute specificity. See also The NCBI C++ Toolkit and The NCBI C++ Toolkit Book.
+  id: omssa
+  name: OMSSA
+  url: https://ftp.ncbi.nlm.nih.gov/pub/lewisg/omssa/
+- description: MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. It is specifically aimed at high-resolution MS data.
+  id: maxquant
+  name: MaxQuant
+  url: https://www.maxquant.org/
+- description: The Absolute Protein Expression (APEX) Quantitative Proteomics Tool is a free and open source Java implementation of the APEX technique for the quantitation of proteins based on standard LC-MS/MS proteomics data.
+  id: apex
+  name: apex
+  url: http://sourceforge.net/projects/apexqpt/
+  registry:
+    biotools: apex
+- description: Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. 
+  id: xcms
+  name: xcms
+  url: http://bioconductor.org/packages/release/bioc/html/xcms.html
+  registry:
+    biotools: xcms
+- description: A Meta-Search Peptide Identification Platform for Tandem Mass Spectra
+  id: peparml
+  name: PepArML
+  url: https://peparml.sourceforge.net/
+  registry:
+    biotools: peparml
+- description: A commercial software package for NMR spectral processing that offers a semi-automated tool for spectral deconvolution, enabling interactive fitting of metabolite peaks to reference spectra and quantifying their concentrations.
+  id: chenomx
+  name: Chenomx
+  url: https://www.chenomx.com/
+- description: ResFinder identifies acquired genes and/or finds chromosomal mutations mediating antimicrobial resistance in total or partial DNA sequence of bacteria.
+  id: resfinder
+  name: ResFinder
+  url: http://genepi.food.dtu.dk/resfinder
+  registry:
+    biotools: resfinder
+- description: Pathogenwatch provides species and taxonomy prediction for over 60,000 variants of bacteria, viruses, and fungi.
+  id: pathogenwatch
+  name: Pathogenwatch
+  url: https://pathogen.watch/
+- description: CellDesigner is a structured diagram editor for drawing gene-regulatory and biochemical networks. 
+  id: celldesigner
+  name: CellDesigner
+  url: https://www.celldesigner.org/
+- description: "A curated database containing nearly all published HIV RT and protease sequences: a resource designed for researchers studying evolutionary and drug-related variation in the molecular targets of anti-HIV therapy."
+  id: hivdb-stanford
+  name: Stanford HIV Drug Resistance Database (HIVDB)
+  url: https://hivdb.stanford.edu/ 
+- description: Nextstrain is an open-source project to harness the scientific and public health potential of pathogen genome data. 
+  id: nextstrain
+  name: Nextstrain
+  url: http://nextstrain.org
+  registry:
+    biotools: nextstrain.org
+- description: g:GOSt performs functional enrichment analysis, also known as over-representation analysis (ORA) or gene set enrichment analysis, on an input gene list.
+  id: g-profiler
+  name: g:Profiler
+  url: https://biit.cs.ut.ee/gprofiler/gost
+  registry:
+    biotools: gprofiler
+- description: EuroHPC Joint Undertaking is a joint initiative between the EU, European countries and private partners to develop a World Class Supercomputing Ecosystem in Europe. 
+  id: eurohpc
+  name: EuroHPC
+  url: https://eurohpc-ju.europa.eu/
+  registry:
+- description: BEAUti is a graphical user-interface (GUI) application for generating BEAST XML files. 
+  id: beauti
+  name: BEAUti
+  url: https://beast.community/beauti.html
+  registry:
+- description: QIIME 2 is a powerful, extensible, and decentralized microbiome analysis package with a focus on data and analysis transparency.
+  id: qiime2
+  name: QIIME 2
+  url: https://docs.qiime2.org/
+  registry:
+- description: MEGAHIT is an ultra-fast and memory-efficient NGS assembler optimized for metagenomes.
+  id: megahit
+  name: MEGAHIT
+  url: https://github.com/voutcn/megahit
+  registry:
+    biotools: megahit
+- description: A taxonomic classification system using exact k-mer matches to achieve high accuracy and fast classification speeds.
+  id: kraken2
+  name: Kraken 2
+  url: https://ccb.jhu.edu/software/kraken2/
+  registry:
+    biotools: kraken2
+- description: The COVID-19 Disease Map is an assembly of molecular interaction diagrams, established based on literature evidence.
+  id: covid19map
+  name: COVID19 Disease Map
+  url: https://covid19map.elixir-luxembourg.org/
+  registry:
+- description: Freyja is a tool to recover relative lineage abundances from mixed SARS-CoV-2 samples from a sequencing dataset (BAM aligned to the Hu-1 reference).
+  id: freyja
+  name: Freyja
+  url: https://github.com/andersen-lab/Freyja
+  registry:
+    biotools: freyja
+- description: The cojac package comprises a set of command-line tools to analyse co-occurrence of mutations on amplicons.
+  id: cojac
+  name: COJAC
+  url: https://github.com/cbg-ethz/cojac
+  registry:
+    biotools: cojac
+- description: Lineagespot is a framework written in R that aims to identify SARS-CoV-2 related mutations based on a single variant file or a list of variant files.
+  id: lineagespot
+  name: Lineagespot
+  url: https://github.com/BiodataAnalysisGroup/lineagespot
+  registry:
+    biotools: lineagespot
+- description: Kallisto is a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
+  id: kallisto
+  name: Kallisto
+  url: https://pachterlab.github.io/kallisto/about.html
+  registry:
+    biotools: kallisto
+- description: PiGx SARS-CoV-2 is a pipeline for analysing data from sequenced wastewater samples and identifying given lineages of SARS-CoV-2.
+  id: pigxs
+  name: PiGx SARS-CoV-2 Wastewater Sequencing Pipeline
+  url: https://github.com/BIMSBbioinfo/pigx_sars-cov-2
+  registry:
+- description: A GitHub repository from the CBG-ETHZ group offering tools for detecting SARS-CoV-2 variants in Switzerland.
+  id: cowwid
+  name: COWWID
+  url: https://github.com/cbg-ethz/cowwid
+  registry:
+- description: A SARS-CoV-2 Contextual Data Specification from PHA4GE.
+  id: sars-pha4ge
+  name: SARS-CoV-2 Contextual Data Specification
+  url: https://github.com/pha4ge/SARS-CoV-2-Contextual-Data-Specification
+  registry:
+- description: A data model to improve wastewater surveillance through interoperable data.
+  id: phes-odm
+  name: PHES-ODM
+  url: https://github.com/Big-Life-Lab/PHES-ODM
+  registry:
+- description: A pipeline for lineage abundance estimation from wastewater sequencing data.
+  id: vlq
+  name: VLQ
+  url: https://github.com/baymlab/wastewater_analysis
+  registry:
+- description: CFSAN Wastewater Analysis Pipeline to estimate the percentage of SARS-CoV-2 variants in a sample.
+  id: c-wap
+  name: C-WAP
+  url: https://github.com/CFSAN-Biostatistics/C-WAP
+  registry:
diff --git a/data-analysis/human-biomolecular-data.md b/data-analysis/human-biomolecular-data.md
@@ -112,7 +112,7 @@ There are several types of analysis that can be performed on human biomolecular
     - *Interaction databases*: {% tool "biogrid" %} and {% tool "intact" %}
     - *Network analysis*: {% tool "cytoscape" %} and {% tool "genemania" %}
 - **Metabolomics analysis**: This involves measuring the levels of small molecules (metabolites) in biological samples and comparing them across different conditions or groups of samples. This can help to identify biomarkers of disease or drug response.
-    - *Data processing*: {% tool "xcms" %}, {% tool "mzmine" %} and {% tool "openms" %}
+    - *Data processing*: {% tool "xcms-online" %}, {% tool "mzmine" %} and {% tool "openms" %}
     - *Statistical analysis*: {% tool "metaboanalyst" %} and {% tool "metsign" %}
 
 ## Postprocessing