Exploring the Current State of Knowledge on the Link Between COVID-19 and Neurodegeneration

This repository contains the data, scripts, and analyses used in the research titled "Understanding the Co-Morbidity between COVID-19 and Neurodegenerative Diseases at Mechanism-Level: Comprehensive Analysis Integrating Databases and Text Mining". The project uses the Neo4j platform for graph-based analysis and integrates natural language processing to explore relationships between COVID-19 and neurodegenerative diseases (NDDs).

Overview

This project explores the connections between COVID-19 and neurodegenerative diseases by:

Integrating database information about COVID-19 and NDDs and storing them in a graph structure.
Extracting textual data from scientific literature and using natural language processing pipelines for information extraction and KG construction.
Loading all KG in Neo4j to identify and analyse relationships and pathways between entities such as genes, diseases, and chemicals.
Constructing a hypothesis database for comorbidity between COVID-19 and NDDs to explore, analyse, and visualise testable comorbidity hypotheses.

Data

The repository includes the following directories:

Expert-curated-publications: Contains manually curated publications relevant to the study, ensuring high-quality and accurate information.
PubTator3-results: Includes results from PubTator3, a web-based system that offers a comprehensive set of features and tools for exploring biomedical literature using advanced text mining and AI techniques.
Sherpa-results: Houses outputs from Sherpa, a tool designed to assist in the curation of biomedical literature by providing automated annotations and insights.
Textual-corpora-for-textmining: Comprises textual corpora prepared for text mining purposes, facilitating the extraction of meaningful patterns and relationships regarding COVID-19 and NDD.

Sources

1. `comorbidity-hypothesis-db.py`

Purpose: Automatically opens the Neo4j Browser with prefilled credentials to connect to the AuraDB instance for comorbidity hypothesis exploration.
Key Features:
- Simplifies connection to Neo4j by generating a pre-configured URL.
- Useful for direct interaction with the knowledge graph.
Usage: Run the script, and the Neo4j Browser will open in your default web browser:
```
python comorbidity-hypothesis-db.py
```
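As a rough illustration of how such a pre-configured link could be assembled, the sketch below builds a Neo4j Browser URL and opens it with Python's standard `webbrowser` module. The `connectURL` query parameter and the helper names `build_browser_url` and `open_hypothesis_browser` are assumptions made for illustration; the actual script may construct the link differently, and password handling is left out on purpose.

```python
import webbrowser
from urllib.parse import quote

BROWSER_BASE = "https://browser.neo4j.io/"


def build_browser_url(host: str, user: str, scheme: str = "neo4j+s") -> str:
    """Return a Neo4j Browser link with the connection target prefilled.

    Assumption: Neo4j Browser reads the target from a `connectURL` query
    parameter of the form scheme://user@host.
    """
    target = f"{scheme}://{user}@{host}"
    # Percent-encode the target so characters such as '+' and '/' survive.
    return f"{BROWSER_BASE}?connectURL={quote(target, safe='')}"


def open_hypothesis_browser(host: str, user: str = "neo4j") -> bool:
    """Open the link in the default web browser; returns False if none is available."""
    return webbrowser.open(build_browser_url(host, user))
```

Calling `open_hypothesis_browser("09f8d4e9.databases.neo4j.io")` would then open the Browser pointed at the instance named in this README, leaving only the password to be typed in.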

2. `comorbidity-space-neo4j-upload.py`

Purpose: Uploads the comorbidity hypothesis paths to the AuraDB instance for comorbidity hypothesis exploration. The curated candidate paths, along with their PMIDs and supporting evidence, are stored in 'src/hypothesis_pmid_evidences.csv'.
Key Features:
- Simplifies uploading the hypothesis comorbidity candidates.
Usage: Run the script to upload the hypothesis paths to the AuraDB instance:
```
python comorbidity-space-neo4j-upload.py
```
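A batched upload of this kind is often written with `UNWIND` and `MERGE`. The sketch below shows one possible shape, not the script's actual logic: the column names (`source`, `relation`, `target`, `pmid`, `evidence`), the `Entity` label, and the `ASSOCIATED_WITH` relationship type are all hypothetical and would need to match the real layout of the CSV file.

```python
import csv

# Hypothetical Cypher: MERGE keeps repeated uploads from duplicating nodes or edges.
UPLOAD_QUERY = """
UNWIND $rows AS row
MERGE (s:Entity {name: row.source})
MERGE (t:Entity {name: row.target})
MERGE (s)-[r:ASSOCIATED_WITH {type: row.relation, pmid: row.pmid}]->(t)
SET r.evidence = row.evidence
"""

# Assumed column names; adjust to the real CSV header.
REQUIRED = ("source", "relation", "target", "pmid", "evidence")


def read_rows(lines):
    """Parse CSV lines into dicts, skipping rows that miss a required field."""
    rows = []
    for record in csv.DictReader(lines):
        if all((record.get(key) or "").strip() for key in REQUIRED):
            rows.append({key: record[key].strip() for key in REQUIRED})
    return rows


def batches(rows, size=500):
    """Yield successive slices of `rows` with at most `size` items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def upload(session, rows, size=500):
    """Send each batch to Neo4j; `session` is an open neo4j driver session."""
    for batch in batches(rows, size):
        session.run(UPLOAD_QUERY, rows=batch)
```

Sending rows in batches keeps each transaction small, which tends to matter on hosted instances with memory limits.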

3. `hypothesis-graph-database-upload.py`

Purpose: Manages the upload of hypothesis-based graph data to Neo4j.
Key Features:
- Dedicated notebook for hypothesis data integration
- Structured data validation
- Automated graph relationship creation
Usage:
- Open in Jupyter environment
- Configure data paths
- Execute cells sequentially
Purpose:

A comprehensive data integration pipeline for analyzing relationships between COVID-19 and neurodegenerative diseases (NDDs). This pipeline processes and uploads three types of biomedical data to Neo4j:

Triples hypothesis (filtered triples from all dbs)
Pathway hypothesis (filtered pathways)
GWAS Data (shared variants)

The project leverages Neo4j for graph-based analysis and integrates various data sources to explore disease relationships.

Quick Start

Install Dependencies

pip install pandas neo4j requests rapidfuzz fuzzywuzzy python-Levenshtein

Configure Neo4j Connection

Create config.json:

{
    "neo4j": {
        "uri": "neo4j+s://09f8d4e9.databases.neo4j.io",
        "user": "neo4j",
        "password": "your-password"
    }
}
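
One way to read this file from Python is sketched below. The `Neo4jSettings` class and both function names are hypothetical helpers introduced for illustration; they are not the repository's `Neo4jConfig` class, whose constructor may differ.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Neo4jSettings:
    """Hypothetical container for the three connection fields."""
    uri: str
    user: str
    password: str


def parse_settings(text: str) -> Neo4jSettings:
    """Validate the JSON layout shown above and return the connection settings."""
    section = json.loads(text).get("neo4j", {})
    missing = [key for key in ("uri", "user", "password") if not section.get(key)]
    if missing:
        raise ValueError(f"config.json is missing: {', '.join(missing)}")
    return Neo4jSettings(section["uri"], section["user"], section["password"])


def load_settings(path: str = "config.json") -> Neo4jSettings:
    """Read and validate the settings file at `path`."""
    with open(path, encoding="utf-8") as handle:
        return parse_settings(handle.read())
```

Failing early on a missing field gives a clearer message than a connection error raised later by the driver.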

Run Pipeline

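import importlib

# The module name contains hyphens, so a plain `from ... import` statement
# is a syntax error; load the module by name instead.
pipeline = importlib.import_module("hypothesis-graph-database-upload")
DataPipelineRunner = pipeline.DataPipelineRunner
Neo4jConfig = pipeline.Neo4jConfig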

# Configure Neo4j connection
config = Neo4jConfig(
    uri="your_neo4j_uri",
    user="your_username",
    password="your_password"
)

# Run pipeline
runner = DataPipelineRunner(config)
runner.run(
    triple_file="path/to/cleaned_all_db_association.csv",
    pathway_file="path/to/your/hypothesis_pmid_evidences.csv",
    gwas_file="path/to/your/shared-variants.xlsx"
)

Getting Started

Prerequisites

Neo4j AuraDB: Ensure you have access to a Neo4j AuraDB instance. Use the provided connection details or set up your own.
Python Environment: Install the required libraries:
```
pip install neo4j pandas
```

Notebooks

1. `analyze-neo4j.ipynb`

Purpose: Analyzes the knowledge graph loaded to Neo4j to extract insights.
Key Features:
- Counts nodes and edges in the graph.
- Executes community detection algorithms like Louvain using Neo4j's Graph Data Science (GDS) library.
- Retrieves and visualizes properties of detected clusters.
Usage: Open the Jupyter Notebook and follow the instructions to:
- Query the Neo4j database.
- Get general statistics about nodes, triples, and pathways, and analyze them.
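
The counting and community-detection steps listed above could look roughly like the following. This is a minimal sketch, not the notebook's code: it assumes the GDS plugin is installed on the Neo4j instance, and the graph name `kg` and all function names are hypothetical.

```python
NODE_COUNT = "MATCH (n) RETURN count(n) AS nodes"
EDGE_COUNT = "MATCH ()-[r]->() RETURN count(r) AS edges"

# Assumes the Graph Data Science plugin is available on the server.
PROJECT_GRAPH = "CALL gds.graph.project($name, '*', '*')"
LOUVAIN_STREAM = (
    "CALL gds.louvain.stream($name) "
    "YIELD nodeId, communityId "
    "RETURN communityId, count(*) AS size "
    "ORDER BY size DESC"
)


def graph_counts(run_query):
    """Return node and edge totals; `run_query(cypher, **params)` returns dict rows."""
    nodes = run_query(NODE_COUNT)[0]["nodes"]
    edges = run_query(EDGE_COUNT)[0]["edges"]
    return {"nodes": nodes, "edges": edges}


def community_sizes(run_query, graph_name="kg"):
    """Project the whole graph, run Louvain, and map community id to member count."""
    run_query(PROJECT_GRAPH, name=graph_name)
    rows = run_query(LOUVAIN_STREAM, name=graph_name)
    return {row["communityId"]: row["size"] for row in rows}


def make_runner(driver):
    """Adapt a neo4j driver object to the `run_query` callable used above."""
    def run_query(cypher, **params):
        with driver.session() as session:
            return [record.data() for record in session.run(cypher, **params)]
    return run_query
```

Passing the query runner in as a callable keeps the analysis functions independent of the connection, so they can be tried against a stub before touching the database.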

2. `import-neo4j-all-dbs.ipynb`

Purpose: This notebook uploads multiple databases into Neo4j, providing a streamlined workflow for graph-based data integration and analysis.
Prerequisites:
- bel_json_import package for BEL data conversion to eBEL format
- Properly formatted database extracts
Key Features:
- Efficiently import graph data into Neo4j using a common schema
- Seamless integration of complex biological networks
- Privacy-aware data handling
Usage:
- Open the notebook in Jupyter Notebook or JupyterLab
- Place data in required locations
- Run cells specific to each source

Exploring the Covid-NDD Comorbidity Database

To manually explore the comorbidity graph database:

Open the Neo4j Browser:

Navigate to https://browser.neo4j.io.
Enter the Connection Details:
- URI: neo4j+s://09f8d4e9.databases.neo4j.io
- Username: neo4j
- Password: Refer to the credentials provided in src/comorbidity-hypothesis-db.py.
Run Cypher Queries:

Once connected, you can execute Cypher queries to explore the graph. For example, to retrieve a sample of nodes:
```
MATCH (n) RETURN n LIMIT 10;
```
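
The same kind of sample can be fetched from Python with the official `neo4j` driver. The sketch below is illustrative only: the function names are hypothetical, and the query returns labels and properties rather than whole nodes so that the result is made of plain dictionaries.

```python
SAMPLE_QUERY = (
    "MATCH (n) "
    "RETURN labels(n) AS labels, properties(n) AS props "
    "LIMIT $limit"
)


def connect(uri, user, password):
    """Create a driver; requires the `neo4j` package (pip install neo4j)."""
    from neo4j import GraphDatabase  # imported lazily so the helpers load without it
    return GraphDatabase.driver(uri, auth=(user, password))


def fetch_sample(driver, limit=10):
    """Return up to `limit` nodes as plain dicts using an open driver."""
    with driver.session() as session:
        result = session.run(SAMPLE_QUERY, limit=limit)
        # Consume the result inside the session, before it is closed.
        return [{"labels": r["labels"], "props": r["props"]} for r in result]
```

A typical call would be `fetch_sample(connect(uri, user, password))`, using the connection details listed above.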

Contact

For any questions, suggestions, or collaborations, please contact:

Negin Babaiha
Email: [email protected]
Google Scholar Profile

Feel free to reach out for discussions regarding the project!
