This repository contains the data, scripts, and analyses used in the study "Understanding the Co-Morbidity between COVID-19 and Neurodegenerative Diseases at Mechanism-Level: Comprehensive Analysis Integrating Databases and Text Mining". The project uses the Neo4j platform for graph-based analysis and integrates natural language processing to explore relationships between COVID-19 and neurodegenerative diseases (NDDs).
- Overview
- Data
- Sources
- Notebooks
- Getting Started
- Exploring the Covid-NDD Comorbidity Database
- Contact
This project explores the connections between COVID-19 and neurodegenerative diseases by:
- Integrating database information about COVID-19 and NDDs and storing it in a graph structure.
- Extracting textual data from scientific literature and using natural language processing pipelines for information extraction and knowledge graph (KG) construction.
- Loading all KGs into Neo4j to identify and analyse relationships and pathways between entities such as genes, diseases, and chemicals.
- Constructing a hypothesis database for comorbidity between COVID-19 and NDDs to explore, analyse, and visualise testable comorbidity hypotheses.
The repository includes the following directories:
- Expert-curated-publications: Contains manually curated publications relevant to the study, ensuring high-quality and accurate information.
- PubTator3-results: Includes results from PubTator3, a web-based system that offers a comprehensive set of features and tools for exploring biomedical literature using advanced text mining and AI techniques.
- Sherpa-results: Houses outputs from Sherpa, a tool designed to assist in the curation of biomedical literature by providing automated annotations and insights.
- Textual-corpora-for-textmining: Comprises textual corpora prepared for text mining purposes, facilitating the extraction of meaningful patterns and relationships regarding COVID-19 and NDDs.
- Purpose: Automatically opens the Neo4j Browser with prefilled credentials to connect to the AuraDB instance for comorbidity hypothesis exploration.
- Key Features:
- Simplifies connection to Neo4j by generating a pre-configured URL.
- Useful for direct interaction with the knowledge graph.
- Usage:
Run the script, and the Neo4j Browser will open in your default web browser:
python comorbidity-hypothesis-db.py
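The pre-configured URL described above could be built roughly as follows. This is a minimal sketch, not the contents of comorbidity-hypothesis-db.py: the function names are hypothetical, and it assumes Neo4j Browser reads connection hints from a `connectURL` query parameter, which should be checked against the Neo4j Browser documentation. Only the URI and username are pre-filled here; the password is left to be typed in.

```python
import webbrowser
from urllib.parse import quote, urlencode


def build_browser_url(uri: str, user: str,
                      base: str = "https://browser.neo4j.io/") -> str:
    """Return a Neo4j Browser URL that pre-fills the connection form.

    Assumption: the Browser accepts a `connectURL` query parameter of the
    form scheme://user@host.
    """
    scheme, host = uri.split("://", 1)
    connect_url = f"{scheme}://{quote(user)}@{host}"
    return f"{base}?{urlencode({'connectURL': connect_url})}"


def open_in_browser(url: str) -> bool:
    """Open the URL in the default web browser (standard library call)."""
    return webbrowser.open(url)
```

Calling `open_in_browser(build_browser_url("neo4j+s://09f8d4e9.databases.neo4j.io", "neo4j"))` would then open the connection form for the instance named in this README.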
- Purpose: Uploads the comorbidity hypothesis paths to the AuraDB instance for comorbidity hypothesis exploration. The curated candidate paths, along with their PMIDs and supporting evidence, are stored in 'src/hypothesis_pmid_evidences.csv'.
- Key Features:
- Simplifies uploading the comorbidity hypothesis candidates.
- Usage:
Run the script to upload the hypothesis paths to the AuraDB instance:
python comorbidity-space-neo4j-upload.py
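As an illustration of the upload step, the sketch below reads a CSV of hypothesis paths and writes them to Neo4j in one batched query. It is not the repository's script: the column names (`source`, `relation`, `target`, `pmid`, `evidence`), the `Entity` label, and the `HYPOTHESIS` relationship type are all assumptions made for the example, and the real header of hypothesis_pmid_evidences.csv may differ.

```python
import csv
import io

# Hypothetical column names; adjust to the actual CSV header.
REQUIRED = ("source", "relation", "target", "pmid", "evidence")

# Hypothetical schema: one Entity node per name, one HYPOTHESIS edge per row.
MERGE_QUERY = """
UNWIND $rows AS row
MERGE (s:Entity {name: row.source})
MERGE (t:Entity {name: row.target})
MERGE (s)-[r:HYPOTHESIS {type: row.relation, pmid: row.pmid}]->(t)
SET r.evidence = row.evidence
"""


def read_rows(handle):
    """Parse the CSV and keep only rows that name both endpoints."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for rec in reader:
        row = {c: (rec[c] or "").strip() for c in REQUIRED}
        if row["source"] and row["target"]:
            rows.append(row)
    return rows


def upload(rows, uri, user, password):
    """Send all rows to Neo4j in a single UNWIND query."""
    from neo4j import GraphDatabase  # third-party: pip install neo4j

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            session.run(MERGE_QUERY, rows=rows).consume()
```

Using `MERGE` rather than `CREATE` keeps the upload idempotent, so re-running it on the same file should not duplicate nodes or relationships.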
- Purpose: Manages the upload of hypothesis-based graph data to Neo4j.
- Key Features:
- Dedicated notebook for hypothesis data integration
- Structured data validation
- Automated graph relationship creation
- Usage:
- Open in Jupyter environment
- Configure data paths
- Execute cells sequentially
- Purpose: A comprehensive data integration pipeline for analyzing relationships between COVID-19 and neurodegenerative diseases (NDDs). This pipeline processes and uploads three types of biomedical data to Neo4j:
- Triple hypotheses (filtered triples from all databases)
- Pathway hypotheses (filtered pathways)
- GWAS data (shared variants)
The project leverages Neo4j for graph-based analysis and integrates various data sources to explore disease relationships.
- Install Dependencies
pip install pandas neo4j requests rapidfuzz fuzzywuzzy python-Levenshtein
- Configure Neo4j Connection
Create config.json:
{
  "neo4j": {
    "uri": "neo4j+s://09f8d4e9.databases.neo4j.io",
    "user": "neo4j",
    "password": "your-password"
  }
}
- Run Pipeline
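import importlib

# The module name contains hyphens, so a plain `from ... import ...` statement
# is a syntax error. Load it with importlib instead (this assumes the file is a
# .py module on the import path).
pipeline = importlib.import_module("hypothesis-graph-database-upload")
DataPipelineRunner = pipeline.DataPipelineRunner
Neo4jConfig = pipeline.Neo4jConfig

# Configure Neo4j connection
config = Neo4jConfig(
    uri="your_neo4j_uri",
    user="your_username",
    password="your_password",
)

# Run pipeline
runner = DataPipelineRunner(config)
runner.run(
    triple_file="path/to/cleaned_all_db_association.csv",
    pathway_file="path/to/your/hypothesis_pmid_evidences.csv",
    gwas_file="path/to/your/shared-variants.xlsx",
)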
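The config.json file from the configuration step can be read at runtime instead of hard-coding credentials in the script. The sketch below shows one way to do that, assuming the JSON layout shown above. `Neo4jSettings` and `load_settings` are hypothetical names introduced for illustration; they are not the repository's `Neo4jConfig`.

```python
import json
from dataclasses import dataclass
from pathlib import Path

KEYS = ("uri", "user", "password")


@dataclass(frozen=True)
class Neo4jSettings:
    """Hypothetical stand-in for the connection settings."""
    uri: str
    user: str
    password: str


def load_settings(path="config.json"):
    """Read the "neo4j" section of config.json and validate its keys."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    section = data.get("neo4j", {})
    missing = [k for k in KEYS if not section.get(k)]
    if missing:
        raise KeyError(f"config.json is missing neo4j keys: {missing}")
    return Neo4jSettings(**{k: section[k] for k in KEYS})
```

The resulting fields could then be passed on, for example `Neo4jConfig(uri=s.uri, user=s.user, password=s.password)`, assuming `Neo4jConfig` accepts those keyword arguments as in the snippet above.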
- Neo4j AuraDB: Ensure you have access to a Neo4j AuraDB instance. Use the provided connection details or set up your own.
- Python Environment: Install the required libraries:
pip install neo4j pandas
- Purpose: Analyzes the knowledge graph loaded into Neo4j to extract insights.
- Key Features:
- Counts nodes and edges in the graph.
- Executes community detection algorithms like Louvain using Neo4j's Graph Data Science (GDS) library.
- Retrieves and visualizes properties of detected clusters.
- Usage:
Open the Jupyter Notebook and follow the instructions to:
- Query the Neo4j database.
- Get general statistics about nodes, triples, and pathways, and analyze them.
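The counting and community detection steps listed above might look like the following when run from Python. This is a sketch under stated assumptions rather than the notebook's actual code: the projection name `comorbidity` and the `name` node property are hypothetical, and the Louvain call requires a Neo4j instance with the Graph Data Science plugin available.

```python
from collections import Counter

COUNT_NODES = "MATCH (n) RETURN count(n) AS nodes"
COUNT_EDGES = "MATCH ()-[r]->() RETURN count(r) AS edges"

# Project every node and relationship into an in-memory GDS graph.
# 'comorbidity' is a hypothetical projection name.
PROJECT_GRAPH = "CALL gds.graph.project('comorbidity', '*', '*')"

# Stream Louvain assignments; the `name` property is an assumption.
LOUVAIN = (
    "CALL gds.louvain.stream('comorbidity') "
    "YIELD nodeId, communityId "
    "RETURN gds.util.asNode(nodeId).name AS name, communityId"
)


def community_sizes(records):
    """Return (communityId, size) pairs, largest community first."""
    return Counter(rec["communityId"] for rec in records).most_common()


def run_analysis(uri, user, password):
    """Count nodes and edges, then summarize Louvain communities."""
    from neo4j import GraphDatabase  # third-party: pip install neo4j

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            nodes = session.run(COUNT_NODES).single()["nodes"]
            edges = session.run(COUNT_EDGES).single()["edges"]
            session.run(PROJECT_GRAPH).consume()
            records = [r.data() for r in session.run(LOUVAIN)]
    return nodes, edges, community_sizes(records)
```

Keeping `community_sizes` separate from the database calls makes the summary step easy to check on plain dictionaries before pointing it at a live instance.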
- Purpose: These scripts are designed to upload multiple databases into Neo4j, providing a streamlined workflow for graph-based data integration and analysis.
- Prerequisites:
- bel_json_import package for BEL data conversion to eBEL format
- Properly formatted database extracts
- Key Features:
- Efficiently import graph data into Neo4j using a common schema
- Seamless integration of complex biological networks
- Privacy-aware data handling
- Usage:
- Open the notebook in Jupyter Notebook or JupyterLab
- Place data in required locations
- Run cells specific to each source
To manually explore the comorbidity graph database:
- Open the Neo4j Browser: Navigate to https://browser.neo4j.io.
- Enter the Connection Details:
- URI: neo4j+s://09f8d4e9.databases.neo4j.io
- Username: neo4j
- Password: Refer to the credentials provided in src/comorbidity-hypothesis-db.py.
- Run Cypher Queries: Once connected, you can execute Cypher queries to explore the graph. For example, to retrieve a sample of nodes:
MATCH (n) RETURN n LIMIT 10;
For any questions, suggestions, or collaborations, please contact:
Negin Babaiha
Email: [email protected]
Google Scholar Profile
Feel free to reach out for discussions regarding the project!