GWAS Miner was created as part of my PhD project at the University of Leicester, tackling the problem of extracting meaningful data from GWAS publication text.
- Extraction of genotype-to-phenotype associations, including the genetic marker, disease and significance score (p-value).
- Visualisation of entity recognition and sentence structure within GWAS publication text.
- Viewing of publication statistics, such as the number of ontology disease term occurrences.
GWAS Miner can be used either through a graphical user interface or by passing command line parameters.
To launch the graphical user interface, pass the -g parameter to GWASMiner.py. This gives quick and easy access to all of its features.
GWAS Miner is designed to utilise BioC-JSON files such as those generated by the Auto-CORPus project (https://github.com/omicsNLP/Auto-CORPus), including the produced Tables-BioC JSON files.
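It can be useful to inspect a BioC-JSON file before passing it to GWAS Miner. The following is a minimal sketch, assuming the usual BioC-JSON layout: a top-level "documents" list whose entries each hold a "passages" list with a "text" field. The function names and the example collection are hypothetical and are not part of GWAS Miner.

```python
import json
from pathlib import Path


def load_bioc(path):
    """Parse a BioC-JSON file into a Python dictionary."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def iter_passage_texts(bioc):
    """Yield (document id, passage text) pairs, skipping empty passages."""
    for document in bioc.get("documents", []):
        doc_id = document.get("id", "")
        for passage in document.get("passages", []):
            text = passage.get("text", "")
            if text:
                yield doc_id, text


# Hypothetical in-memory collection, standing in for a file on disk.
example = {
    "source": "example",
    "documents": [
        {
            "id": "DOC1",
            "passages": [
                {"offset": 0, "infons": {}, "text": "Example passage text."},
                {"offset": 22, "infons": {}, "text": ""},
            ],
        }
    ],
}

passages = list(iter_passage_texts(example))
```

For a file on disk, `iter_passage_texts(load_bioc("<path_to_file>"))` would walk the same structure.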
python GWASMiner.py -g
The following subset of features is available without launching the graphical user interface.
python GWASMiner.py -d <path_to_directory>
python GWASMiner.py -u
python GWASMiner.py -d <path_to_file> -g "ents"
python GWASMiner.py -d <path_to_file> -g "sents"
GWAS Miner requires Python 3.5 or later and the following Python packages:
- python-dateutil
- rdflib
- owlready2
- lxml
- jsonschema
- rtgo
- networkx
- spacy
- SciSpaCy pre-trained data model:
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.5/en_core_sci_md-0.2.5.tar.gz
- svglib
- reportlab
- PyQt5
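One way to confirm that the packages above are installed is to check whether each one can be found by the import system. This is a minimal sketch, not part of GWAS Miner. The import names are assumptions taken from the package list: python-dateutil is imported as "dateutil", the SciSpaCy model as "en_core_sci_md", and "rtgo" is assumed to import under its package name.

```python
import importlib.util

# Import names assumed from the package list above.
REQUIRED_MODULES = [
    "dateutil",        # installed as python-dateutil
    "rdflib",
    "owlready2",
    "lxml",
    "jsonschema",
    "rtgo",            # assumed to import under its package name
    "networkx",
    "spacy",
    "en_core_sci_md",  # SciSpaCy pre-trained model
    "svglib",
    "reportlab",
    "PyQt5",
]


def find_missing(modules):
    """Return the module names that the import system cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing packages: " + ", ".join(missing))
else:
    print("All required packages were found.")
```

The check only locates each module without importing it, so it runs quickly and does not load the SciSpaCy model.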
For issue reporting and feedback/recommendations please email Thomas Rowlands at [email protected].