Interpret textual data generated from medical vocal memos
In the Library of Celsus in Ephesus, built in the 2nd century, there are four statues depicting wisdom (Sophia), knowledge (Episteme), intelligence (Ennoia) and excellence (Arete). Our project is named after this city and the goddess Sophia.
After visiting a patient, nurses and doctors need to send information quickly and easily
So they record a vocal memo after each visit
Today these memos are read by humans, and the information is manually entered into the database
We want to ease their work by automatically extracting information from the vocal memos and pre-filling the information to be entered into the database
4000 vocal memo recordings (4000 sentences)
14 targets to predict (up to 14 different pieces of information per memo)
Here is an example of a memo
And here is the corresponding information we need to extract
We clean the data by removing stop words and punctuation
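As an illustration, here is a minimal sketch of this cleaning step, assuming French memos and the nltk stop word list (the function name is ours):

```python
import string
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")  # one-time download of the stop word lists
STOP_WORDS = set(stopwords.words("french"))  # the memos are in French

def clean(sentence):
    # strip punctuation, then drop stop words
    no_punct = sentence.translate(str.maketrans("", "", string.punctuation))
    return " ".join(w for w in no_punct.split() if w.lower() not in STOP_WORDS)
```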
We identify which part of the memo (which group of words) corresponds to which piece of information
For this, we build a Named Entity Recognition (NER) model using the spaCy library
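For example, once such a model is trained (see the training steps below), it can be applied to a memo like this; the model path and the example sentence are illustrative:

```python
import spacy

# load the NER pipeline trained below (the path is an assumption)
nlp = spacy.load("models/model_v2/model-best")

doc = nlp("pansement simple à domicile le 12 mars")  # illustrative memo
for ent in doc.ents:
    # each entity is a group of words tagged with the information it carries
    print(ent.text, ent.label_)
```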
We build models to convert each piece of information into the target classes using the nltk library
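Here is a sketch of what one such converter might look like, assuming a "location" entity; the class names and the mapping are hypothetical, not the project's actual classes:

```python
import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt")  # tokenizer models used by word_tokenize

# hypothetical target classes for a "location" entity
LOCATION_CLASSES = {"domicile": "home", "cabinet": "office"}

def location_class(entity_text):
    # map the words of the extracted entity to one of the target classes
    for token in word_tokenize(entity_text.lower(), language="french"):
        if token in LOCATION_CLASSES:
            return LOCATION_CLASSES[token]
    return "unknown"
```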
You can play around with our demo here
In this demo, we let you try your own sentences and see the results from our models
We show our success rate
We give our feedback on possible improvements and share the hypotheses we used to build our models
Clone the project:
git clone [email protected]:GeoffroyGit/ephesus.git
We recommend creating a fresh virtual environment
Create a python3 virtualenv and activate it:
cd ephesus
pyenv virtualenv ephesus
pyenv local ephesus
Upgrade pip if needed:
pip install --upgrade pip
Install the package:
pip install -r requirements.txt
pip install -e .
Run the API on your machine:
make run_api
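Once the API is running, you can query it from another terminal, for example with curl. The port, route, and parameter name below are assumptions for illustration; check the API code in the project for the actual endpoint:

# hypothetical endpoint: adjust the route and port to match the project's API
curl "http://localhost:8000/predict?sentence=pansement+simple+le+12+mars"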
Build the docker image:
make docker_build
Run a container on your machine:
make docker_run
Stop the container running on your machine:
docker ps
docker stop <container id>
Push the image to Google Cloud Platform (GCP):
make docker_push
Run a container on GCP:
make docker_deploy
You'll need similar training data in order to train the models
We're sorry we can't share our data
mkdir models
mkdir models/config
Download the base config from https://spacy.io/usage/training (select only French and NER) and save it to models/config/base_config.cfg
Fill the config file with default values:
cd models/config/
python -m spacy init fill-config base_config.cfg config.cfg
Create train set and test set for the model:
cd ephesus/
python sentence.py
Create environment variables holding the training and test data file names (use the same names as in ephesus/sentence.py):
export EPHESUS_TRAINING_DATA="train_set_v2.spacy"
export EPHESUS_TEST_DATA="test_set_v2.spacy"
Train the model:
cd models/
mkdir model_v2
cd config/
python -m spacy train config.cfg --output ../model_v2 --paths.train ../../raw_data/$EPHESUS_TRAINING_DATA --paths.dev ../../raw_data/$EPHESUS_TRAINING_DATA
Evaluate the model:
cd models/model_v2/
mkdir eval
cd ../config/
python -m spacy evaluate ../model_v2/model-best ../../raw_data/$EPHESUS_TEST_DATA -dp ../model_v2/eval -o ../model_v2/eval/model_v2_scores.json
Train and evaluate the models for treatment and location:
cd ephesus/
python nlp.py
Check the classes for date and time:
cd ephesus/
python timedate.py
Congratulations!