Project Ephesus

Interpret textual data generated from medical vocal memos

In the Library of Celsus in Ephesus, built in the 2nd century, there are four statues depicting wisdom (Sophia), knowledge (Episteme), intelligence (Ennoia) and excellence (Arete). Our project is named after this city and the goddess Sophia.

What it's all about

After visiting a patient, nurses and doctors need to send information quickly and easily

So they record a vocal memo after each visit

Today, these memos are read by humans, who manually enter the information into the database

We want to ease their work by automatically extracting information from the vocal memos and pre-filling the fields to be entered in the database

Dataset

4000 vocal memo recordings (4000 sentences)

14 targets to predict (up to 14 different pieces of information per memo)

Example

Here is an example of a memo

And here is the corresponding information we need to extract

Our approach

Preprocessing

Remove stop words and punctuation from the data
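
As a rough illustration of this step, here is a minimal sketch that uses only the Python standard library. The `STOP_WORDS` set and the `clean_memo` function are hypothetical names made up for this example, and the stop-word list is deliberately tiny; it is not the list used in the project.

```python
import string

# Hypothetical, deliberately tiny French stop-word list for illustration only;
# a real pipeline would rely on a fuller list.
STOP_WORDS = {"le", "la", "les", "un", "une", "des", "de", "du", "et", "au", "pour", "chez"}

# ASCII punctuation is replaced by spaces so that words stay separated.
PUNCTUATION_TABLE = str.maketrans({char: " " for char in string.punctuation})


def clean_memo(text: str) -> list[str]:
    """Lowercase the memo, strip punctuation, and drop stop words."""
    tokens = text.lower().translate(PUNCTUATION_TABLE).split()
    return [token for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    print(clean_memo("Prise de sang chez le patient, au cabinet."))
```

Replacing punctuation with spaces rather than deleting it keeps elided forms such as "l'infirmier" from being glued into a single token.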

Identify groups of words

We identify which part of the memo (which group of words) corresponds to which piece of information

For this, we build a Named Entity Recognition model (NER) using the spaCy library
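
To make the idea of "groups of words" concrete: NER training examples are commonly written as a text plus character-offset spans, each with a label. The sketch below computes such spans from (phrase, label) pairs. The helper name `to_entity_offsets`, the sample memo, and the labels `TREATMENT`, `LOCATION` and `DATE` are assumptions for illustration, not the project's actual annotation scheme.

```python
def to_entity_offsets(text: str, annotations: list[tuple[str, str]]) -> list[tuple[int, int, str]]:
    """Turn (phrase, label) pairs into (start, end, label) character spans."""
    spans = []
    cursor = 0  # search left to right so a repeated phrase maps to successive spans
    for phrase, label in annotations:
        start = text.find(phrase, cursor)
        if start == -1:
            raise ValueError(f"phrase not found in memo: {phrase!r}")
        end = start + len(phrase)
        spans.append((start, end, label))
        cursor = end
    return spans


if __name__ == "__main__":
    memo = "prise de sang au cabinet demain matin"  # made-up memo
    labelled = [
        ("prise de sang", "TREATMENT"),  # hypothetical labels
        ("au cabinet", "LOCATION"),
        ("demain matin", "DATE"),
    ]
    for start, end, label in to_entity_offsets(memo, labelled):
        print(label, repr(memo[start:end]))
```

The annotations must be listed in the order they appear in the memo, since the search never moves backwards.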

Convert each group of words into meaningful information

We build models to convert each piece of information into the target classes using the nltk library
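
To show what this conversion step takes in and gives back, here is a small stand-in. It is not our nltk-based model: it swaps in a plain fuzzy string lookup from the standard library (`difflib.get_close_matches`). The wordings, the class names and the `to_target_class` helper are all hypothetical.

```python
import difflib
from typing import Optional

# Hypothetical lookup from known wordings to target classes
# (not the project's real labels).
KNOWN_TREATMENTS = {
    "prise de sang": "BLOOD_TEST",
    "pansement": "DRESSING",
    "injection": "INJECTION",
}


def to_target_class(group: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a group of words to the class of its closest known wording, if any."""
    candidates = list(KNOWN_TREATMENTS)
    matches = difflib.get_close_matches(group.lower().strip(), candidates, n=1, cutoff=cutoff)
    return KNOWN_TREATMENTS[matches[0]] if matches else None


if __name__ == "__main__":
    print(to_target_class("prise de sanng"))  # tolerates a small transcription error
    print(to_target_class("bonjour"))         # no close wording, so no class
```

A trained classifier generalises far beyond a lookup like this; the sketch only illustrates the shape of the mapping from a word group to a class.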

Demo

Try it yourself

You can play around with our demo here

In this demo, we let you try your own sentences and see the results from our models

Going further

Show our success percentage

Share our feedback on possible improvements and the assumptions we made when building our models

Run our code yourself

Install the Ephesus package

Clone the project:

git clone git@github.com:GeoffroyGit/ephesus.git

We recommend creating a fresh virtual environment

Create a Python 3 virtualenv and activate it:

cd ephesus
pyenv virtualenv ephesus
pyenv local ephesus

Upgrade pip if needed:

pip install --upgrade pip

Install the package:

pip install -r requirements.txt
pip install -e .

Run the API locally

Run the API on your machine:

make run_api

Run the API in a Docker container in the cloud

Build the docker image:

make docker_build

Run a container on your machine:

make docker_run

Stop the container running on your machine:

docker ps
docker stop <container id>

Push the image to Google Cloud Platform (GCP):

make docker_push

Run a container on GCP:

make docker_deploy

Train the models yourself

Training data

You'll need similar training data in order to train the models

We're sorry we can't share our data

Train the NER with spaCy

Create folders

mkdir models
mkdir models/config

Download config

Download the base config from https://spacy.io/usage/training (select only French and NER) and save it to models/config/base_config.cfg

Fill config

Fill config file with default values:

cd models/config/
python -m spacy init fill-config base_config.cfg config.cfg

Create data sets

Create train set and test set for the model:

cd ephesus/
python sentence.py
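
If you are preparing your own data, a reproducible split can look like the sketch below. This is a generic approach and an assumption on our side for this example, not a description of what ephesus/sentence.py does; `split_train_test`, the 20% test ratio and the seed are illustrative choices.

```python
import random


def split_train_test(items: list, test_ratio: float = 0.2, seed: int = 42) -> tuple[list, list]:
    """Shuffle a copy of the items with a fixed seed and cut it into train and test parts."""
    if not 0 < test_ratio < 1:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    shuffled = list(items)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    test_size = round(len(shuffled) * test_ratio)
    return shuffled[test_size:], shuffled[:test_size]


if __name__ == "__main__":
    memos = [f"memo {index}" for index in range(4000)]  # placeholder sentences
    train, test = split_train_test(memos)
    print(len(train), len(test))
```

Fixing the seed means the same memos land in the test set on every run, which keeps evaluation scores comparable between model versions.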

Create environment variables holding the training and test data file names (use the same names as in ephesus/sentence.py):

export EPHESUS_TRAINING_DATA="train_set_v2.spacy"
export EPHESUS_TEST_DATA="test_set_v2.spacy"
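
Before launching a long training run, it can help to check that both variables are set and point to the expected files. A minimal sketch, assuming the data files live in the raw_data folder used by the training commands; `dataset_paths` is a hypothetical helper, not part of the package.

```python
import os
from pathlib import Path


def dataset_paths(raw_data_dir: str = "raw_data") -> tuple[Path, Path]:
    """Build the train and test file paths from the two environment variables."""
    try:
        train_name = os.environ["EPHESUS_TRAINING_DATA"]
        test_name = os.environ["EPHESUS_TEST_DATA"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None
    base = Path(raw_data_dir)
    return base / train_name, base / test_name


if __name__ == "__main__":
    for path in dataset_paths():
        status = "found" if path.exists() else "missing"
        print(f"{path}: {status}")
```

Run it from the project root so that the relative raw_data path resolves correctly.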

Train

Train the model:

cd models/
mkdir model_v2
cd config/
python -m spacy train config.cfg --output ../model_v2 --paths.train ../../raw_data/$EPHESUS_TRAINING_DATA --paths.dev ../../raw_data/$EPHESUS_TRAINING_DATA

Evaluate

Evaluate the model:

cd models/model_v2/
mkdir EVAL
cd ../config/
python -m spacy evaluate ../model_v2/model-best ../../raw_data/$EPHESUS_TEST_DATA -dp ../model_v2/EVAL -o ../model_v2/EVAL/model_v2_scores.json
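
The scores file written by the evaluation is plain JSON, so it is easy to inspect. The sketch below lists the F-score per entity label, weakest first, assuming the file contains spaCy's usual `ents_per_type` entry with `p`, `r` and `f` values for each label. `summarize_scores` is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_scores(scores_path: str) -> list[tuple[str, float]]:
    """Return (entity label, F-score) pairs sorted from weakest to strongest."""
    scores = json.loads(Path(scores_path).read_text(encoding="utf-8"))
    per_type = scores.get("ents_per_type") or {}  # missing or null means no entity scores
    pairs = [(label, metrics["f"]) for label, metrics in per_type.items()]
    return sorted(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Path taken from the evaluate command; run from the project root.
    for label, f_score in summarize_scores("models/model_v2/EVAL/model_v2_scores.json"):
        print(f"{label}: {f_score:.3f}")
```

Sorting from the weakest label upwards puts the entities that most need extra training examples at the top.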

Train the RNN with nltk

Train and evaluate the models for treatment and location:

cd ephesus/
python nlp.py

Check other objects

Check the classes for date and time:

cd ephesus/
python timedate.py
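
To give an idea of what turning a time expression into a structured value involves, here is a small regex-based sketch for French clock times such as "8h30". It is an illustration under our own assumptions, not the logic of ephesus/timedate.py; `TIME_PATTERN` and `parse_time` are hypothetical names.

```python
import re
from datetime import time
from typing import Optional

# Matches French clock times such as "8h", "8h30" or "14 h 05".
TIME_PATTERN = re.compile(r"\b(\d{1,2})\s*h\s*(\d{2})?\b")


def parse_time(text: str) -> Optional[time]:
    """Return the first clock time found in the text, or None if there is none."""
    match = TIME_PATTERN.search(text)
    if match is None:
        return None
    hour = int(match.group(1))
    minute = int(match.group(2) or 0)  # "8h" means 8:00
    if hour > 23 or minute > 59:
        return None  # reject impossible values such as "25h"
    return time(hour, minute)


if __name__ == "__main__":
    for sample in ["passage 8h30", "rendez-vous 14 h 05", "demain matin"]:
        print(sample, "->", parse_time(sample))
```

Spelled-out forms such as "huit heures" are out of scope for this pattern and would need their own handling.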

You're done

Congratulations!
