Pathology NLP

Project TO DO List

Summarize unmatched concepts (similar to matched concepts)
Create views to convert long format to wide
- which features/concepts do you care about?
Train a deep learning model based on partially matched concepts?
- get list of unmatched concepts
- have Toby/Adrie help with mapping (for training)
- create classifier for unmatched phrases (extraction)
Match based on edit distance?
Create aliases and process to load/match them
- Use DL instead?
Literature review
Review hand-mapped reports
Also create views with parent concepts (e.g. SQL for "show me the histologic type for these reports")
Some reports use a "Q: A" format, one pair per line; others use a different format. Find examples of each.
X - Hand map one or two reports
X - Need to process parts from columns 'Parts' and 'Final Diagnosis'; should map part labels to final diagnosis sections
X - Multiple matches for a value usually mean the key is needed to find the correct concept
X - Key/value pairing doesn't work for section headings, e.g. "Pathologic Staging (pTNM)"
X - Store matched concepts to database
- first purge matches for given report
- store new concepts
X - Update parser to run over more than just 1 report (probably switch from a notebook to a .py file)
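
A minimal sketch of the "match based on edit distance" idea from the list above, assuming plain Levenshtein distance over lower-cased phrases. The function names, the example concept list, and the `max_distance` threshold are illustrative assumptions, not part of the existing parser.

```python
from typing import Optional, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def closest_concept(
    phrase: str, concepts: Sequence[str], max_distance: int = 2
) -> Optional[Tuple[str, int]]:
    """Return (concept, distance) for the nearest concept, or None if
    no concept is within max_distance edits (hypothetical threshold)."""
    normalized = phrase.strip().lower()
    best = None
    for concept in concepts:
        distance = levenshtein(normalized, concept.strip().lower())
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (concept, distance)
    return best


# Made-up concept names, for illustration only.
concepts = ["Histologic Type", "Histologic Grade", "Tumor Size"]
print(closest_concept("Histologic Tpye", concepts))
```

A fixed threshold is the simplest choice; short keys may need a stricter limit, or a distance scaled by phrase length.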

Pathology items:

Fixed - Some reports have "<" encoded as "&lt;"
Fixed - What is "D;" at the beginning of the DiagnosticComment column?
X - Send the CAP document that describes the 7 or so ways to document a CAP report
Start the paper introduction with a literature review highlighting key publications
Hand curated reports

Project Setup

This project relies upon Python and uses Anaconda to manage installation and dependencies.

Note: As this project was developed on macOS, some dependencies may be overly strict for other OSes.

conda create -n nlp python=3.8
conda env export -n nlp -f environment.yml

Project Goals and Tasks

Project level tasks:

Identify the appropriate College of American Pathologists (CAP) protocol to apply to each case
Identify the superset of features that will be extracted

Case level tasks:

Identify the number of synoptic reports per case
For each synoptic report identified, extract the appropriate data elements

Data Structure

Need to create a standardized data model plus bladder and prostate extensions

Questions:

Hand-extracted reports? Path will work on this.
How many? More prostate reports (~1000s) than bladder. Prostate extracts are more consistent.
Where will the solution live?
1. Process to parse reports
2. Extracted structured information
Do we care about Notes, e.g. "Histologic Type (Note B)"? Notes are just there to help fill out the form; they don't need to be stored.
If the PDF and XML differ, the PDF wins, as that's what people fill out.

The diagnosticcomment, finaldiagnosis, and microscopicdescription fields may each contain synoptic reports.
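
A rough sketch of parsing the "Key: Value" style of synoptic report out of one of these fields, assuming the field text is already loaded as a string. It treats a line with no value as a section heading (e.g. "Pathologic Staging (pTNM)") and drops "(Note B)"-style suffixes, per the notes above. The function name, tuple layout, and sample text are assumptions for illustration, not the project's parser.

```python
import re
from typing import List, Optional, Tuple

# Trailing "(Note B)"-style references only help with filling out the form,
# so they are stripped from keys rather than stored.
NOTE_SUFFIX = re.compile(r"\s*\(Note [A-Z]\)\s*$")


def parse_synoptic_lines(text: str) -> List[Tuple[Optional[str], str, str]]:
    """Parse "Key: Value" lines into (section, key, value) tuples.

    A non-empty line without a value is treated as a section heading and
    becomes the section for the lines that follow it.
    """
    rows = []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        key = NOTE_SUFFIX.sub("", key.strip())
        value = value.strip()
        if not sep or not value:
            section = key  # heading line: no key/value pair to store
            continue
        rows.append((section, key, value))
    return rows


# Made-up report text, for illustration only.
sample = """\
Procedure: Radical prostatectomy
Histologic Type (Note B): Adenocarcinoma

Pathologic Staging (pTNM)
Primary Tumor (pT): pT2
"""
for row in parse_synoptic_lines(sample):
    print(row)
```

Reports in other layouts would need their own handling; this only covers the one-pair-per-line case.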

For Paper

Need to report micro- and macro-averaged precision, recall, and F-score.
Calculate 95% confidence intervals by bootstrapping from the test set.
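
A sketch of how these numbers could be computed, under stated assumptions: each test item has one gold label and one predicted label (`None` meaning no concept), macro F is the mean of per-label F1, and the interval is a simple percentile bootstrap. Function names and the toy labels are illustrative only.

```python
import random
from collections import Counter
from typing import Callable, Dict, Optional, Sequence, Tuple

Label = Optional[str]


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_macro(gold: Sequence[Label], pred: Sequence[Label]) -> Dict[str, Tuple[float, ...]]:
    """Micro- and macro-averaged (precision, recall, F1)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g is not None:
                tp[g] += 1
        else:
            if p is not None:
                fp[p] += 1  # predicted a label that was not the gold one
            if g is not None:
                fn[g] += 1  # missed the gold label
    labels = sorted(set(tp) | set(fp) | set(fn))
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    per_label = [prf(tp[l], fp[l], fn[l]) for l in labels]
    n = len(per_label) or 1
    macro = tuple(sum(m[i] for m in per_label) / n for i in range(3))
    return {"micro": micro, "macro": macro}


def bootstrap_ci(
    gold: Sequence[Label],
    pred: Sequence[Label],
    metric: Callable[[Sequence[Label], Sequence[Label]], float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval, resampling test items with replacement."""
    rng = random.Random(seed)
    n = len(gold)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([gold[i] for i in idx], [pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy labels, for illustration only.
gold = ["type", "grade", "type", None, "size", "grade"]
pred = ["type", "type", "type", "size", "size", None]
scores = micro_macro(gold, pred)
print(scores)
print(bootstrap_ci(gold, pred, lambda g, p: micro_macro(g, p)["micro"][2]))
```

Resampling individual items assumes they are independent; if several synoptic reports come from the same case, resampling whole cases may be more appropriate.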

cornish/pathology-nlp