Contact person: Erik Tjong Kim Sang
Notebooks for scraping websites with medical guidelines and performing text analysis
- Run `scrape_website.ipynb` to retrieve the HTML files. They will be stored in the directory `../data/richtlijnendatabase.nl`
- Run `get_paragraphs.ipynb` to extract the paragraphs containing text from the downloaded files. They will be stored in the file `csv/paragraphs_20210712.csv`
- Run steps 1 and 4 of `text_ranking.ipynb` to find the paragraphs that contain relevant medical terms related to eHealth. The results will be stored in the files `paragraphs.json` and `index.html`
- Run `json_diff.ipynb` to compare the JSON file of step 3 (`paragraphs.json`) with a previous version and to classify the HTML pages according to treatment steps. The results will be stored in the file `index.html`
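The paragraph extraction performed by `get_paragraphs.ipynb` can be pictured with the minimal sketch below. It is not the notebook's own code: it assumes that "paragraphs with text" are the non-empty `<p>` elements of each page, and the helper names (`extract_paragraphs`, `paragraphs_to_csv`) and CSV columns (`file`, `paragraph_id`, `text`) are hypothetical. Only the standard library (`html.parser`, `csv`) is used.

```python
import csv
import io
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the whitespace-normalized text of every non-empty <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # an unclosed <p> ends where the next one starts
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # keep the text of a trailing unclosed <p>

    def _flush(self):
        if self._in_p:
            text = " ".join("".join(self._buffer).split())
            if text:  # skip empty paragraphs
                self.paragraphs.append(text)
        self._in_p = False
        self._buffer = []


def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def paragraphs_to_csv(pages, out):
    """Write one CSV row per paragraph (hypothetical layout).

    pages: mapping of file name -> HTML text (in practice read from the download directory).
    out: writable text stream, e.g. open(path, "w", newline="", encoding="utf-8").
    """
    writer = csv.writer(out)
    writer.writerow(["file", "paragraph_id", "text"])  # hypothetical column names
    for name, html in pages.items():
        for i, text in enumerate(extract_paragraphs(html)):
            writer.writerow([name, i, text])


if __name__ == "__main__":
    buf = io.StringIO()
    paragraphs_to_csv({"example.html": "<p>First  paragraph.</p><p></p><p>Second <b>paragraph</b>.</p>"}, buf)
    print(buf.getvalue())
```

Restricting the sketch to `<p>` elements keeps it dependency-free; real guideline pages may also hold relevant text in list items or tables, which this sketch ignores.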
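One possible shape of the term search in `text_ranking.ipynb` is sketched below. The term list `EHEALTH_TERMS` and the scoring rule (total number of whole-word, case-insensitive matches per paragraph) are assumptions made for illustration; this README does not say which terms or which ranking method the notebook uses, and the JSON records produced here are not necessarily the layout of `paragraphs.json`.

```python
import json
import re

# Hypothetical eHealth vocabulary; the terms actually used by text_ranking.ipynb are not listed here.
EHEALTH_TERMS = ("ehealth", "e-health", "telemonitoring", "videoconsult")


def rank_paragraphs(paragraphs, terms=EHEALTH_TERMS):
    """Score each paragraph by its number of whole-word, case-insensitive term matches.

    paragraphs: mapping of paragraph id (str) -> paragraph text.
    Returns records for paragraphs with at least one match, highest score first
    (ties broken by id).
    """
    patterns = {t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms}
    ranked = []
    for pid, text in paragraphs.items():
        counts = {t: len(p.findall(text)) for t, p in patterns.items()}
        matched = sorted(t for t, c in counts.items() if c > 0)
        if matched:
            ranked.append({"id": pid, "score": sum(counts.values()), "terms": matched})
    ranked.sort(key=lambda r: (-r["score"], r["id"]))
    return ranked


if __name__ == "__main__":
    sample = {
        "p1": "Telemonitoring and eHealth are discussed. eHealth can help.",
        "p2": "No relevant terms here.",
        "p3": "A videoconsult is possible.",
    }
    # Serialize the ranking as JSON (printed here; written to a file in practice).
    print(json.dumps(rank_paragraphs(sample), indent=2))
```

Whole-word matching avoids counting a term inside a longer unrelated word, at the cost of missing compounds, which are common in Dutch guideline text.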
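The comparison part of `json_diff.ipynb` might look roughly like the following sketch, which reports which records were added, removed or changed between an earlier and the current JSON file. It assumes that both files hold a list of records with a unique `id` field, a hypothetical layout; the function names are also hypothetical. The classification of pages by treatment step is not sketched, since this README does not describe how it is done.

```python
import json


def diff_records(old, new, key="id"):
    """Compare two lists of JSON records matched on `key`.

    Returns sorted lists of keys that were added, removed, or whose record changed.
    """
    old_by_key = {r[key]: r for r in old}
    new_by_key = {r[key]: r for r in new}
    common = old_by_key.keys() & new_by_key.keys()
    return {
        "added": sorted(new_by_key.keys() - old_by_key.keys()),
        "removed": sorted(old_by_key.keys() - new_by_key.keys()),
        "changed": sorted(k for k in common if old_by_key[k] != new_by_key[k]),
    }


def diff_json_files(old_path, new_path, key="id"):
    """Load two JSON files (e.g. an earlier and the current paragraphs.json) and diff them."""
    with open(old_path, encoding="utf-8") as f_old, open(new_path, encoding="utf-8") as f_new:
        return diff_records(json.load(f_old), json.load(f_new), key)


if __name__ == "__main__":
    old = json.loads('[{"id": "p1", "score": 2}, {"id": "p2", "score": 1}]')
    new = json.loads('[{"id": "p1", "score": 3}, {"id": "p3", "score": 1}]')
    print(diff_records(old, new))  # {'added': ['p3'], 'removed': ['p2'], 'changed': ['p1']}
```

Indexing both versions by key turns the comparison into set operations, so the cost grows linearly with the number of records.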