Meeting Minutes 24th March 2020
Participants:
Peter Murray-Rust (PMR)
Matthew Dunstan (MD)
Ben Smith (BS)
Minutes:
Initial idea
Scrape a volume of the literature, extract text and analyse it
e.g. Find 100-500 papers on battery material NMC811, extract Voltage/Capacity curves, focus on a small number of journals.
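As a rough illustration of the text side of this idea, the sketch below screens paper text for the target material and pulls out any capacity and voltage values mentioned in the prose. It is only a minimal sketch under stated assumptions: the regular expressions, the function name `screen_paper` and the two sample sentences are made up for illustration, and it does not attempt to read Voltage/Capacity curves from figures, which is a separate problem.

```python
import re

# Hypothetical patterns; the real search terms are still to be agreed.
MATERIAL_PATTERN = re.compile(r"\bNMC[\s-]?811\b", re.IGNORECASE)
CAPACITY_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*mAh\s*(?:/\s*g|g-1)")
VOLTAGE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*V\b")


def screen_paper(text):
    """Return extracted values if the text mentions the material, else None."""
    if not MATERIAL_PATTERN.search(text):
        return None
    return {
        "capacities_mAh_per_g": [float(v) for v in CAPACITY_PATTERN.findall(text)],
        "voltages_V": [float(v) for v in VOLTAGE_PATTERN.findall(text)],
    }


# Invented sample sentences standing in for extracted paper text.
papers = {
    "paper_a": "NMC811 cathodes delivered 200 mAh/g when cycled to 4.3 V.",
    "paper_b": "LiFePO4 shows a flat plateau at 3.4 V.",
}

results = {name: screen_paper(text) for name, text in papers.items()}
```

Papers that never mention the material come back as `None`, so they can be dropped before any heavier analysis is run on the remaining set.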
How do we work?
Open notebook - PMR's preference is GitHub. Open a new open repository for the project.
Possibly use Google Colab to share and run code.
Use GitHub issues as a task manager.
Run through Docker or Jupyter. Use JDK and Maven.
Other information
There is a good tutorial for similar projects PMR was involved in:
https://github.com/petermr/tigr2ess
Initially looked for articles in ChemRxiv (461 results) and PubMed Central (3000 results)
Coding Skills in team:
- Jupyter and Colab - MD and BS
- Python (pymatgen, pandas, scikit-learn) - MD and BS
- CASTEP - MD
- R/Java - PMR - If we are going to do text mining, an R package might be best
- pyplot
Programs developed by PMR:
- OSCAR - finds chemical names in text
- OPSIN - converts chemical names in text into structures
- ChemicalTagger - interprets recipes
**Actions (also in Issues):**
- GitHub - give Peter the link - Done
- Put minutes of the meeting in GitHub - Done
- PMR will try to run 3000 OA papers locally to see what can be extracted
- BS and MD to come up with a list of search terms, drawing on what we want to search for, what the literature treats as important, and what Wikipedia offers in categories and templates
- MD to look for existing dictionaries of battery terms (Olivetti et al.)