Meeting Minutes 24th March 2020

Participants:

Peter Murray Rust (PMR)
Matthew Dunstan (MD)
Ben Smith (BS)

Minutes:

Initial idea: scrape a volume of the literature, extract the text and analyse it,
e.g. find 100-500 papers on the battery material NMC811 and extract voltage/capacity curves, focusing on a small number of journals.
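
As a rough illustration of the text-analysis step (not something agreed at the meeting), the sketch below pulls voltage and specific-capacity values out of a sentence with regular expressions. The patterns, the function name `extract_measurements` and the sample sentence are all assumptions made for illustration; it only handles values written in running text and does not digitise curves from figures.

```python
import re

# Hypothetical patterns: a number followed by a voltage or specific-capacity unit.
VOLTAGE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*V\b")
CAPACITY_RE = re.compile(r"(\d+(?:\.\d+)?)\s*mA\s?h\s?(?:/\s?g|g\s?-1)")

def extract_measurements(text):
    """Return voltages (V) and specific capacities (mAh/g) mentioned in text."""
    voltages = [float(m) for m in VOLTAGE_RE.findall(text)]
    capacities = [float(m) for m in CAPACITY_RE.findall(text)]
    return {"voltage_V": voltages, "capacity_mAh_per_g": capacities}

# Made-up sample sentence, not taken from any paper.
sample = ("The NMC811 cathode delivered 200 mAh/g at 4.3 V "
          "and retained 180 mAh g-1 after cycling.")
result = extract_measurements(sample)
print(result)
```

Real papers write units in many more ways (superscripts, "mA h g^-1", ranges), so the patterns would need extending once we see what the extracted text looks like.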

How do we work?

Open notebook - PMR's preference is Github. Open a new open repository for the project.

Possibly use Google Colab to share and run code.

Use Github issues as a task manager.

Run through Docker or Jupyter. Use JDK and Maven.

Other information

There is a good tutorial for similar projects PMR was involved in:
https://github.com/petermr/tigr2ess

Initially looked for articles in ChemRxiv (461 results) and PubMed Central (3000 results)

Coding Skills in team:

Jupyter and Colab - MD and BS
Python (pymatgen, pandas, scikit-learn) - MD and BS
Castep - MD
R/Java - PMR - if we are going to do text mining, an R package might be best
pyplot

Programs developed by PMR:

OSCAR - identifies chemical names in text
OPSIN - converts chemical names into structures
ChemicalTagger - interprets recipes

**Actions (also in Issues):**

Github - give Peter the link - Done
Put minutes of the meeting in Github - Done
PMR will try and run 3000 OA papers locally to see what can be extracted
BS and MD to come up with a list of search terms, drawing on what we want to search for, what the literature thinks is important, and what Wikipedia has in terms of categories and templates
MD to look for existing dictionaries of battery terms (Olivetti et al.)
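
One possible way to check a candidate term list against a set of papers is to count how many documents mention each term. The sketch below is only an assumption about how this could be done: the function name `document_frequency`, the example terms and the three sample sentences are made up for illustration, and a real run would read the extracted text of the OA papers instead.

```python
import re
from collections import Counter

def document_frequency(terms, documents):
    """Count how many documents mention each term (case-insensitive, whole word)."""
    counts = Counter()
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = sum(1 for doc in documents if pattern.search(doc))
    return counts

# Hypothetical term list and stand-in documents.
terms = ["NMC811", "cathode", "capacity", "electrolyte"]
documents = [
    "NMC811 is a nickel-rich cathode material.",
    "The capacity of the cathode fades on cycling.",
    "Anode coatings were studied.",
]
freq = document_frequency(terms, documents)
for term, n in freq.most_common():
    print(term, n)
```

Terms that appear in very few documents could be dropped or replaced, and terms that appear in nearly all of them may be too general to be useful for search.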
