
Meeting Minutes 24th March 2020

mtdunstan edited this page Mar 24, 2020 · 1 revision

Participants:

Peter Murray Rust (PMR)
Matthew Dunstan (MD)
Ben Smith (BS)

Minutes:

Initial idea: scrape a volume of the literature, extract the text and analyse it,
e.g. find 100-500 papers on the battery material NMC811 and extract voltage/capacity curves, focusing on a small number of journals.
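
A minimal sketch of the "extract text and analyse it" step, assuming the paper text has already been extracted into plain strings. The term list, the paper ids and the snippets are made up for illustration; the real search terms are still to be agreed (see Actions).

```python
import re
from collections import Counter

# Hypothetical search terms; the real list is still to be drawn up.
TERMS = ["NMC811", "capacity", "voltage"]


def count_terms(text, terms=TERMS):
    """Count case-insensitive whole-word occurrences of each term in text."""
    counts = Counter()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def select_papers(papers, required="NMC811"):
    """Return the ids of papers whose text mentions the required term."""
    return [
        paper_id
        for paper_id, text in papers.items()
        if count_terms(text, [required])[required] > 0
    ]


# Made-up snippets standing in for extracted paper text.
papers = {
    "paper-1": "NMC811 cathodes show a first-cycle capacity loss; capacity fades at high voltage.",
    "paper-2": "Graphite anodes were cycled at low voltage.",
}

for paper_id in select_papers(papers):
    print(paper_id, dict(count_terms(papers[paper_id])))
```

Simple whole-word matching like this would only be a first filter; extracting voltage/capacity curves would need figure or table processing on top.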

How do we work?

Open notebook - PMR's preference is GitHub. Open a new open repository for the project.

Possibly use Google Colab to share and run code.

Use GitHub issues as a task manager.

Run through Docker or Jupyter. Use JDK and Maven.

Other information

There is a good tutorial for similar projects PMR was involved in:
https://github.com/petermr/tigr2ess

Initially looked for articles in ChemRxiv (461 results) and PubMed Central (3000 results).

Coding skills in team:

  • Jupyter and Colab - MD and BS
  • Python (pymatgen, pandas, scikit-learn) - MD and BS
  • CASTEP - MD
  • R/Java - PMR. If we are going to do text mining, an R package might be best
  • pyplot

Programs developed by PMR:

  • OSCAR - finds chemical names in text
  • OPSIN - converts chemical names into structures
  • ChemicalTagger - interprets recipes

**Actions (also in Issues):**

  • GitHub - give PMR the link - Done
  • Put minutes of the meeting on GitHub - Done
  • PMR will try to run 3000 OA papers locally to see what can be extracted
  • BS and MD to come up with a list of search terms, drawn from what we want to search for, what the literature thinks is important, and what Wikipedia has in terms of categories and templates
  • MD to look for existing dictionaries of battery terms (Olivetti et al.)