IUPAC Project 2019-031-1-024 FAIR Spectroscopy Data Specification

BobHanson/IUPAC-FAIRSpec

last updated 2021-09-01

IUPAC-FAIRSpec

Welcome to the GitHub development and demonstration project for the IUPAC Project 2019-031-1-024 Development of a Standard for FAIR Data Management for Spectroscopic Data. Our current working specification can be found as a Google Doc. A demonstration of IUPAC FAIRSpec finding aids and their application is at https://chemapps.stolaf.edu/iupac/demo/demo.htm, with files at https://chemapps.stolaf.edu/iupac/site/ifs.

This GitHub project provides a reference Java implementation of the IUPAC FAIRSpec Standard as a Java library, together with a reference Java implementation of an "IUPAC FAIRSpec data and metadata extractor". It is under intensely active development, is very preliminary and, though public, is meant only for demonstration purposes. Please do not implement these preliminary standards, as they are expected to change day by day throughout 2021.

The principal goal of the project is to define standardized metadata associated with complex collections of spectroscopic data in the area of chemistry -- NMR, IR, Raman, MS, etc. The specification is modular and has been worked out primarily in the area of NMR spectroscopy at this time.

The basis of what we are calling "FAIR Data Management of Spectroscopic Data" is the IUPAC FAIRSpec Finding Aid, represented as JSON (in this case) or XML (leaving that for others for now), together with the extracted collection.

If you just want to get an idea of what the "data extractor" does without installing anything yourself, see the demo at St. Olaf College. It's still rather crude, but it should give you an idea of what we are about.

Reference Implementation

The code here is an Eclipse Java project. If you want to clone it, feel free. Check it out. Run the test. Even suggest changes. Contribute. Since it is quite a preliminary project, don't get too frustrated if it doesn't work for you. It probably means I have forgotten to mention some aspect of its implementation. Please contact Bob Hanson ([email protected]) if you want some help. We'd like to hear from you.

The reference implementation consists of two main parts -- a Java library of mostly abstract classes that define the basics of the IUPAC FAIRSpec schema, and an implementation of a "data and metadata extractor" that can produce IUPAC FAIRSpec Collections and their associated IUPAC FAIRSpec Finding Aids in JSON format.

The basic demo (src/com/integratedgraphics/ifs/ExtractorTest.java) takes a monolithic ZIP file (30-200MB) provided by authors as supporting information for manuscripts accepted by the Journal of Organic Chemistry and Organic Letters and extracts Digital Objects from it into a Digital Collection. As it does this, it creates an internal Java data model in the form of an IFSSpecDataFindingAid. When it is done, it serializes this finding aid and writes it to a file.

The Java test class is src/main/java/com/integratedgraphics/ifs/ExtractorTest.java. The extractor test reads one or more "extraction scripts" from /extract/ subdirectories and uses those to parse a Figshare ZIP file that was deposited by the American Chemical Society as part of their FAIR Data initiative.

As it parses the extraction script, it:

  1. opens one or more Figshare ZIP files
  2. extracts Digital Objects into an "IFS FAIR Data Collection" in the site/ifs directory (not present here because of .gitignore)
  3. builds an IFSSpecDataFindingAid internal representation of the collection
  4. when done, generates a JSON serialization of the IFSSpecDataFindingAid object
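The four steps above can be sketched with nothing but the Java standard library. This is a minimal illustration, not the project's code: the class names ExtractionSketch and FindingAidSketch, the file name findingaid.json, and the JSON layout are all assumptions made for the example, and the real extractor is driven by extraction scripts that are not modeled here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch; none of these names come from the IUPAC FAIRSpec library.
class ExtractionSketch {

    /** Minimal stand-in for a finding aid: only the list of extracted object names. */
    static class FindingAidSketch {
        final List<String> objects = new ArrayList<>();

        /** Hand-rolled JSON output; a real implementation would use a proper JSON writer. */
        String toJson() {
            StringBuilder sb = new StringBuilder("{\"objects\":[");
            for (int i = 0; i < objects.size(); i++) {
                if (i > 0) sb.append(',');
                String escaped = objects.get(i).replace("\\", "\\\\").replace("\"", "\\\"");
                sb.append('"').append(escaped).append('"');
            }
            return sb.append("]}").toString();
        }
    }

    /** Steps 1-3: read a ZIP stream, copy each file entry under targetDir, and record it. */
    static FindingAidSketch extract(InputStream zipData, Path targetDir) throws IOException {
        FindingAidSketch aid = new FindingAidSketch();
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zis = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) continue;
                Path out = root.resolve(entry.getName()).normalize();
                // Reject entries such as "../x" that would land outside targetDir.
                if (!out.startsWith(root)) throw new IOException("unsafe entry: " + entry.getName());
                Files.createDirectories(out.getParent());
                Files.copy(zis, out, StandardCopyOption.REPLACE_EXISTING);
                aid.objects.add(entry.getName());
            }
        }
        return aid;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny in-memory ZIP so the sketch runs without any download.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buf)) {
            zos.putNextEntry(new ZipEntry("compound1/spectrum.txt"));
            zos.write("placeholder".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        Path targetDir = Files.createTempDirectory("ifs-sketch");
        FindingAidSketch aid = extract(new ByteArrayInputStream(buf.toByteArray()), targetDir);
        // Step 4: serialize the finding aid next to the extracted collection.
        Path json = targetDir.resolve("findingaid.json"); // hypothetical file name
        Files.write(json, aid.toJson().getBytes(StandardCharsets.UTF_8));
        System.out.println(aid.toJson());
    }
}
```

Building the finding aid while the entries are being copied, rather than in a second pass over the target directory, mirrors the order of the steps listed above.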

Before you run the test, take a look at the test's main() method and adjust the parameters there a bit if you want. They include:

  • first: the first test to run (0 to 12)
  • last: the last test to run (0 to 12)
  • targetDir: leave this as "../site/ifs"
  • sourceDir: a local source directory to use instead of Figshare, to save download time. If you set this, you need to save the Figshare nnnnnnnn.zip there.
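As a rough illustration of how these four parameters relate to one another, here is a small sketch. Only the names first, last, targetDir, and sourceDir come from the list above; the class RunParametersSketch and its methods are hypothetical, and the actual main() in ExtractorTest.java may be organized quite differently.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; not the layout of the real ExtractorTest.main().
class RunParametersSketch {
    final int first;         // index of the first test to run
    final int last;          // index of the last test to run (inclusive)
    final String targetDir;  // where the extracted collection is written
    final String sourceDir;  // local directory of pre-downloaded ZIP files, or null

    RunParametersSketch(int first, int last, String targetDir, String sourceDir) {
        if (first < 0 || last < first) {
            throw new IllegalArgumentException("expected 0 <= first <= last");
        }
        this.first = first;
        this.last = last;
        this.targetDir = targetDir;
        this.sourceDir = sourceDir;
    }

    /** Indices of the tests to run, from first to last inclusive. */
    List<Integer> testIndices() {
        List<Integer> indices = new ArrayList<>();
        for (int i = first; i <= last; i++) indices.add(i);
        return indices;
    }

    /**
     * Returns the local copy of a ZIP file if sourceDir is set and the file exists.
     * An empty result means the caller would have to download the file instead.
     */
    Optional<Path> localZip(String zipName) {
        if (sourceDir == null) return Optional.empty();
        Path candidate = Paths.get(sourceDir, zipName);
        return Files.isRegularFile(candidate) ? Optional.of(candidate) : Optional.empty();
    }
}
```

For example, new RunParametersSketch(2, 4, "../site/ifs", null).testIndices() yields the indices 2, 3, and 4, and localZip(...) is empty because no sourceDir was given.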

There are several other flags that can be set. The demo is not set up for batch command-line operation, and it is not built as a JAR file. It is simply an Eclipse Java project right now.

After you run the test, the /site/ifs directory will be populated, and the /html/demo.htm file should work. Since this HTML file opens files on your local machine, be sure your browser is set up to allow local file reading.
