Skip to content

Repository of proposed and combined MIxS + MInAS standardised metadata checklists using LinkML

Notifications You must be signed in to change notification settings

MIxS-MInAS/MInAS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MInAS

This repository holds the combined base MIxS schema, plus the various extensions generated in the scope of the MInAS project.

The source YAML file is generated using the yq tool, merging each of the individual YAML files into a single one.

This merging happens once per extension release, and the resulting file is then used to generate the JSON schema and TSV using LinkML tools and scripts.

The resulting output files are stored in the src/ directory in file format specific subdirectories.

Schema Generation Instructions

Required tools and dependencies

The following tools are required to generate the schema files:

  • yq (version 4.44.2)
    • Note: version not on conda or pip requires binary or OS distribution installation
  • linkml (Version 1.8.1)
    • Available on pip: pip install linkml==1.8.1

Merging the YAML files

To generate the combined YAML file, we can use a combination of yq and curl to download specific tagged releases from the MIxS and various MInAS repositories.

For updating during development:

MIXS_VERSION=6.2.0
# EXTANCIENT_VERSION=0.3.2 ## Only used ofr releases
# EXTRADIOCARBONDATING_VERSION=0.1.2 ## Only used ofr releases

## Core MIxS Schema
curl -o src/mixs/schema/mixs-v$MIXS_VERSION.yaml "https://raw.githubusercontent.com/GenomicsStandardsConsortium/mixs/v$MIXS_VERSION/src/mixs/schema/mixs.yaml" ## Base MIxS schema

## MInAS Extensions
curl -o src/mixs/schema/ancient-main.yaml "https://raw.githubusercontent.com/MIxS-MInAS/extension-ancient/refs/heads/main/src/mixs/schema/ancient.yml" ## Ancient DNA extension
curl -o src/mixs/schema/radiocarbon-dating-main.yaml "https://raw.githubusercontent.com/MIxS-MInAS/extension-radiocarbon-dating/refs/heads/main/src/mixs/schema/radiocarbon-dating.yml" ## Radiocarbon extension

## MInAS Combinations
curl -o src/mixs/schema/minas-combinations-main.yaml "https://raw.githubusercontent.com/MIxS-MInAS/minas-combinations/main/src/mixs/schema/minas-combinations.yaml" ## Combinations

## Merge together. Note you need a select(fileIndex == X) for each yaml file!
yq eval-all 'select(fileIndex == 0) *+ select(fileIndex == 1) *+ select(fileIndex == 2) *+ select(fileIndex == 3)' \
  src/mixs/schema/mixs-v$MIXS_VERSION.yaml \
  src/mixs/schema/ancient-main.yaml \
  src/mixs/schema/radiocarbon-dating-main.yaml \
  src/mixs/schema/minas-combinations-main.yaml \
  > src/mixs/schema/mixs-minas.yaml

## Fix some metadata
sed -i 's#source: https://github.com/MIxS-MInAS/extension-radiocarbon-dating/raw/main/proposals/0.1.0/extension-radiocarbon-dating-v0_1_0.csv#source: https://github.com/MIxS-MInAS/MInAS/#g' src/mixs/schema/mixs-minas.yaml

And then, for a release, (making sure updating the versions in the variables):

MIXS_VERSION=6.2.0
EXTANCIENT_VERSION=0.3.2
EXTRADIOCARBONDATING_VERSION=0.1.2

## Core MIxS Schema
curl -o src/mixs/schema/mixs-v$MIXS_VERSION.yaml "https://raw.githubusercontent.com/GenomicsStandardsConsortium/mixs/v$MIXS_VERSION/src/mixs/schema/mixs.yaml" ## Base MIxS schema

## MInAS Extensions
curl -o src/mixs/schema/ancient-v$EXTANCIENT_VERSION.yaml "https://raw.githubusercontent.com/MIxS-MInAS/extension-ancient/v$EXTANCIENT_VERSION/src/mixs/schema/ancient.yml" ## Ancient DNA extension
curl -o src/mixs/schema/radiocarbon-dating-v$EXTRADIOCARBONDATING_VERSION.yaml "https://raw.githubusercontent.com/MIxS-MInAS/extension-radiocarbon-dating/v$EXTRADIOCARBONDATING_VERSION/src/mixs/schema/radiocarbon-dating.yml" ## Radiocarbon extension

## MInAS Combinations
curl -o src/mixs/schema/minas-combinations-vmain.yaml "https://raw.githubusercontent.com/MIxS-MInAS/minas-combinations/main/src/mixs/schema/minas-combinations.yaml" ## Combinations

## Merge together. Note you need a select(fileIndex == X) for each yaml file!
yq eval-all 'select(fileIndex == 0) *+ select(fileIndex == 1) *+ select(fileIndex == 2) *+ select(fileIndex == 3)' \
  src/mixs/schema/mixs-v$MIXS_VERSION.yaml \
  src/mixs/schema/ancient-v$EXTANCIENT_VERSION.yaml \
  src/mixs/schema/radiocarbon-dating-v$EXTRADIOCARBONDATING_VERSION.yaml \
  src/mixs/schema/minas-combinations-vmain.yaml \
  > src/mixs/schema/mixs-minas.yaml

## Fix some metadata
sed -i 's#source: https://github.com/MIxS-MInAS/extension-radiocarbon-dating/raw/main/proposals/0.1.0/extension-radiocarbon-dating-v0_1_0.csv#source: https://github.com/MIxS-MInAS/MInAS/#g' src/mixs/schema/mixs-minas.yaml

In both cases, we lint and validate the newly extended MIxS schema

linkml lint --validate src/mixs/schema/mixs-minas.yaml

We can then use the LinkML package's gen-json-schema to generate the JSON schema:

gen-json-schema src/mixs/schema/mixs-minas.yaml > src/mixs/schema/mixs-minas.json

And the python3 script in the scripts/ directory to generate the TSV files:

python3 ./scripts/linkml2class_tsvs.py --schema-file src/mixs/schema/mixs-minas.yaml --output-dir project/class-model-tsvs/

Note

This script has been copied and modified very slightly to include the python3 shebang, and is placed under scripts until properly packaged for the MIxS project.

To use this script, you only need python3 and no other dependencies (it seems).

About

Repository of proposed and combined MIxS + MInAS standardised metadata checklists using LinkML

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages