[Requested feature] Usage of different ontology versions during validation #50

Open
M-casado opened this issue Oct 26, 2022 · 7 comments

Comments

@M-casado
Contributor

Summary

A feature to be able to use ontology versions on demand for term validation.

Motivation and details

For the sake of traceability, the version of each ontology used during validation must be stored. However, knowing which version was used is only partly useful if that same version cannot be used when validating the metadata again. Therefore, a feature to select specific ontology versions is required.

Inspired by Phenopacket's approach (see resources at the MetaData object of their schemas), EGA's new schemas specify the version of each ontology used in a submission in a similar fashion (see lines of code): a single object (submission) holds an array of the ontologies used, each with its respective version. This is restrictive in the sense that only one version of each ontology can be used per submission, but that is the expected use case. Saving the ontology version at each individual ontology term seems overwhelming and unnecessary.

Following this approach, the requested feature would include a parser that automatically detects (from a file, a reference, a bespoke structure, or part of the JSONs...) which version of each ontology was used for a submission and, if none is found, uses the latest version available (current behaviour). This imposes a heavy constraint: objects may depend on other objects being validated at the same time. We can discuss how this could best be done, or whether it would be better to record the version at each ontology use, etc.

Use cases

Example 1

I submitted metadata to EGA 3 months ago, and it was valid at that time. Now this metadata is going to be shared across different institutions, with a validation step in the middle. The ontology I used has changed, and my metadata is no longer valid against the standards. Being able to specify which version of the ontology I used would allow my metadata to pass validation as it did at the time of submission.

@theisuru
Collaborator

theisuru commented Dec 2, 2022

@M-casado I have no knowledge of versioning of ontologies. Does OLS support versioning of ontologies?

@M-casado
Contributor Author

M-casado commented Dec 5, 2022

I have not tried it myself, but I would expect OLS to have a way to specify the version of the ontologies. With a quick glance at their API documentation I saw version and versionIri, which at the very least indicate the used version.

@M-casado
Contributor Author

M-casado commented Feb 28, 2024

@theisuru - Just checking on this issue. We at EGA are developing our infrastructure to archive metadata, and depending on how Biovalidator deals with ontology versions, we may choose one path or another.

For example, if we agree to have the ontology version at each ontology term being validated, it would look something like:

{
    "ontology": {
        "id": "UBERON:0000955",
        "version": "1.2.0"
    }
}

This would be a bit tedious to fill out by users, but we could ideally have an option to fill out that "version" term automatically with the latest version found in OLS if not given. Besides, this option is the most informative and unambiguous, and would make it easy for Biovalidator to validate each term with the specific ontology version, since they would be under the same term.
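
A minimal sketch of that auto-fill step, in Python, may make the idea concrete. The function name `fill_missing_versions` and the `latest_version` callback are hypothetical and not part of Biovalidator; in practice the callback would ask OLS for the latest version, which is not shown here, and the version string in the usage lines is made up.

```python
from typing import Callable, Dict


def fill_missing_versions(
    record: Dict[str, dict],
    latest_version: Callable[[str], str],
) -> Dict[str, dict]:
    """Return a copy of `record` in which every term carries a "version".

    `record` maps field names to term objects shaped like
    {"id": "UBERON:0000955", "version": "1.2.0"}.
    `latest_version` is a hypothetical callback that takes a CURIE
    prefix (e.g. "UBERON") and returns the latest known version.
    """
    filled = {}
    for field, term in record.items():
        term = dict(term)  # shallow copy so the input is left untouched
        if not term.get("version"):
            prefix = term["id"].split(":", 1)[0]
            term["version"] = latest_version(prefix)
        filled[field] = term
    return filled


# Usage with a stubbed lookup (the version string is invented).
stub_latest = {"UBERON": "2024-01-01"}.__getitem__
record = {"ontology": {"id": "UBERON:0000955"}}
filled = fill_missing_versions(record, stub_latest)
```

A version supplied by the user is left as it is; only missing ones are filled, so the stored record always states which version the term was validated against.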

If, on the other hand, we decide to save the versions for a whole "submission" in a different object, we may do something like the following:

        "resources": [
            {
                "automaticallyAssigned": false,
                "name": "Human Phenotype Ontology",
                "namespacePrefix": "HP",
                "version": "2022-06-11"
            },
            {
                "automaticallyAssigned": false,
                "name": "Experimental Factor Ontology",
                "namespacePrefix": "EFO",
                "version": "3.45.0"
            }
        ],

This is what I initially had for our JSON Schemas (see lines here), but I believe it would be more difficult for Biovalidator to pick the right versions for validation from a different object (?).
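
For comparison, picking the version from a submission-level "resources" array could be sketched as below, assuming every term is a CURIE whose prefix matches a "namespacePrefix" entry. The helper names are hypothetical, and returning None is an assumed way of signalling "fall back to the latest version".

```python
from typing import Dict, List, Optional


def versions_by_prefix(resources: List[dict]) -> Dict[str, str]:
    """Index the submission-level "resources" array by namespace prefix."""
    return {r["namespacePrefix"]: r["version"] for r in resources}


def version_for_term(term_id: str, resources: List[dict]) -> Optional[str]:
    """Return the pinned version for a CURIE such as "HP:0000118".

    None means no version was declared for that ontology, in which
    case a validator would fall back to the latest version
    (the current behaviour).
    """
    prefix = term_id.split(":", 1)[0]
    return versions_by_prefix(resources).get(prefix)


# The same two entries as in the example above.
resources = [
    {"automaticallyAssigned": False, "name": "Human Phenotype Ontology",
     "namespacePrefix": "HP", "version": "2022-06-11"},
    {"automaticallyAssigned": False, "name": "Experimental Factor Ontology",
     "namespacePrefix": "EFO", "version": "3.45.0"},
]
hp_version = version_for_term("HP:0000118", resources)
```

The lookup itself is small; the harder part is that the object holding the terms and the object holding "resources" have to be available to the validator at the same time.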

@M-casado
Contributor Author

M-casado commented Dec 4, 2024

@theisuru - Has there been any progress on this? I think keeping track of the version of each resource integrated through APIs should be a requirement in itself.

For example:

  • If I'm using identifiers.org, it's a requirement for me to keep track of the version of their API/registry.
  • If I'm using OLS, it's a requirement for me to keep track of the version of the API/ontology I used.
  • etc.

Without this, I think it's almost impossible to keep track of the validation standards, which erodes any possibility of backwards compatibility of the model. Imagine: as soon as an ontology changes in OLS, I won't be able to re-validate whatever I had validated before the change.

@theisuru
Collaborator

@M-casado we will be able to address this issue soon. I will keep you updated here.

@M-casado
Contributor Author

That sounds great, thank you, Isuru. 👏

Whatever the solution may be, summarising my requirements could be simply "to have backwards compatibility with embedded resources where possible". Mainly ontologies, but not only, since APIs may be versioned as well (e.g., identifiers.org).

@M-casado
Contributor Author

I envision that the way to solve it would be to keep track of the ontology versions and/or API versions used, either in a JSON config file or in some other selected JSON metadata file (e.g., an "overarching" submission JSON file listing the versions of the resources used).

Ideally this information would be output automatically by Biovalidator upon validation if missing (e.g., "your data is valid, and these are the versions we used...") but also be taken as an input if available (e.g., "okay so to validate your data you need to use these specific versions? Let's use them...")
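
As a rough illustration of that two-way behaviour, the sketch below takes optional pinned versions as input and returns the versions that were actually used. Everything here is hypothetical: `resolve_versions` and `latest_version` are invented names, and reusing "automaticallyAssigned" to mark versions chosen by the validator is an assumption about what that flag could mean.

```python
from typing import Callable, Dict, Iterable, Optional


def resolve_versions(
    prefixes: Iterable[str],
    pinned: Optional[Dict[str, str]],
    latest_version: Callable[[str], str],
) -> Dict[str, dict]:
    """Decide which version to use per ontology and record its origin.

    `prefixes` are the ontology prefixes found in the data,
    `pinned` is the optional user-supplied prefix -> version map, and
    `latest_version` is a hypothetical lookup for the newest version.
    """
    pinned = pinned or {}
    used = {}
    for prefix in sorted(set(prefixes)):
        if prefix in pinned:
            # The user asked for a specific version: honour it.
            used[prefix] = {"version": pinned[prefix],
                            "automaticallyAssigned": False}
        else:
            # Nothing pinned: use the latest and report it back.
            used[prefix] = {"version": latest_version(prefix),
                            "automaticallyAssigned": True}
    return used


# Usage with a stub in place of a real lookup ("stub-latest" is invented).
report = resolve_versions(["HP", "EFO", "HP"], {"HP": "2022-06-11"},
                          lambda prefix: "stub-latest")
```

Feeding the returned report back in as `pinned` on a later run would reproduce the same versions, which is the backwards-compatibility property asked for in this thread.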
