TermSet Integration #862
Closed
43 commits
1c3344b first (mavaylon1)
e291adc tests (mavaylon1)
e2f36ff linkml (mavaylon1)
259c16e path (mavaylon1)
adfbf01 validation tests (mavaylon1)
b707c74 validation for DT (mavaylon1)
2dbfdf9 tests df validate (mavaylon1)
5665f7e er tests for termset (mavaylon1)
da986e8 yaml duplicate for testing (mavaylon1)
555d5fc typo (mavaylon1)
527dd8d req (mavaylon1)
3df45a0 linkml (mavaylon1)
0cbc32f paths (mavaylon1)
8cbab52 tutorials (mavaylon1)
ae50c9c update (mavaylon1)
590cbda Merge branch 'dev' into termset (rly)
f0fc10e path (mavaylon1)
154234c path (mavaylon1)
b8eca63 path (mavaylon1)
e841d6e Update plot_term_set.py (mavaylon1)
353d0ed Update plot_external_resources.py (mavaylon1)
426cf41 path gallery (mavaylon1)
d1668be Update requirements-min.txt (mavaylon1)
fc54686 Update pyproject.toml (mavaylon1)
15bcfdb Update requirements-min.txt (mavaylon1)
09ebc8c Update pyproject.toml (mavaylon1)
f5ec77a Update requirements-min.txt (mavaylon1)
55dda2a Update requirements-dev.txt (mavaylon1)
e6d2358 Update requirements-min.txt (mavaylon1)
0dff59c Update requirements.txt (mavaylon1)
26b8d8c [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
51e6a37 Update pyproject.toml (mavaylon1)
c3d500b Update requirements-dev.txt (mavaylon1)
8813e0b Update tox.ini (mavaylon1)
ec42383 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
d1e1919 Update tox.ini (mavaylon1)
d5de41b Update environment-ros3.yml (mavaylon1)
b5d6f8a Update environment-ros3.yml (mavaylon1)
6d6eb3a Update tests/unit/test_term_set.py (mavaylon1)
ed273e3 Update tests/unit/test_container.py (mavaylon1)
22ffdc0 unit test skips (mavaylon1)
a088aad undo changes to install (mavaylon1)
ef43b23 updates (mavaylon1)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
id: pynert/termset/species_example | ||
name: Species | ||
prefixes: | ||
  NCBI_TAXON: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id= | ||
  Ensemble: https://rest.ensembl.org/taxonomy/id/ | ||
imports: | ||
  - linkml:types | ||
default_range: string | ||
|
||
enums: | ||
  Species: | ||
    permissible_values: | ||
      Homo sapiens: | ||
        description: description | ||
        meaning: NCBI_TAXON:9606 | ||
      Mus musculus: | ||
        description: description | ||
        meaning: Ensemble:10090 | ||
      Ursus arctos horribilis: | ||
        description: description | ||
        meaning: NCBI_TAXON:116960 | ||
      Myrmecophaga tridactyla: | ||
        description: description | ||
        meaning: NCBI_TAXON:71006 |
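Each `meaning` value in this schema is a prefix followed by a source id, and the prefix maps to a URI. As a minimal sketch of how the two could be coupled into a link to the term's source page (the `expand_curie` helper and the `PREFIXES` dictionary are illustrative assumptions, not part of HDMF or LinkML):

```python
# Hypothetical helper for illustration only; not part of HDMF or LinkML.
# The prefix map is copied by hand from the schema above.
PREFIXES = {
    "NCBI_TAXON": "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=",
    "Ensemble": "https://rest.ensembl.org/taxonomy/id/",
}


def expand_curie(curie, prefixes):
    """Turn 'PREFIX:id' into a full URL by prepending the prefix's URI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"Unknown or malformed identifier: {curie!r}")
    return prefixes[prefix] + local_id


human_url = expand_curie("NCBI_TAXON:9606", PREFIXES)
mouse_url = expand_curie("Ensemble:10090", PREFIXES)
```

This is why the prefix URI has to point at the terms themselves: the id is simply appended to it.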
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,148 @@ | ||
""" | ||
TermSet | ||
======= | ||
|
||
This is a user guide for interacting with the | ||
:py:class:`~hdmf.TermSet` class. The TermSet type | ||
is experimental and is subject to change in future releases. If you use this type, | ||
please provide feedback to the HDMF team so that we can improve the structure and | ||
overall capabilities. | ||
|
||
Introduction | ||
------------- | ||
The :py:class:`~hdmf.TermSet` class provides a way for users to create their own | ||
set of terms from brain atlases, species taxonomies, and anatomical, cell, and | ||
gene function ontologies. | ||
|
||
:py:class:`~hdmf.TermSet` serves two purposes: data validation and external reference | ||
management. Users will be able to validate their data against their own set of terms, ensuring | ||
clean data that can be used in line with the FAIR principles later on. | ||
The :py:class:`~hdmf.TermSet` class allows for a reusable and sharable | ||
pool of metadata to serve as references to any dataset within the NWB ecosystem. | ||
The :py:class:`~hdmf.TermSet` class is used closely with | ||
:py:class:`~hdmf.common.resources.ExternalResources` to more efficiently map terms | ||
to data. Please refer to the tutorial on ExternalResources to see how :py:class:`~hdmf.TermSet` | ||
is used with :py:class:`~hdmf.common.resources.ExternalResources`. | ||
|
||
:py:class:`~hdmf.TermSet` is built upon the resources from LinkML, a modeling | ||
language to create YAML schemas, giving :py:class:`~hdmf.TermSet` | ||
a standardized structure and a variety of tools to help the user manage their references. | ||
|
||
How to make a TermSet Schema | ||
---------------------------- | ||
Before the user can take advantage of all the wonders within the | ||
:py:class:`~hdmf.TermSet` class, the user needs to create a LinkML schema (YAML) that provides | ||
all the permissible term values. Please refer to https://linkml.io/linkml/intro/tutorial06.html | ||
to learn more about how LinkML structures its schemas. | ||
|
||
1. The name of the schema is up to the user, e.g., the name could be "Species" if the term set will | ||
   contain species terms. | ||
2. The prefixes will be the standardized prefix of your source, followed by the URI to the terms. | ||
   For example, the NCBI Taxonomy is abbreviated as NCBI_TAXON, and Ensemble is simply Ensemble. | ||
   The URI must point to the terms themselves so that it can later be combined with the source id | ||
   of a term to form a valid link to the term's source page. In the case of | ||
   Ensemble, it would be "https://rest.ensembl.org/taxonomy/id/". | ||
3. The schema uses LinkML enumerations to list all the possible terms. Currently, users will need to | ||
   manually outline the terms within the enumeration's permissible values. | ||
|
||
For a clear example, please refer to example_term_set.yaml within the tutorial gallery. | ||
""" | ||
###################################################### | ||
# Creating an instance of the TermSet class | ||
# ---------------------------------------------------- | ||
from hdmf.common import ExternalResources, DynamicTable, VectorData | ||
try: | ||
    import linkml_runtime # noqa: F401 | ||
    LINKML_INSTALLED = True | ||
except ImportError: | ||
    LINKML_INSTALLED = False | ||
|
||
if LINKML_INSTALLED: | ||
    from hdmf.term_set import TermSet | ||
|
||
###################################################### | ||
# Viewing TermSet values | ||
# ---------------------------------------------------- | ||
# :py:class:`~hdmf.TermSet` has methods to retrieve terms. The :py:func:`~hdmf.TermSet.view_set` | ||
# method will return a dictionary of all the terms and the corresponding information for each term. | ||
# Users can index specific terms from the :py:class:`~hdmf.TermSet`. | ||
if LINKML_INSTALLED: | ||
    terms = TermSet(name='Species', term_schema_path='docs/gallery/example_term_set.yaml') | ||
    terms.view_set | ||
|
||
    # Retrieve a specific term | ||
    terms['Homo sapiens'] | ||
|
||
###################################################### | ||
# Validate Data with TermSet | ||
# ---------------------------------------------------- | ||
# :py:class:`~hdmf.TermSet` has been integrated so that :py:class:`~hdmf.Data` and its | ||
# subclasses support a term_set attribute. When this attribute is set, the initial data is validated | ||
# on creation and all new data is validated as it is added. | ||
if LINKML_INSTALLED: | ||
    data = VectorData( | ||
        name='species', | ||
        description='...', | ||
        data=['Homo sapiens'], | ||
        term_set=terms) | ||
|
||
###################################################### | ||
# Validate on append with TermSet | ||
# ---------------------------------------------------- | ||
# As mentioned prior, when the term_set attribute is set, all new data is validated. This is true for both | ||
# the append and extend methods. | ||
if LINKML_INSTALLED: | ||
    data.append('Ursus arctos horribilis') | ||
    data.extend(['Mus musculus', 'Myrmecophaga tridactyla']) | ||
|
||
###################################################### | ||
# Validate Data in a DynamicTable with TermSet | ||
# ---------------------------------------------------- | ||
# Validating data with :py:class:`~hdmf.common.table.DynamicTable` is determined by which columns were | ||
# initialized with the term_set attribute set. The data is validated when the columns are created and not | ||
# when set as columns to the table. | ||
if LINKML_INSTALLED: | ||
    col1 = VectorData( | ||
        name='Species_1', | ||
        description='...', | ||
        data=['Homo sapiens'], | ||
        term_set=terms, | ||
    ) | ||
    col2 = VectorData( | ||
        name='Species_2', | ||
        description='...', | ||
        data=['Mus musculus'], | ||
        term_set=terms, | ||
    ) | ||
    species = DynamicTable(name='species', description='My species', columns=[col1, col2]) | ||
|
||
###################################################### | ||
# Validate new rows in a DynamicTable with TermSet | ||
# ---------------------------------------------------- | ||
# Validating new rows added to :py:class:`~hdmf.common.table.DynamicTable` is simple. The | ||
# :py:func:`~hdmf.common.table.DynamicTable.add_row` method will automatically check each column for a | ||
# :py:class:`~hdmf.TermSet` (via the term_set attribute). If the attribute is set, the data will be | ||
# validated for that column using that column's :py:class:`~hdmf.TermSet`. If there is invalid data, the | ||
# row will not be added and the user will be prompted to fix the new data in order to populate the table. | ||
if LINKML_INSTALLED: | ||
    species.add_row(Species_1='Mus musculus', Species_2='Mus musculus') | ||
|
||
###################################################### | ||
# Validate new columns in a DynamicTable with TermSet | ||
# ---------------------------------------------------- | ||
# As mentioned prior, validating in a :py:class:`~hdmf.common.table.DynamicTable` is determined | ||
# by the columns. The :py:func:`~hdmf.common.table.DynamicTable.add_column` method accepts a term_set | ||
# argument, just as when creating a new instance of :py:class:`~hdmf.common.table.VectorData`. When set, | ||
# this argument will be used to validate the data. The column will not be added if there is invalid data. | ||
if LINKML_INSTALLED: | ||
    col1 = VectorData( | ||
        name='Species_1', | ||
        description='...', | ||
        data=['Homo sapiens'], | ||
        term_set=terms, | ||
    ) | ||
    species = DynamicTable(name='species', description='My species', columns=[col1]) | ||
    species.add_column(name='Species_2', | ||
                       description='Species data', | ||
                       data=['Mus musculus'], | ||
                       term_set=terms) |
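The tutorial above describes data being validated on creation and again on `append` and `extend`. As a rough, library-independent sketch of that behaviour (the `ValidatedList` class is a hypothetical stand-in and is not how HDMF implements `term_set` internally; the choice of `ValueError` is also an assumption):

```python
# Simplified stand-in for illustration only; this is NOT HDMF's implementation.
class ValidatedList:
    """A list that only accepts values from a fixed set of permissible terms."""

    def __init__(self, data, permissible):
        self.permissible = set(permissible)
        self.data = []
        self.extend(data)  # initial data goes through the same check

    def _check(self, values):
        bad = [v for v in values if v not in self.permissible]
        if bad:
            raise ValueError(f"Terms not in term set: {bad}")

    def append(self, value):
        self._check([value])
        self.data.append(value)

    def extend(self, values):
        values = list(values)
        self._check(values)  # validate the whole batch first so a bad batch adds nothing
        self.data.extend(values)


species = ValidatedList(
    ["Homo sapiens"],
    permissible=["Homo sapiens", "Mus musculus", "Ursus arctos horribilis"],
)
species.append("Ursus arctos horribilis")

try:
    # "Canis lupus" is not in the permissible set, so the whole batch is rejected.
    species.extend(["Mus musculus", "Canis lupus"])
    rejected = False
except ValueError:
    rejected = True
```

Checking the full batch before mutating mirrors the all-or-nothing behaviour the tutorial describes for rows and columns: invalid input leaves the existing data untouched.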
The point of this file is to be used when testing whether the minimum requirements set in pyproject.toml are valid. Using >= instead of == defeats the point.

If 2.6.0 is too small, then the minimum version should be increased both here and in pyproject.toml
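One way to guard against this in review is a small check that every entry in the minimum-requirements file is pinned with `==`. The following is only a sketch: the `unpinned_requirements` helper is hypothetical and not part of the HDMF repository, and the package names and versions in the example are placeholders.

```python
# Hypothetical check, not part of the HDMF repository.
def unpinned_requirements(lines):
    """Return the requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Placeholder entries for illustration; not the real file contents.
example_lines = [
    "# minimum versions of package dependencies",
    "package_a==1.0.0",
    "package_b>=2.6.0",
]
result = unpinned_requirements(example_lines)
```

Here `result` contains only the `>=` entry, which is the kind of line the review comment objects to.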