remove axis restrictions #235

Open
wants to merge 10 commits into
base: main

d-v-b
Contributor

@d-v-b d-v-b commented Apr 30, 2024

Axes can be N-dimensional, of any type, in any order.

Contributor

github-actions bot commented Apr 30, 2024

Automated Review URLs

@imagesc-bot

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/ome-ngff-update-postponing-transforms-previously-v0-5/95617/3

@jni

jni commented May 1, 2024

Thanks for opening the PR @d-v-b! 🙏

but btw it seems you missed a line further down, line 313 in the current file:

Each "datasets" dictionary MUST have the same number of dimensions and MUST NOT have more than 5 dimensions.

@d-v-b
Contributor Author

d-v-b commented May 2, 2024

Thanks @jni, I think I got all the references to the 2-5D limit. Please let me know if I missed any.

One consequence of this change is that 1D data can now be stored in OME-NGFF. Personally, I think this is great -- 1D data is real data, and people should be able to store it if they have it.
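To make the relaxed rule concrete, here is a minimal sketch of a check that accepts any number of axes, of any type, in any order. It is an illustration only, not code from this PR: the function name `check_axes` is hypothetical, the metadata layout is assumed to follow the existing multiscales structure, and the lower bound of one axis is an assumption.

```python
def check_axes(axes, datasets):
    """Return a list of problems; an empty list means the metadata passed.

    Hypothetical helper. Assumes `axes` is a list of dicts with a "name" key
    and each dataset carries a "coordinateTransformations" list.
    """
    problems = []
    names = [axis.get("name") for axis in axes]
    # Assumption: at least one axis is required; there is no upper bound.
    if len(axes) < 1:
        problems.append("at least one axis is required")
    if len(set(names)) != len(names):
        problems.append("axis names must be unique")
    for dataset in datasets:
        for transform in dataset.get("coordinateTransformations", []):
            kind = transform.get("type")
            # A "scale" transform stores its values under the "scale" key, etc.
            values = transform.get(kind, [])
            if len(values) != len(axes):
                problems.append(
                    f"{dataset.get('path')}: {kind} has {len(values)} "
                    f"values for {len(axes)} axes"
                )
    return problems


# 1D data: a single spatial axis with a matching 1-element scale.
one_d = check_axes(
    [{"name": "x", "type": "space"}],
    [{"path": "0", "coordinateTransformations": [{"type": "scale", "scale": [0.5]}]}],
)
print(one_d)
```

Under these assumptions a 1D or 6D image passes as long as axis names are unique and every transform has one value per axis.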

@bogovicj

bogovicj commented May 2, 2024

This PR should update the schema and examples before being merged.
Heads up - I did a lot of work on this front in this PR: #138
(my commits from Dec 2023), take + edit what you can. I can try to help merge things

Edit: after a little more thought, I'm hopeful the schema changes needed here will be small; but certainly some examples that are currently disallowed that we want to allow would be helpful.

@d-v-b d-v-b marked this pull request as draft May 4, 2024 11:54
@d-v-b
Contributor Author

d-v-b commented May 4, 2024

Switching this to a draft while I work on getting the schema documents consistent with the spec. Because manually editing JSON schema documents is tedious and error-prone, I am going to generate the schema with some Python scripts containing pydantic models. Personally I think these models should be part of this repo, because pydantic is a good tool for modelling JSON schema (much better than doing it manually), but if this is unconvincing I can remove the Python files from the final PR.
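As a rough illustration of the workflow described above (not the models from this PR), the sketch below defines two small pydantic models and serializes them to JSON schema. It assumes pydantic v2, where `BaseModel.model_json_schema()` is available; the field choices for `Axis` and `ScaleTransform` are simplified assumptions.

```python
import json
from typing import Literal, Optional

from pydantic import BaseModel


class Axis(BaseModel):
    # Simplified, assumed fields: only "name" is required here.
    name: str
    type: Optional[str] = None
    unit: Optional[str] = None


class ScaleTransform(BaseModel):
    # Literal pins the "type" field to the single string "scale".
    type: Literal["scale"]
    scale: list[float]


# pydantic v2 emits a JSON-schema-compatible dict for each model.
axis_schema = Axis.model_json_schema()
scale_schema = ScaleTransform.model_json_schema()

print(json.dumps(axis_schema, indent=2))
```

The generated dictionaries can be written to `.schema` files with `json.dump`, so the Python models become the single place to edit when the spec changes.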

@glyg
Contributor

glyg commented May 16, 2024

I am going to generate the schema with some python scripts containing pydantic models.

@d-v-b wouldn't a linkML version of your pydantic classes be of more generic use?

---
id: https://ngff.openmicroscopy.org/latest/schemas/image.schema
name: ngff-image
title: OpenMicroscopy New Generation File Formats Image Schema
description: |-
  TODO
version: 0.1
license: ??


prefixes:
  linkml: https://w3id.org/linkml/
  biolink: https://w3id.org/biolink/
  schema: http://schema.org/
  ome: https://www.openmicroscopy.org/Schemas/Documentation/Generated/OME-2016-06/ome.html#
  ORCID: https://orcid.org/
  wiki: https://en.wikipedia.org/wiki/



classes:
  Axis:
    attributes:
      name:
        required: true
      type:
      unit:

  ScaleTransform:
    attributes:
      type:
        # equals_string: "scale" (set this as a rule?)
      scale:
        range: float
        array:
          maximum_number_dimensions: 1
          dimensions:
            - minimum_cardinality: 1

  TranslationTransform:
    attributes:
      type:
        # equals_string: "translation" (set this as a rule?)
      translation:
        range: float
        array:
          maximum_number_dimensions: 1
          dimensions:
            - minimum_cardinality: 1

...

@d-v-b
Contributor Author

d-v-b commented May 16, 2024

@glyg perhaps it would, but the goal here is just to generate JSON schema documents, so I'm not sure what generic use we need to accommodate?

@glyg
Contributor

glyg commented May 21, 2024

@d-v-b, I surely don't have a broad enough view of the whole project, so I might very well be mistaken.

what generic use we need to accommodate?

I was thinking about consumers of the schema or of a zarr.json. linkML seems to me more usable and language agnostic than custom pydantic classes.

For example a third party library wanting to parse the zarr.json could import these schemas to embed them in its own tooling.

@d-v-b
Contributor Author

d-v-b commented May 21, 2024

@glyg in terms of scope, currently this repo contains JSON schema documents that can be fetched from GitHub and used for validation. I don't think there's any expectation that software libraries import code artifacts from this repo. That could of course change, but I don't know of efforts in that direction.

I am proposing changes to the spec, and so I need to update the schema documents. Because the current JSON schema documents are manually written, they contain mistakes and are a pain to update after making spec changes.

Since this project is already using python as a dependency, as a quality of life change I am proposing to use pydantic to define data models that serialize to JSON schema, as a way to avoid needing to write the schema documents by hand. I could be wrong, but I suspect writing data models in python and serializing those models to JSON schema will be an easier development experience than writing data models in JSON schema directly. Maybe linkml could also work for this purpose, but I don't know how to use linkml, and I do know how to use pydantic, so for me the choice is simple.

@glyg
Contributor

glyg commented May 21, 2024

Yes I understand your point, I think a linkml version would bring some value but as you said you are the one doing the work 🙂

Thank you for taking the time to answer me

@d-v-b d-v-b marked this pull request as ready for review May 21, 2024 11:50
@d-v-b
Contributor Author

d-v-b commented May 21, 2024

tests are passing, so i think this is ready for review.

name: Optional[str] = None
datasets: conlist(Dataset, min_length=1)
axes: UniqueList[Axis]
coordinateTransformations: Optional[tuple[ScaleTransform] | tuple[ScaleTransform, TranslationTransform]] = None

Is a pydantic tuple equivalent to list in JSON?

Contributor Author

JSON arrays are equivalent to Python lists, but the spec defines coordinateTransformations as a typed collection with a fixed length, so on the Python side it's a union of tuples.
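The fixed-length constraint can also be seen without pydantic. The sketch below is a plain-Python stand-in (the names `ALLOWED_SHAPES` and `has_allowed_shape` are hypothetical): a decoded JSON array is accepted only if its items are exactly a scale transform, or a scale transform followed by a translation transform.

```python
# The two shapes permitted by the union of tuples in the model above.
ALLOWED_SHAPES = {("scale",), ("scale", "translation")}


def has_allowed_shape(transforms):
    """Return True if a decoded JSON array matches one of the fixed shapes."""
    if not isinstance(transforms, list):
        return False
    # Reduce each item to its "type" string; non-dict items become None.
    kinds = tuple(
        item.get("type") if isinstance(item, dict) else None
        for item in transforms
    )
    return kinds in ALLOWED_SHAPES


print(has_allowed_shape([{"type": "scale", "scale": [1.0, 1.0]}]))
```

A plain `list[ScaleTransform | TranslationTransform]` annotation would also accept an empty array or a translation-first array, which is why the tuple union is the closer match to a fixed-length, ordered collection.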

@@ -557,80 +465,6 @@
},
"valid": false
},
{
"formerly": "invalid/invalid_axes_count.json",

Instead of deleting these, it might be worth just flipping valid=true?

Contributor Author

that's definitely easier! I will think about whether these tests are meaningful as positive examples.

@d-v-b
Contributor Author

d-v-b commented May 21, 2024

I added a section to provide some guidance for partial implementations, i.e. software that does not implement the full spec; namely, the spec now suggests that partial implementations which normalize input data to their supported subset of the spec notify users when this is occurring.
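A minimal sketch of what that guidance could look like in a partial implementation follows. Everything here is hypothetical: `SUPPORTED_AXES` and `normalize_axes` are illustrative names, and the reader is assumed to handle only a fixed subset of axis names in a fixed order.

```python
import warnings

# Hypothetical: the only axes this imaginary reader can handle, in its order.
SUPPORTED_AXES = ("t", "c", "z", "y", "x")


def normalize_axes(axis_names):
    """Reorder supported axes, ignore the rest, and tell the user about it."""
    kept = [name for name in SUPPORTED_AXES if name in axis_names]
    ignored = [name for name in axis_names if name not in SUPPORTED_AXES]
    # Warn whenever the result differs from what was stored on disk.
    if ignored or kept != list(axis_names):
        warnings.warn(
            f"axes {list(axis_names)} were normalized to {kept}; "
            f"ignored axes: {ignored}",
            UserWarning,
            stacklevel=2,
        )
    return kept


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = normalize_axes(["x", "y", "wavelength"])

print(result, len(caught))
```

Input that already matches the supported subset passes through silently, so the warning only appears when the data has actually been altered.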

@jni

jni commented May 22, 2024

I added a section to provide some guidance for partial implementations,

imho this recommendation is orthogonal to the main purpose of this PR, and it should come in a separate PR. I like it, but it's an extra thing and it's hard enough to merge PRs that are small and self-contained.

@joshmoore
Member

Independently of whether one PR or two, I can certainly see the implementor community wanting clarification in/around RFC-3 about the responsibility placed on them with this restriction dropped.

@d-v-b
Contributor Author

d-v-b commented May 22, 2024

imho this recommendation is orthogonal to the main purpose of this PR, and it should come in a separate PR. I like it, but it's an extra thing and it's hard enough to merge PRs that are small and self-contained.

Because this PR is widening the space of ome-ngff data, it seems reasonable to give at least a suggestion for how implementations should handle this change. We cannot expect that all implementations support N-dimensional data. I think the best we can do is suggest that implementations keep users aware of how their data is being cast / coerced / transformed, when that kind of thing is happening. Thus it's very non-orthogonal to this PR.

@jni

jni commented May 24, 2024

It's orthogonal in the sense that partial implementations were a thing before this PR.
