Flattened graph and added other changes of consortium meeting Nov. 2024 #95

Open
wants to merge 1 commit into
base: master

SteffenBrinckmann
Collaborator

No description provided.

@SteffenBrinckmann SteffenBrinckmann self-assigned this Nov 18, 2024
@FlorianRhiem
Contributor

Hey @SteffenBrinckmann,
I can import the .eln file with SampleDB, and get one object:
image

Looking into the data, there are Dataset entries in the graph that include variableMeasured entries, such as ./PastasExampleProject/001_ThisIsAnotherExampleTask and its subtask, which are not listed as parts of the root data entity. As a result, my importing code does not consider them to be importable objects, but unsupported supplementary information for ./PastasExampleProject/. I strongly suspect this is not the intention behind these objects, as they seem structured as directories and have the folder custom genre.

How do others interpret/parse/handle Dataset entries that are not a direct part of ./?
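To make the interpretation described above concrete, here is a minimal Python sketch. It is not SampleDB's actual import code. It assumes an RO-Crate-style `@graph` list in which the root data entity has the `@id` `./`, parts are listed under `hasPart`, and `@type` is a plain string; the function name `split_datasets` and the shortened example graph are hypothetical.

```python
def split_datasets(graph):
    """Split Dataset nodes into those listed in the root's hasPart and the rest."""
    nodes = {node["@id"]: node for node in graph}
    root = nodes["./"]
    root_part_ids = {part["@id"] for part in root.get("hasPart", [])}

    top_level, nested = [], []
    for node in graph:
        # Skip the root itself and anything that is not a Dataset (assumes a string @type).
        if node["@id"] == "./" or node.get("@type") != "Dataset":
            continue
        if node["@id"] in root_part_ids:
            top_level.append(node["@id"])
        else:
            nested.append(node["@id"])
    return top_level, nested


# Hypothetical, heavily shortened graph modelled on the IDs mentioned in this thread.
example_graph = [
    {"@id": "./", "@type": "Dataset", "hasPart": [{"@id": "./PastasExampleProject/"}]},
    {
        "@id": "./PastasExampleProject/",
        "@type": "Dataset",
        "hasPart": [{"@id": "./PastasExampleProject/001_ThisIsAnotherExampleTask"}],
    },
    {"@id": "./PastasExampleProject/001_ThisIsAnotherExampleTask", "@type": "Dataset"},
]

top_level, nested = split_datasets(example_graph)
print("listed in ./ hasPart:", top_level)
print("not listed in ./ hasPart:", nested)
```

Under the reading described in the comment, only the first list would be treated as importable objects; what to do with the second list is the open question.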

@SteffenBrinckmann
Collaborator Author

As far as I understand, every entry lists only its direct children in hasPart. As such, the root data entity also lists only its direct children in hasPart.
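A rough sketch of how a reader could traverse such a graph, where each node lists only its direct children: the walk below follows `hasPart` from the root and reports the depth of every node it reaches. The function name `walk` and the `./project/...` IDs are made up for illustration and do not come from any actual export.

```python
def walk(graph, start_id="./"):
    """Yield (depth, @id) for every node reachable from start_id via hasPart."""
    nodes = {node["@id"]: node for node in graph}
    stack = [(0, start_id)]
    seen = set()
    while stack:
        depth, node_id = stack.pop()
        # Ignore repeated references and IDs that have no node in the graph.
        if node_id in seen or node_id not in nodes:
            continue
        seen.add(node_id)
        yield depth, node_id
        children = nodes[node_id].get("hasPart", [])
        # reversed() keeps the children in document order when popping from the stack.
        for child in reversed(children):
            stack.append((depth + 1, child["@id"]))


# Hypothetical deep graph: each node references only its direct children.
deep_graph = [
    {"@id": "./", "@type": "Dataset", "hasPart": [{"@id": "./project/"}]},
    {"@id": "./project/", "@type": "Dataset", "hasPart": [{"@id": "./project/task/"}]},
    {
        "@id": "./project/task/",
        "@type": "Dataset",
        "hasPart": [{"@id": "./project/task/data.csv"}],
    },
    {"@id": "./project/task/data.csv", "@type": "File"},
]

visited = list(walk(deep_graph))
for depth, node_id in visited:
    print("  " * depth + node_id)
```

An importer that only looks at depth 1 would see `./project/` and nothing below it.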

@FlorianRhiem
Contributor

Currently our spec states:

Subsequently, all the remaining nodes are assigned a @type of either Dataset for directories or File for individual files. And the @id corresponds to something in the hasPart of ./.

If a Dataset node has additional files, they should be listed in its hasPart property and can be referenced through their @id.

This is also how I've handled it so far, with Dataset nodes that are not part of ./, but of another Dataset node, providing supplementary information (e.g. version info in case of SampleDB) for that Dataset node.

@SteffenBrinckmann
Collaborator Author

That spec statement necessarily results in a flat graph, with "./" as the top layer and all other nodes being siblings on the second layer.

I was not aware of this limitation and would vote to allow deep graphs. The RO-Crate spec has an example with 3 layers (https://www.researchobject.org/ro-crate/specification/1.1/data-entities.html#referencing-files-and-folders-from-the-root-data-entity), but it does not go into detail about supplemental information.

Alternative path: if we decide to keep the current spec, then I can flatten the graph that Pasta produces and add an additional key-value pair that contains the full hierarchy, for those ELNs that can handle the full graph.
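One possible shape of that alternative, as a hedged Python sketch and not what Pasta actually does: every Dataset is promoted to a direct part of `./`, and the nesting that was removed is kept under an extra key. The key name `childDatasets`, the function name `flatten`, and the example IDs are all hypothetical. The sketch also assumes that `@type` is a plain string and that every Dataset in the graph is meant to be an importable object.

```python
import copy

HIERARCHY_KEY = "childDatasets"  # hypothetical key name, not defined in any spec


def flatten(graph):
    """Return a copy of graph in which every Dataset is a direct part of './'."""
    flat = copy.deepcopy(graph)
    nodes = {node["@id"]: node for node in flat}
    root_parts = list(nodes["./"].get("hasPart", []))
    listed = {part["@id"] for part in root_parts}

    for node in flat:
        if node["@id"] == "./" or node.get("@type") != "Dataset":
            continue
        parts = node.get("hasPart", [])
        child_datasets = [
            p for p in parts if nodes.get(p["@id"], {}).get("@type") == "Dataset"
        ]
        if child_datasets:
            # Remember the nested Datasets so the hierarchy can be rebuilt later.
            node[HIERARCHY_KEY] = child_datasets
            files = [p for p in parts if p not in child_datasets]
            if files:
                node["hasPart"] = files
            else:
                del node["hasPart"]
        if node["@id"] not in listed:
            root_parts.append({"@id": node["@id"]})
            listed.add(node["@id"])

    nodes["./"]["hasPart"] = root_parts
    return flat


# Hypothetical nested input: project -> task -> file.
nested_graph = [
    {"@id": "./", "@type": "Dataset", "hasPart": [{"@id": "./project/"}]},
    {"@id": "./project/", "@type": "Dataset", "hasPart": [{"@id": "./project/task/"}]},
    {
        "@id": "./project/task/",
        "@type": "Dataset",
        "hasPart": [{"@id": "./project/task/data.csv"}],
    },
    {"@id": "./project/task/data.csv", "@type": "File"},
]

flat_graph = flatten(nested_graph)
flat_nodes = {node["@id"]: node for node in flat_graph}
print([part["@id"] for part in flat_nodes["./"]["hasPart"]])
```

Files stay in the `hasPart` of their Dataset, which matches the spec sentence quoted earlier in this thread; an ELN that ignores the extra key simply sees a flat list of Datasets.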

@nicobrandt
Contributor

I think we never fully discussed whether we should restrict the potential directory structure in our spec. The RO-Crate spec itself seems to be pretty lax (see the URL @SteffenBrinckmann posted), but most ELNs probably won't be able to import arbitrarily nested structures, or to handle the possibility of having either directories or files on the "top-level", etc. Probably worth moving this into a separate issue for further discussion?

@SteffenBrinckmann
Collaborator Author

Pause this merge until #98 is settled.
(Didn't we agree a few days ago that we are almost aligned? ;-) )
