Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement: simplify API #42

Open
wants to merge 37 commits into
base: develop
Choose a base branch
from

Conversation

kikkomep
Copy link
Member

This PR simplifies the API for programmatic validation, enhances the code with additional documentation for clarity, and includes an initial example of programmatic usage in the README to help users get started.

(partially addresses issue #40)


```bash
RO-Crate is invalid!
Detected issue of severity REQUIRED with check "ro-crate-1.1:root_entity_exists: The RO-Crate must contain a root entity.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output sometimes changes when running the code multiple times. The list returned by result.get_issues contains only one element, and it may change in subsequent runs. I was expecting get_issues to return a list of all issues, instead it looks like only one is picked randomly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably has to do with whether fast failure is enabled

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we switch to disabling fast failure by default? I've noticed that this is a pretty common point of confusion for new users (the CLI can also show different errors on different runs).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably has to do with whether fast failure is enabled

Yes, it depends on the enabled fail-fast mode and, more specifically, on the underlying SHACL engine.

Could we switch to disabling fast failure by default? I've noticed that this is a pretty common point of confusion for new users (the CLI can also show different errors on different runs).

Sure, we can consider doing that, but perhaps in a separate PR.

# Set the path to the RO-Crate root directory
rocrate_uri='../tests/data/crates/invalid/2_root_data_entity_metadata/missing_root_entity',
# Set the identifier of the RO-Crate profile to use for validation
profile_identifier='ro-crate-1.1',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasn't this autodetected if unset?

Copy link
Member Author

@kikkomep kikkomep Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, it was autodetected if unset.
I’ve added a comment about this in 29ca0ed

README.md Outdated
# Create an instance of `ValidationSettings` class to configure the validation
settings = services.ValidationSettings(
# Set the path to the RO-Crate root directory
rocrate_uri='../tests/data/crates/invalid/2_root_data_entity_metadata/missing_root_entity',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this is a real path, but for documentation it might be clearer to have something simpler (e.g., /path/to/rocrate)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in bee0e3b


```bash
RO-Crate is invalid!
Detected issue of severity REQUIRED with check "ro-crate-1.1:root_entity_exists: The RO-Crate must contain a root entity.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably has to do with whether fast failure is enabled

@elichad
Copy link

elichad commented Nov 21, 2024

I haven't looked through all the code changes, but I've tried this out using the docs in this PR and it's a nice improvement!

I did find that when I got the example code to output all the errors (by disabling fast failure by setting abort_on_first=False in the ValidationSettings), some of the errors are duplicated, and some raw SHACL errors are surfaced:

$ python validate_crate.py 
RO-Crate is invalid!
Detected issue of severity REQUIRED with check "ro-crate-1.1_5.2": The RO-Crate metadata file descriptor MUST have an `about` property referencing the Root Data Entity
Detected issue of severity REQUIRED with check "ro-crate-1.1_9.1": The Root Data Entity MUST be a `Dataset` (as per `schema.org`)
Detected issue of severity REQUIRED with check "ro-crate-1.1_9.1": The Root Data Entity MUST be a `Dataset` (as per `schema.org`)
Detected issue of severity REQUIRED with check "ro-crate-1.1_10.1": The Root Data Entity URI MUST end with `/`
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//README.md>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//my-workflow-test.yml>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//my-workflow.ga>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//test-data/>->[ sh:inversePath <http://schema.org/hasPart> ]

This is exacerbated if I choose a profile that inherits from the base profile, e.g. with workflow-ro-crate:

$ python validate_crate.py 
RO-Crate is invalid!
Detected issue of severity REQUIRED with check "workflow-ro-crate-1.0_4.1": Unable to check the existence of the main workflow file because the metadata file descriptor doesn't contain a `mainEntity`
Detected issue of severity REQUIRED with check "ro-crate-1.1_5.2": The RO-Crate metadata file descriptor MUST have an `about` property referencing the Root Data Entity
Detected issue of severity REQUIRED with check "ro-crate-1.1_5.2": The RO-Crate metadata file descriptor MUST have an `about` property referencing the Root Data Entity
Detected issue of severity REQUIRED with check "ro-crate-1.1_9.1": The Root Data Entity MUST be a `Dataset` (as per `schema.org`)
Detected issue of severity REQUIRED with check "ro-crate-1.1_9.1": The Root Data Entity MUST be a `Dataset` (as per `schema.org`)
Detected issue of severity REQUIRED with check "ro-crate-1.1_9.1": The Root Data Entity MUST be a `Dataset` (as per `schema.org`)
Detected issue of severity REQUIRED with check "ro-crate-1.1_9.1": The Root Data Entity MUST be a `Dataset` (as per `schema.org`)
Detected issue of severity REQUIRED with check "ro-crate-1.1_10.1": The Root Data Entity URI MUST end with `/`
Detected issue of severity REQUIRED with check "ro-crate-1.1_10.1": The Root Data Entity URI MUST end with `/`
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//README.md>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//README.md>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//my-workflow-test.yml>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//my-workflow-test.yml>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//my-workflow.ga>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//my-workflow.ga>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//test-data/>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//test-data/>->[ sh:inversePath <http://schema.org/hasPart> ]

There might be some extra code needed to filter out duplicates as part of get_issues(). (Feel free to make this a separate issue if you prefer.)

@kikkomep
Copy link
Member Author

I haven't looked through all the code changes, but I've tried this out using the docs in this PR and it's a nice improvement!

I did find that when I got the example code to output all the errors (by disabling fast failure by setting abort_on_first=False in the ValidationSettings), some of the errors are duplicated, and some raw SHACL errors are surfaced:

$ python validate_crate.py 
RO-Crate is invalid!
Detected issue of severity REQUIRED with check "ro-crate-1.1_5.2": The RO-Crate metadata file descriptor MUST have an `about` property referencing the Root Data Entity
Detected issue of severity REQUIRED with check "ro-crate-1.1_9.1": The Root Data Entity MUST be a `Dataset` (as per `schema.org`)
Detected issue of severity REQUIRED with check "ro-crate-1.1_9.1": The Root Data Entity MUST be a `Dataset` (as per `schema.org`)
Detected issue of severity REQUIRED with check "ro-crate-1.1_10.1": The Root Data Entity URI MUST end with `/`
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//README.md>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//my-workflow-test.yml>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//my-workflow.ga>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//test-data/>->[ sh:inversePath <http://schema.org/hasPart> ]

This is exacerbated if I choose a profile that inherits from the base profile, e.g. with workflow-ro-crate:

$ python validate_crate.py 
RO-Crate is invalid!
Detected issue of severity REQUIRED with check "workflow-ro-crate-1.0_4.1": Unable to check the existence of the main workflow file because the metadata file descriptor doesn't contain a `mainEntity`
Detected issue of severity REQUIRED with check "ro-crate-1.1_5.2": The RO-Crate metadata file descriptor MUST have an `about` property referencing the Root Data Entity
Detected issue of severity REQUIRED with check "ro-crate-1.1_5.2": The RO-Crate metadata file descriptor MUST have an `about` property referencing the Root Data Entity
Detected issue of severity REQUIRED with check "ro-crate-1.1_9.1": The Root Data Entity MUST be a `Dataset` (as per `schema.org`)
Detected issue of severity REQUIRED with check "ro-crate-1.1_9.1": The Root Data Entity MUST be a `Dataset` (as per `schema.org`)
Detected issue of severity REQUIRED with check "ro-crate-1.1_9.1": The Root Data Entity MUST be a `Dataset` (as per `schema.org`)
Detected issue of severity REQUIRED with check "ro-crate-1.1_9.1": The Root Data Entity MUST be a `Dataset` (as per `schema.org`)
Detected issue of severity REQUIRED with check "ro-crate-1.1_10.1": The Root Data Entity URI MUST end with `/`
Detected issue of severity REQUIRED with check "ro-crate-1.1_10.1": The Root Data Entity URI MUST end with `/`
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//README.md>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//README.md>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//my-workflow-test.yml>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//my-workflow-test.yml>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//my-workflow.ga>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//my-workflow.ga>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//test-data/>->[ sh:inversePath <http://schema.org/hasPart> ]
Detected issue of severity REQUIRED with check "ro-crate-1.1_13.1": Less than 1 values on <file://.//test-data/>->[ sh:inversePath <http://schema.org/hasPart> ]

There might be some extra code needed to filter out duplicates as part of get_issues(). (Feel free to make this a separate issue if you prefer.)

fixed in 9983589

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants