-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add WRROC models, validators & unit tests #20
base: main
Are you sure you want to change the base?
Conversation
Reviewer's Guide by SourceryThis pull request introduces significant improvements to the WRROC models, validators, and unit tests in the CrateGen project. The changes focus on enhancing code structure, improving validation, and expanding test coverage. Key modifications include refactoring WRROC models to use inheritance, implementing stricter validation rules, enhancing error collection during validation, and adding comprehensive unit tests for various scenarios. File-Level Changes
Tips
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @Karanjot786 - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟡 Testing: 2 issues found
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.
tests/unit/test_wrroc_models.py
Outdated
""" | ||
data = { | ||
"id": "provenance-id", | ||
"name": "Test Provenance", | ||
"provenanceData": "Provenance information" | ||
} | ||
model = validate_wrroc(data) | ||
self.assertIsInstance(model, WRROCProvenance) | ||
|
||
def test_validate_wrroc_invalid(self): | ||
""" | ||
Test that validate_wrroc raises a ValueError for invalid WRROC data. | ||
""" | ||
data = { | ||
"unknown_field": "unexpected" | ||
} | ||
with self.assertRaises(ValueError): | ||
validate_wrroc(data) | ||
|
||
def test_validate_wrroc_tes(self): | ||
""" | ||
Test that validate_wrroc_tes correctly validates a WRROC entity for TES conversion. | ||
""" | ||
data = { | ||
"id": "process-id", | ||
"name": "Test Process", | ||
"object": [{"id": "https://raw.githubusercontent.com/elixir-cloud-aai/CrateGen/main/README.md", "name": "Input 1"}], | ||
"result": [{"id": "https://github.com/elixir-cloud-aai/CrateGen/blob/main/LICENSE", "name": "Output 1"}] | ||
} | ||
model = validate_wrroc_tes(data) | ||
self.assertEqual(model.id, "process-id") | ||
self.assertEqual(model.name, "Test Process") | ||
|
||
def test_validate_wrroc_tes_empty_object_list(self): | ||
""" | ||
Test that validate_wrroc_tes correctly validates a WRROC entity with an empty object list for TES conversion. | ||
""" | ||
data = { | ||
"id": "process-id", | ||
"name": "Test Process", | ||
"object": [], | ||
"result": [{"id": "https://github.com/elixir-cloud-aai/CrateGen/blob/main/LICENSE", "name": "Output 1"}] | ||
} | ||
model = validate_wrroc_tes(data) | ||
self.assertEqual(model.object, []) | ||
|
||
def test_validate_wrroc_tes_missing_fields(self): | ||
""" | ||
Test that validate_wrroc_tes raises a ValueError if required fields for TES conversion are missing. | ||
""" | ||
data = { | ||
"id": "process-id", | ||
"name": "Test Process" | ||
} | ||
with self.assertRaises(ValueError): | ||
validate_wrroc_tes(data) | ||
|
||
def test_validate_wrroc_wes(self): | ||
""" | ||
Test that validate_wrroc_wes correctly validates a WRROC entity for WES conversion. | ||
""" | ||
data = { | ||
"id": "workflow-id", | ||
"name": "Test Workflow", | ||
"workflowType": "CWL", | ||
"workflowVersion": "v1.0", | ||
"result": [{"id": "https://github.com/elixir-cloud-aai/CrateGen/blob/main/LICENSE", "name": "Output 1"}] | ||
} | ||
model = validate_wrroc_wes(data) | ||
self.assertEqual(model.workflowType, "CWL") | ||
self.assertEqual(model.workflowVersion, "v1.0") | ||
|
||
def test_validate_wrroc_wes_invalid_url(self): | ||
""" | ||
Test that validate_wrroc_wes raises a ValueError if a result URL is invalid. | ||
""" | ||
data = { | ||
"id": "workflow-id", | ||
"name": "Test Workflow", | ||
"workflowType": "CWL", | ||
"workflowVersion": "v1.0", | ||
"result": [{"id": "invalid_url", "name": "Output 1"}] | ||
} | ||
with self.assertRaises(ValueError): | ||
validate_wrroc_wes(data) | ||
|
||
|
||
def test_validate_wrroc_wes_missing_fields(self): | ||
""" | ||
Test that validate_wrroc_wes raises a ValueError if required fields for WES conversion are missing. | ||
""" | ||
data = { | ||
"id": "workflow-id", | ||
"name": "Test Workflow" | ||
} | ||
with self.assertRaises(ValueError): | ||
validate_wrroc_wes(data) | ||
|
||
if __name__ == "__main__": | ||
unittest.main() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Add edge case tests for WRROC validators
Consider adding more edge case tests for the WRROC validators, such as testing with minimal valid data, data with all optional fields present, and data with unexpected additional fields. This will help ensure the validators are robust and handle a wide range of inputs correctly.
def test_validate_wrroc_process_minimal(self):
"""Test validate_wrroc with minimal valid WRROCProcess data."""
data = {"id": "min-process-id"}
model = validate_wrroc(data)
self.assertIsInstance(model, WRROCProcess)
def test_validate_wrroc_workflow_all_fields(self):
"""Test validate_wrroc with all fields for WRROCWorkflow."""
data = {
"id": "full-workflow-id",
"name": "Full Workflow",
"workflowType": "CWL",
"workflowVersion": "v1.0",
"description": "A complete workflow",
"inputs": [{"id": "input1", "type": "string"}],
"outputs": [{"id": "output1", "type": "file"}]
}
model = validate_wrroc(data)
self.assertIsInstance(model, WRROCWorkflow)
self.assertEqual(model.description, "A complete workflow")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think this is a very good suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad you found the suggestion helpful! Adding these edge case tests will definitely strengthen the robustness of your validators. If you need any assistance implementing these tests or have any questions about specific scenarios to cover, feel free to ask. I'm here to help!
OverviewThis PR introduces data validation in both the TES and WES converters, ensuring that the input data adheres to the expected format and contains all necessary fields before the conversion process begins. The changes are aimed at improving the robustness and reliability of the conversion process by leveraging Pydantic models for validation. Changes Implemented
Validation Details
Additional Information
Please review the changes and provide feedback. |
crategen/converters/tes_converter.py
Outdated
@@ -1,52 +1,66 @@ | |||
from .abstract_converter import AbstractConverter | |||
from .utils import convert_to_iso8601 | |||
from ..models import TESData, WRROCDataTES | |||
from pydantic import ValidationError | |||
|
|||
class TESConverter(AbstractConverter): | |||
|
|||
def convert_to_wrroc(self, tes_data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add type information and Google-style docstrings here and elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve added type hints and Google-style docstrings to all methods in both TES and WES converters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remember to add Google-style docstrings to all modules, classes, methods and functions in the future. And add type hints to all methods, functions and global vars. And ideally also to local vars, unless their types is really obvious.
You can resolve this conversation after you've read it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where in the code are these validators called? Shouldn't they be used in the convert_from_wrroc()
methods of the TES and WES converters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've implemented the validators directly in the convert_from_wrroc
methods of both TES and WES converters. This ensures that all data is properly validated before the conversion process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't see validate_wrroc_wes()
and validate_wrroc_tes()
anywhere being called in the codebase. Can you please send me a permalink to the exact locations where they are called? Probably I'm just confused...
crategen/models.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't check the models in detail yet, but there are clearly some issues (see comment below in validators). Please address all other issues and double check your models against the TES and WES specs, the WRROC profiles and our conversion table first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve double-checked the models against the TES and WES specifications and the WRROC profiles. The models have been updated accordingly to ensure they are comprehensive and accurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I think it's a good idea to split them up into several modules. I will still need to check them in detail in another round, because I don't have time to look into this in detail now.
Anyway, you can resolve this conversation when you've read it.
This PR addresses several issues raised in the mentor's feedback, particularly focusing on the 1. Fixed Validation Errors in
|
Co-authored-by: salihuDickson <[email protected]>
@Karanjot786, @uniqueg, I just merged a PR to address a few things, namely;
|
Co-authored-by: salihuDickson <[email protected]>
@Karanjot786, @uniqueg, I just merged a PR, that among other minor fixes, configures lefthook to carry out pre-push checks, making sure code pushed to the repo meets a certain level of quality. Currently, linting and formatting are the only checks. |
Thank you for the updates, @SalihuDickson I appreciate the work separating the data models into individual files; this will help keep the codebase more organized as it grows. The modifications to the TES model to better align with the GA4GH specifications are also great, and having the references directly in the models will be useful. I'll make sure to integrate these changes into my workflow. Also, the configuration of the ruff linter and enabling CI checks on all branches are much appreciated—they’ll help maintain a high standard of code quality in the future. Adding lefthook for pre-push checks is a fantastic step toward ensuring consistent code quality. I'll make sure my code adheres to the new checks before pushing any updates. This will certainly help streamline the review process and reduce back-and-forth on formatting and linting issues. Thanks again for implementing these improvements! I'll continue to align my work with these updates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking A LOT cleaner now - thanks a lot @Karanjot786. There are still some issues though, and with the cleaner code I found a few more ones. But I expect it will be only two more rounds of changes, one for the code and one for the models (which I haven't reviewed now but will try to do in the next few days).
.github/workflows/ci.yml
Outdated
- uses: actions/checkout@v2 | ||
|
||
- name: Set up Python | ||
uses: actions/setup-python@v2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very outdated, I think the last version is v4 - or even v5. Please check and change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing out the outdated versions in the CI workflow. I’ll update actions/checkout
and actions/setup-python
to their latest versions (v4 or v5) as suggested. I’ll ensure this change is reflected in the .github/workflows/ci.yml
file.
crategen/converters/tes_converter.py
Outdated
Args: | ||
data (dict): The input TES data. | ||
|
||
def convert_to_wrroc(self, tes_data): | ||
Returns: | ||
dict: The converted WRROC data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please annotate types in the signature only. Do not additionally annotate them in the docstring, because there is a chance that the info in the signature and that in the docstring will stray. Better to have a single source of truth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! I will remove the type annotations from the docstrings in tes_converter.py
and other relevant files. I understand the importance of maintaining a single source of truth with type hints in the function signatures.
crategen/converters/tes_converter.py
Outdated
|
||
# Convert to WRROC | ||
# Convert to WRROC format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also remove this comment. The code is not hard to read at all, and even the error message tells us very expliclity what's going on here.
The same goes for the other comment above and the remaining comments in the WES converter.
No need for comments for just a few lines of straightforward code. Only comment if something really tricky comes, like some bitwise operations or a complicated algo etc. Or if you add a workaround for a bug or vulnerability (then you can add a link to the issue). So for these kind of simple tools, basically almost never.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the guidance regarding comments. I will remove the redundant comments from the TES and WES converter files and ensure that only necessary and meaningful comments are included. I agree that the code is straightforward and self-explanatory.
{"@id": output.url, "name": output.path} for output in data_tes.outputs | ||
], | ||
"startTime": data_tes.creation_time, | ||
"endTime": end_time, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you want to validate your wrroc_data
before you return it? Or do you do this somewhere else upstream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I’ll add a validation step for wrroc_data
in the convert_to_wrroc
method of tes_converter.py
to ensure the data conforms to the expected schema before returning it.
crategen/converters/tes_converter.py
Outdated
"inputs": [{"url": obj.id, "path": obj.name} for obj in data_wrroc.object], | ||
"outputs": [{"url": res.id, "path": res.name} for res in data_wrroc.result], | ||
"creation_time": data_wrroc.startTime, | ||
"logs": [{"end_time": data_wrroc.endTime}], | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above: Make sure that not only the input to the conversion is validated, but also the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acknowledged! I will also add validation for the output data in the convert_from_wrroc
method in tes_converter.py
. This will ensure the data is correctly validated after conversion.
crategen/models.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is empty now, guess you can remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the initial review of the model splits. I will await your detailed review to see if any further changes are needed. Meanwhile, I'll ensure the current organization is as clean and logical as possible.
crategen/models.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I think it's a good idea to split them up into several modules. I will still need to check them in detail in another round, because I don't have time to look into this in detail now.
Anyway, you can resolve this conversation when you've read it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't see validate_wrroc_wes()
and validate_wrroc_tes()
anywhere being called in the codebase. Can you please send me a permalink to the exact locations where they are called? Probably I'm just confused...
lefthook.yml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure you want/need this file under version control? Is it perhaps sth that is just for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will consider whether the lefthook.yml
file is necessary for version control.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes @uniqueg, I actually added the lefthook.yml file. It adds the pre-push check to make sure the code meets certain quality standards before it can be pushed. It's under version control because I want it to be applied with every contributor.
crategen/validators.py
Outdated
# Convert '@id' to 'id' for validation purposes | ||
if '@id' in data: | ||
data['id'] = data.pop('@id') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where in the models are you actually using @id
aliases now? I didn't look at the models in depth, but I didn't see them (or any import of Fields
or the use of alias
). Rather, it looks like the models still just use id
? Again, maybe I'm confused, so a pointer would help.
Hi @uniqueg,
These functions are primarily used in the test suite to ensure that the data conforms to the expected structures for TES and WES conversions. |
Hi @uniqueg , Thank you for pointing this out. I have reviewed the current state of our models and codebase. The code snippet related to converting Currently, the models do not explicitly use aliases such as If there's a need to introduce field aliases using Pydantic’s Please let me know if you would like me to implement aliases or if any other changes are required! |
Thanks a lot! I hadn't checked the test files. Generally, if code is only used in tests, it should not be in the main codebase - rather you want to keep it in the test folder. And consider creating a Here though, I think the validator code is useful and should be used in the main code base. You should never return unvalidated code. And you should always validate inputs. So consider where in the codebase you wand/need to add these validations. |
Well, you want the resulting WRROC objects to be valid - which (as far as I know) requires |
Summary:This PR addresses several updates and improvements based on recent mentor feedback: 1. CI Workflow Updates:
2. Code Cleanup and Refinements:
3. Consistency Improvements:
4. Validator Integration:
5. Validator Usage Clarification:
6. Alias Usage Update:
7. Test Case Adjustments:
|
Co-authored-by: salihuDickson <[email protected]>
Hi @SalihuDickson, I encountered a build failure during the recent CI run due to multiple errors in the WESConverter code, specifically in
Here's how I think we can fix these issues:
Please let me know your thoughts on this or if you have any suggestions on how best to resolve these errors. Thank you! Best regards, |
@Karanjot786, thank you for bringing this to my attention, I am working on a fix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just need some clarification on these points, you don't need to make any changes to the code just yet.
crategen/converters/wes_converter.py
Outdated
"status": data_wes.state, | ||
"startTime": convert_to_iso8601(data_wes.run_log.start_time), | ||
"endTime": convert_to_iso8601(data_wes.run_log.end_time), | ||
"result": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outputs in the wes schema and according to the Workflow run ro-crate schema, result can be an object so there is no reason for this to be a list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the WES specification does not provide any specific keys for the output object, can you please provide a reference as to where you got the location
and name
keys for the output object.
Co-authored-by: salihuDickson <[email protected]>
Hey @Karanjot786, can you please walk me through the use of the Please don't make any changes yet, I simply want to get a better understanding of why we have the validators module because I feel like I might be missing something. |
Hi @SalihuDickson, Thank you for your feedback! I'd be happy to clarify the points you've mentioned.
I hope this helps clear up the reason for the current setup. Please let me know if you'd like further details or if any adjustments should be made based on this explanation. |
Thanks for the explanations, @Karanjot786! Could you please give one (or better: two) concrete examples of things that we need to validate that we won't get for free from Pydantic's automatic validation? Also, for these examples (and more generally), is there any particular reason why we don't use Pydantic's support for writing custom validators? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Karanjot786 - there is a lot of good stuff in this PR, and it's getting there. However, it's just too big a PR to get it over the line, and I feel I am getting caught up in details when reviewing.
I strongly suggest we take this PR apart into several more focused ones, probably to be merged in this order:
style: support linting
(make sure thatmypy
andruff check
pass)ci: reformat, reset branches & upgrade actions
(put just the latest.github/workflows/ci.yml
without including the job to run unit tests; consider includinglefthook.yml
as well)feat: add WES, TES & WRROC models
(please include appropriate unit tests here, if applicable; once you do add your first tests, add a job to the CI to run them)feat: integrate model validation in conversions
(you can use this PR here; basically, the models should be refactored to use the new models for validation; please leave out any custom validators - we will either add these in the next PR or not at all - see separate discussion thread started by @SalihuDickson)
For changes to pyproject.toml
, please use your best judgement where to put them (possibly to multiple PRs).
Some of these may not require any changes from the current state - others just a few. The models one and the last one - where everything comes together - will probably still need some work though, but will become much easier to manage when there are only a few files affected at any given time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine, but with the exception of the last couple of lines, the changes here are not related to this PR. And given that the PR is already huge and difficult to review, I would strongly suggest not to blow it up with anything that isn't directly related to its core focus.
My suggestion:
- Open a new PR
ci: reformat, reset branches & upgrade actions
(or similar) from a branch created from the default branch that introduces all of the changes introduced here, except for adding the job for running the unit tests (however, do not include the commented stuff, just remove it for now). We can then merge this PR immediately and you can merge the changes into this branch. - In this branch, you then just add the job for running the unit tests.
In this way, we already merge some unrelated changes from this PR and make it a bit smaller.
I will probably suggest some more such points, because this PR is dragging along waaay too long, which often happens when there are too many different things being cobbled together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll move the changes related to .github/workflows/ci.yml into a new PR, as suggested. I’ll only include the job for running the unit tests in this current PR. This should make the current PR smaller and easier to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm not mistaken, all the changes introduced here are linting/style-related changes. They are all fine, and I do think they should be made, but please not in this PR.
Please create a separate PR for linting/style changes, where you can do reformatting changes across the entire codebase. I suggest you create that one ASAP, we quickly review and merge it into the default branch, and then you can merge the default branch into this branch to get rid of many of the suggested changes in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll create a separate PR for the linting and style-related changes in crategen/cli.py
and other files, as advised. This will help keep the PR focused on core functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style/format change unrelated to this PR: See above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll move the style and formatting changes from crategen/converter_manager.py
into the linting/style PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly just style/format changes: See above
Exception: The wrroc_data
to data
renaming should stay in this PR (or again a separate PR, I will tell you later when I've looked through everything).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll retain the wrroc_data
to data
renaming in this PR, as you mentioned. I’ll move the other style/format changes into a separate PR for linting and formatting.
TESInput(url=AnyUrl(url=obj.id), path=obj.name) for obj in data_wrroc.object | ||
] | ||
tes_outputs = [ | ||
TESOutput(url=AnyUrl(url=data_wrroc.result.id), path=data_wrroc.result.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure that we can only have one .result
in data_wrroc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I’ll verify if data_wrroc.result
can have multiple items. If needed, I’ll adjust the logic to handle multiple results or return an error if more than one result is provided.
tes_executors = [ | ||
TESExecutor(image=data_wrroc.instrument or "", command=[]) | ||
] # Provide default empty list for command |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two points:
- Where would we source the command from? Currently it seems that the command would always be an empty list, meaning that no reasonable TES task could ever be created here.
- Similarly, I think if there is no command to run or no environment/image to run it in, I don't think it makes sense to continue. Rather, we should exit with a helpful error message that no TES task can be created from the specified WRROC entity, because no image or command are specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your points. I’ll update the logic
- Source the command from the input data (e.g.,
data_wrroc
). - Add an error message if no command or environment is provided, rather than proceeding with an invalid TES task.
Raises: | ||
ValidationError: If TES data is invalid. | ||
""" | ||
# Validate TES data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove all comments from the code in this entire module, I think the code is trivial enough to understand without comments (and the comments are distracting and sometimes misleading).
Please also remove all trivial comments from any other modules. I would say that this should be all or almost all comments. There are no complex algorithms in the project and no workarounds for bugs in other libraries etc so far (as far as I remember). If you feel that your code is too complex to understand just by reading it and the associated docstring, consider refactoring it to make it simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll remove the comments from crategen/converters/tes_converter.py
and other modules as requested. I’ll focus on refactoring the code where necessary to ensure clarity without comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR - looks like this is related to linting, so please put in a dedicated PR (see comments above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll move the lefthook.yml
changes into the dedicated linting/style PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These look unrelated to this PR for the most part. Please put in the appropriate PR (or multiple PRs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll split the changes in pyproject.toml
into relevant PRs based on the focus of each change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please prepare a separate PR for the models. See general comment on PR for details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll prepare a separate PR for the models as per your suggestion. This should help keep the PR focused and make it easier to manage.
@uniqueg, Thank you for your feedback! Two examples of validations we don’t get from Pydantic’s automatic validation include: 1. Content and URL Mutual Exclusion: In Regarding Pydantic's custom validators, we could explore leveraging them for these validations. If you agree, I can refactor the code to use Pydantic's custom validation mechanism. |
@uniqueg, Thanks for the suggestion. I understand the challenge of reviewing a large PR, and I agree that splitting the changes will make the process smoother. I will break the current PR into smaller, focused ones in the following order: 1. style: Support for linting and ensuring mypy and ruff pass. For changes to |
Hi @uniqueg and @SalihuDickson,
Summary
This PR introduces several key improvements and fixes based on recent feedback:
Refactor WRROC Models:
WRROCDataBase
to hold common fields and methods.WRROCData
,WRROCDataTES
, andWRROCDataWES
to inherit fromWRROCDataBase
to reduce code duplication and enhance maintainability.Stricter Validation:
extra = "allow"
withextra = "forbid"
in all WRROC models (WRROCProcess
,WRROCWorkflow
,WRROCProvenance
) to enforce stricter validation.Enhanced Validation and Error Collection:
extra = "forbid"
configuration.Expanded Unit Tests:
Summary by Sourcery
Add WRROC models with a new base class for shared fields, enhance validation with stricter rules and comprehensive error reporting, and expand unit tests to cover edge cases. Enable test execution in the CI workflow.
New Features:
Enhancements:
CI:
Tests: