Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor process() testing & cleaning cube masking/filtering logic #58

Merged
merged 34 commits into from
Aug 13, 2024

Conversation

truth-quark
Copy link
Collaborator

@truth-quark truth-quark commented Jul 31, 2024

Resolves #50 (also related to #14).

This is a fairly complicated PR around testing & refactoring the central process() workflow function in um2netcdf/um2nc.

This work is intended to provide initial testing & cleanup of process(), providing a framework to ensure the workflow is consistent over future refactoring steps.

For background:

  • um2nc processes UM files, modifying variables & saving to NetCDF files
  • The core um2nc driver is the process() workflow function (it's also the main API entry point)
  • Commit 0 in this repo had no unit tests
  • process() was initially 140 lines of processing logic & too difficult to unit test
  • Recent work extracted blocks of functionality (e.g. variable renaming) to separate functions. This simplified & compressed process() to focus on the workflow logic
  • As process() shrank, I identified tricky logic & inefficient data handling (see Clean & optimise masking logic/workflow #50 notes)
  • The requirement to retrofit unit testing naturally encourages refactoring to simplify workflow logic

Known Problems:

  • Two process() tests require substantial setup & multiple ugly nested mock.patch() calls. This is partly due to complexity in process(), touching separate I/O libraries.
  • Both mule and iris fields file objects are complex
    • Multiple attrs are created dynamically when opened
    • Mocking these is tricky with the spec= option, since spec only detects init time attributes
  • Some mocks may not represent real object config, however I've tweaked some code to skip some side issues
  • process() doesn't return success or failure data for the cubes

Future steps/direction/thoughts:

  • For upcoming steps, process() is likely to gain cube modification functionality as it is extracted from cubewrite()
  • process() tests should be flexible to expand to cover the additional mod steps
  • process() needs reordering to divide into 3 broad steps: input I/O, cube modification, output I/O
    • Idea: sub-steps could be refactored to independent sub-workflow funcs, which suggests process() tests could be split into smaller chunks. This ought to reduce setup dependencies, reducing test setup burden (e.g. less mock.patch() nesting)

For the review, any opinions on the restructure, gaps & correctness checks are welcome! It would be good to reduce the complex setup.

@truth-quark truth-quark self-assigned this Jul 31, 2024
@truth-quark truth-quark added the enhancement New feature or request label Jul 31, 2024
test/test_um2netcdf.py Outdated Show resolved Hide resolved
test/test_um2netcdf.py Outdated Show resolved Hide resolved
@CodeGat
Copy link

CodeGat commented Jul 31, 2024

Added branch protections to this repo so it'll restrict merging without a review.

umpost/um2netcdf.py Outdated Show resolved Hide resolved
Copy link

@CodeGat CodeGat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I'd be able to comment of the design as it's a bit above my level...but I do have some suggestions

test/test_um2netcdf.py Show resolved Hide resolved
test/test_um2netcdf.py Outdated Show resolved Hide resolved
test/test_um2netcdf.py Outdated Show resolved Hide resolved
test/test_um2netcdf.py Outdated Show resolved Hide resolved
test/test_um2netcdf.py Outdated Show resolved Hide resolved
test/test_um2netcdf.py Outdated Show resolved Hide resolved
test/test_um2netcdf.py Outdated Show resolved Hide resolved
umpost/um2netcdf.py Outdated Show resolved Hide resolved
Fix test_process_no_heaviside_drop_cubes() to assert against function outputs.
…() outputs.

Removed mock assertions coupled to implementation details.
Rework the pressure level code to simplify logic. Add/expand tests to ensure heaviside uv/t code branches are tested.
Testing shouldn't assert against the sman variable as it's an internal implementation detail.
Copy link
Collaborator

@blimlim blimlim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes are looking good to me! The masking logic looks like its doing what it should be, and I think is also easier to read.

From what I can understand, the tests look good too.

I just had a couple of small suggested changes to do with function names and docstrings which I think can now be updated to reflect changes in what the functions are doing.

test/test_um2netcdf.py Outdated Show resolved Hide resolved
test/test_um2netcdf.py Outdated Show resolved Hide resolved
umpost/um2netcdf.py Outdated Show resolved Hide resolved
umpost/um2netcdf.py Outdated Show resolved Hide resolved
umpost/um2netcdf.py Show resolved Hide resolved
umpost/um2netcdf.py Outdated Show resolved Hide resolved
@blimlim
Copy link
Collaborator

blimlim commented Aug 13, 2024

All tests passed on Gadi!

@truth-quark
Copy link
Collaborator Author

@blimlim I've made the changes, which definitely make the docs better.

Do you want to go over the fixes quickly & resolve any the sub-discussions you're happy with?

@CodeGat CodeGat self-requested a review August 13, 2024 06:22
Copy link
Collaborator

@blimlim blimlim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link

@CodeGat CodeGat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a re-review of the changes and it looks good. Note: I'm not a python expert so I might have missed some stuff.

@truth-quark
Copy link
Collaborator Author

Had a re-review of the changes and it looks good. Note: I'm not a python expert so I might have missed some stuff.

As it's an evolving project, omissions can be fixed :-). We're in a better position now with testing covering key code blocks.

@truth-quark
Copy link
Collaborator Author

The freak show is merged, with 100+ comments...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clean & optimise masking logic/workflow
6 participants