Refactor process() testing & cleaning cube masking/filtering logic #58

truth-quark · 2024-07-31T02:44:57Z

Resolves #50 (also related to #14).

This is a fairly complicated PR around testing & refactoring the central process() workflow function in um2netcdf/um2nc.

This work is intended to provide initial testing & cleanup of process(), giving a framework to ensure the workflow stays consistent over future refactoring steps.

For background:

- um2nc processes UM files, modifying variables & saving to NetCDF files
- The core um2nc driver is the process() workflow function (it's also the main API entry point)
- Commit 0 in this repo had no unit tests
- process() was initially 140 lines of processing logic & too difficult to unit test
- Recent work extracted blocks of functionality (e.g. variable renaming) to separate functions. This simplified & compressed process() to focus on the workflow logic
- As process() shrank, I identified tricky logic & inefficient data handling (see Clean & optimise masking logic/workflow #50 notes)
- The requirement to retrofit unit testing naturally encourages refactoring to simplify workflow logic

Known Problems:

- Two process() tests require substantial setup & multiple ugly ~~nested~~ mock.patch() calls. This is partly due to complexity in process(), which touches separate I/O libraries.
- Both mule and iris fields file objects are complex:
  - Multiple attrs are created dynamically when opened
  - Mocking these is tricky with the spec= option, since spec only detects init time attributes
- Some mocks may not represent the real object config; however, I've tweaked some code to skip some side issues
- process() doesn't return success or failure data for the cubes
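The spec= issue can be reproduced with a toy class. This is a sketch only: FakeFieldsFile, its attributes and the fake path are hypothetical stand-ins, not the mule/iris API. Speccing against an instance exposes attributes that exist when the mock is created, but not ones a later open/load step adds; speccing against the class hides even the init-time instance attributes.

```python
from unittest import mock


class FakeFieldsFile:
    """Hypothetical stand-in for a mule/iris fields file object (not the real API)."""

    def __init__(self, path):
        self.path = path  # created at init time

    def load(self):
        # created dynamically, only once the file is "opened"
        self.fixed_length_header = [0] * 256


def spec_visibility():
    """Report which attributes different spec styles expose."""
    real = FakeFieldsFile("/fake/path/aiihca.paa1jan")  # no file I/O happens

    by_instance = mock.Mock(spec=real)          # names from dir(real) at creation
    by_class = mock.Mock(spec=FakeFieldsFile)   # names from dir(FakeFieldsFile)
    strict = mock.Mock(spec_set=real)           # also rejects unknown assignments

    result = {
        "instance_path": hasattr(by_instance, "path"),
        "instance_header": hasattr(by_instance, "fixed_length_header"),
        "class_path": hasattr(by_class, "path"),
        "class_load": hasattr(by_class, "load"),
    }

    # plain spec= still lets a test attach the dynamic attribute by hand
    by_instance.fixed_length_header = [0] * 256
    result["instance_header_after_set"] = hasattr(by_instance, "fixed_length_header")

    # spec_set= refuses the same assignment
    try:
        strict.fixed_length_header = [0] * 256
        result["spec_set_accepts"] = True
    except AttributeError:
        result["spec_set_accepts"] = False
    return result
```

With plain spec=, the practical workaround is to assign the dynamic attributes on the mock explicitly; spec_set= is stricter and rejects that.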

Future steps/direction/thoughts:

- For upcoming steps, process() is likely to gain cube modification functionality as it is extracted from cubewrite()
- process() tests should be flexible enough to expand to cover the additional modification steps
- process() needs reordering to divide into 3 broad steps: input I/O, cube modification, output I/O
  - Idea: sub-steps could be refactored into independent sub-workflow funcs, which suggests process() tests could be split into smaller chunks. This ought to reduce setup dependencies, reducing the test setup burden (e.g. less mock.patch() nesting)
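A minimal sketch of the proposed three-step split. The function names (load_cubes, modify_cubes, save_cubes) and the FakeCube stand-in are hypothetical, not current um2nc code, and load_cubes returns canned data instead of reading a file. Injecting the loader and writer is one way to test each step without mock.patch().

```python
from dataclasses import dataclass, field


@dataclass
class FakeCube:
    """Hypothetical minimal cube stand-in (not an iris.cube.Cube)."""
    name: str
    attributes: dict = field(default_factory=dict)


def load_cubes(infile):
    """Step 1 (input I/O): canned data; a real loader would use mule/iris."""
    return [FakeCube("air_temperature"), FakeCube("x_wind")]


def modify_cubes(cubes):
    """Step 2 (cube modification): no I/O, so it is easy to unit test."""
    for cube in cubes:
        cube.attributes["processed"] = True
    return cubes


def save_cubes(cubes, outfile, writer):
    """Step 3 (output I/O): writer is injected so tests can pass a fake."""
    for cube in cubes:
        writer(cube, outfile)


def process(infile, outfile, writer, loader=load_cubes):
    cubes = loader(infile)
    cubes = modify_cubes(cubes)
    save_cubes(cubes, outfile, writer)
    return cubes  # returning cubes lets tests assert on outputs
```

Because this process() returns the modified cubes, a test can assert on outputs rather than on mock calls, e.g. by passing a writer that appends to a list.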

For the review, any opinions on the restructure, gaps & correctness checks are welcome! It would be good to reduce the complex setup.


CodeGat · 2024-07-31T03:43:00Z

Added branch protections to this repo so it'll restrict merging without a review.


CodeGat

I don't think I'd be able to comment on the design as it's a bit above my level... but I do have some suggestions


blimlim

Changes are looking good to me! The masking logic looks like it's doing what it should be, and I think it's also easier to read.

From what I can understand, the tests look good too.

I just had a couple of small suggested changes to do with function names and docstrings which I think can now be updated to reflect changes in what the functions are doing.


blimlim · 2024-08-13T05:20:38Z

All tests passed on Gadi!


truth-quark · 2024-08-13T06:17:09Z

@blimlim I've made the changes, which definitely make the docs better.

Do you want to go over the fixes quickly & resolve any of the sub-discussions you're happy with?

blimlim

Nice!

CodeGat

Had a re-review of the changes and it looks good. Note: I'm not a python expert so I might have missed some stuff.

truth-quark · 2024-08-13T06:36:39Z

> Had a re-review of the changes and it looks good. Note: I'm not a python expert so I might have missed some stuff.

As it's an evolving project, omissions can be fixed :-). We're in a better position now with testing covering key code blocks.

truth-quark · 2024-08-13T06:48:53Z

The freak show is merged, with 100+ comments...

truth-quark added 12 commits July 29, 2024 15:15

- Refactor: skip setting item code if cube already has an item_code attribute. (c21fac5)
- Freeze DummyStash data classes to ensure they're dict hashable. (ad5fa29)
- Add ugly initial process() test. (a4bc68e)
- Refactor: move mule functionality to sub-workflow function to simplify process() & tests. (3edcb07)
- Refactor: move netcdf formats lookup to constant. (2475aa1)
- Add/remove TODOs. (8873db6)
- Refactor: rework test_process() to remove mock setup & heaviside cube. (305e3e8)
- Refactor: rework process() logic to filter out cubes requiring masking. (b9bfd92)
  Update test to use aiihca.paa1jan.subset data.
- Add test for case where all cubes are filtered out. (f39b497)
  Refactor args to test fixture.
- Refactor mask application logic. (6700578)
  Add process() test with heaviside_uv/t masking.
- Refactor: add item code to section/item helper function. (52bbe06)
- Update comments & future TODOs. (4e8bf20)
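The DummyStash freeze (ad5fa29) relies on standard dataclass behaviour: with the default eq=True, a non-frozen dataclass sets __hash__ to None, while frozen=True generates a field-based hash. This sketch assumes DummyStash holds section and item fields; to_item_code() is a hypothetical helper using the usual UM convention item_code = section * 1000 + item.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DummyStash:
    """Hypothetical test double; the field names are assumptions."""
    section: int
    item: int


@dataclass
class MutableStash:
    """Same fields, not frozen: eq=True without frozen=True makes it unhashable."""
    section: int
    item: int


def to_item_code(section, item):
    """Combine a STASH section & item into one item code (3, 236 -> 3236)."""
    return section * 1000 + item


# Frozen instances compare & hash by value, so equal stash codes share a dict key
lookup = {DummyStash(30, 301): "heaviside_uv", DummyStash(30, 304): "heaviside_t"}

try:
    {MutableStash(30, 301): "heaviside_uv"}
    mutable_is_hashable = True
except TypeError:
    mutable_is_hashable = False
```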

truth-quark self-assigned this Jul 31, 2024

truth-quark added the enhancement New feature or request label Jul 31, 2024

truth-quark requested review from blimlim, CodeGat and marc-white July 31, 2024 02:45

truth-quark commented Jul 31, 2024

test/test_um2netcdf.py (outdated, resolved)

truth-quark commented Jul 31, 2024

test/test_um2netcdf.py (outdated, resolved)

truth-quark commented Jul 31, 2024

umpost/um2netcdf.py (resolved)

truth-quark commented Jul 31, 2024

umpost/um2netcdf.py (outdated, resolved)

CodeGat reviewed Jul 31, 2024

test/test_um2netcdf.py (resolved)

test/test_um2netcdf.py (outdated, resolved)

test/test_um2netcdf.py (outdated, resolved)

test/test_um2netcdf.py (outdated, resolved)

marc-white reviewed Jul 31, 2024

test/test_um2netcdf.py (outdated, resolved)

marc-white reviewed Jul 31, 2024

umpost/um2netcdf.py (resolved)

marc-white reviewed Aug 1, 2024

test/test_um2netcdf.py (outdated, resolved)

marc-white reviewed Aug 1, 2024

test/test_um2netcdf.py (outdated, resolved)

marc-white reviewed Aug 1, 2024

umpost/um2netcdf.py (outdated, resolved)

truth-quark added 2 commits August 1, 2024 17:06

- Reformat NC_FORMATS lookup. (d99d82e)
- Refactor input/output args to use deliberately fake paths. (f2c628c)
  Paths are deliberately fake to prevent filesystem access.
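Moving the format lookup into a constant (2475aa1, d99d82e) typically replaces an if/elif chain with a dict. A sketch, assuming integer CLI options 1 to 4: the key-to-format mapping is an assumption, while the values are netCDF format names accepted by iris's netCDF saver. get_nc_format() is hypothetical.

```python
# Assumed mapping of numeric options to netCDF format names
NC_FORMATS = {
    1: "NETCDF3_CLASSIC",
    2: "NETCDF3_64BIT",
    3: "NETCDF4",
    4: "NETCDF4_CLASSIC",
}


def get_nc_format(format_arg):
    """Map a numeric option to a netCDF format name via the lookup constant."""
    try:
        return NC_FORMATS[format_arg]
    except KeyError:
        raise ValueError(f"Unsupported netCDF format option: {format_arg!r}") from None
```

A single constant gives one source of truth that tests can iterate over, rather than duplicating the branch logic.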

truth-quark added 10 commits August 9, 2024 14:31

- Refactor cube test fixtures for process/masking tests. (c8c95d6)
- Refactor process() to return modified cubes. (253d302)
  Fix test_process_no_heaviside_drop_cubes() to assert against function outputs.
- Refactor test_process_all_cubes_filtered() to assert against function outputs. (e9c7aac)
- Refactor test_process_mask_with_heaviside() to assert against process() outputs. (027c681)
  Removed mock assertions coupled to implementation details.
- Update comments. (112df70)
- Simplify & refactor process() for cube filtering & masking. (bc10b8a)
  Rework the pressure level code to simplify logic. Add/expand tests to ensure heaviside uv/t code branches are tested.
- Run filtered_cubes() only when required. (ab8c3bb)
- Simplify sman var mocks. (9d4e9da)
  Testing shouldn't assert against the sman variable as it's an internal implementation detail.
- Bugfix: correctly assign sman named mocks to context manager. (e25c89c)
- Make cube result assertions slightly more thorough. (10f33e9)
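The heaviside masking exercised by the process() tests can be sketched with NumPy masked arrays. This assumes the pressure-level data is divided by the heaviside field and masked where heaviside <= hcrit; apply_heaviside_mask() and the 0.5 threshold are illustrative assumptions, not um2nc's exact code.

```python
import numpy as np

HCRIT = 0.5  # assumed default threshold, for illustration only


def apply_heaviside_mask(data, heaviside, hcrit=HCRIT):
    """Rescale pressure-level data by the heaviside fraction & mask points
    where the level was too often below the surface (heaviside <= hcrit)."""
    data = np.asarray(data, dtype=np.float64)
    heaviside = np.asarray(heaviside, dtype=np.float64)
    mask = heaviside <= hcrit
    # avoid divide-by-zero on points that end up masked anyway
    safe = np.where(mask, 1.0, heaviside)
    return np.ma.masked_array(data / safe, mask=mask).astype(np.float32)
```

Asserting on the returned values and mask, rather than on mocks, matches the output-based test style adopted in these commits.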

truth-quark mentioned this pull request Aug 12, 2024

Converting input path for mule to a string? #67 (open)

truth-quark requested a review from blimlim August 12, 2024 06:46

truth-quark mentioned this pull request Aug 12, 2024

Future neatening/readability #27 (open, 16 tasks)

truth-quark added 2 commits August 12, 2024 17:17

- Fix naming of generator to yield cubes that don't require masking. (c2cecaf)
- Move item to stash code conversion to um2nc module. (afbadcb)
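The cube filtering described in b9bfd92, ab8c3bb and c2cecaf can be sketched as a generator plus a guard that only runs it when a heaviside field is missing. FakeCube, its needs_*_mask flags and both function names are hypothetical; they only loosely mirror the filtered_cubes() step named in ab8c3bb.

```python
from dataclasses import dataclass


@dataclass
class FakeCube:
    """Hypothetical cube stand-in with flags for the heaviside field it needs."""
    name: str
    needs_uv_mask: bool = False
    needs_t_mask: bool = False


def cubes_with_available_masks(cubes, heaviside_uv, heaviside_t):
    """Yield cubes that need no mask, or whose required heaviside exists."""
    for cube in cubes:
        if cube.needs_uv_mask and heaviside_uv is None:
            continue  # drop: uv-grid pressure-level cube without heaviside_uv
        if cube.needs_t_mask and heaviside_t is None:
            continue  # drop: t-grid pressure-level cube without heaviside_t
        yield cube


def select_cubes(cubes, heaviside_uv, heaviside_t):
    """Only run the filter when at least one heaviside field is missing."""
    if heaviside_uv is not None and heaviside_t is not None:
        return list(cubes)
    return list(cubes_with_available_masks(cubes, heaviside_uv, heaviside_t))
```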

blimlim requested changes Aug 13, 2024


truth-quark added 3 commits August 13, 2024 15:47

- Fix comments for clarity. (5c6a28a)
- Refactor: rename get_pressure_levels() to get_heaviside_cubes(). (700c58d)
  Update docstrings to reflect modified functionality.
- Update docstring to reflect filtering process. (bd78fd4)
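After the rename to get_heaviside_cubes() (700c58d), the helper's job is locating the two heaviside fields. A minimal sketch, assuming cubes expose an item_code attribute (as the item_code handling in c21fac5 suggests) and that the heaviside fields use STASH item codes 30301 (uv grid) and 30304 (t grid); find_heaviside_cubes() and FakeCube are hypothetical.

```python
from dataclasses import dataclass

# Assumed STASH item codes of the heaviside fields on pressure levels
HEAVISIDE_UV_ITEM_CODE = 30301
HEAVISIDE_T_ITEM_CODE = 30304


@dataclass(frozen=True)
class FakeCube:
    """Hypothetical stand-in exposing only a name & an item_code."""
    name: str
    item_code: int


def find_heaviside_cubes(cubes):
    """Return (heaviside_uv, heaviside_t); either is None if absent."""
    heaviside_uv = heaviside_t = None
    for cube in cubes:
        if cube.item_code == HEAVISIDE_UV_ITEM_CODE:
            heaviside_uv = cube
        elif cube.item_code == HEAVISIDE_T_ITEM_CODE:
            heaviside_t = cube
    return heaviside_uv, heaviside_t
```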

truth-quark requested a review from blimlim August 13, 2024 06:15

CodeGat self-requested a review August 13, 2024 06:22

blimlim approved these changes Aug 13, 2024


CodeGat approved these changes Aug 13, 2024


This was referenced Aug 13, 2024

Refactor coord mocks when #63/lat long fixes is merged #69 (open)

Handle str/pathlib.Paths for mule file loads #70 (open)

truth-quark merged commit b1539af into develop Aug 13, 2024
