Common agreement on loading CF non-compliant NetCDF files #5165

Open
2 tasks
trexfeathers opened this issue Feb 20, 2023 · 4 comments
Assignees
Labels
Dragon 🐉 https://github.com/orgs/SciTools/projects/19?pane=info Feature: NetCDF + CF-conventions

Comments

@trexfeathers
Contributor

trexfeathers commented Feb 20, 2023

Iris needs a public statement on how it handles NetCDF files that deviate from the CF conventions. This will bring several benefits:

  • More certainty when discussing if/how Iris should load a particular file.
  • Clearer direction when developing the codebase.
  • Set user expectations.

Writing this statement will involve making some difficult decisions. A working group is tackling this now: @tkknight, @bjlittle, @lbdreyer, @pp-mo, @trexfeathers, @stephenworsley, @ESadek-MO, @scottrobinson02, @HGWright

Factors at play

  • More CF compliance means smoother collaboration between institutions, and Iris can play a part in raising awareness.
  • CF evolves over time, so may develop 'opinions' on things that previously didn't matter and invalidate older files.
  • The available tooling can make it difficult to address non-compliances in a file.
  • UX - being strict/verbose about CF compliance makes the user experience more awkward.
  • Iris has a place in the scientific Python community - people choose Iris / Xarray / raw netCDF4 / something else for different purposes, and CF handling plays a part in that.
  • Continuing to work in the face of CF non-compliances could need more defensive code.

Items affected

(please edit if you know of others)

Tasks

  1. Dragon Sub-Task 🦎 Experience: Medium Feature: NetCDF + CF-conventions Status: Needs Info Type: Enhancement Type: Question
  2. Feature: ESMValTool Feature: NetCDF + CF-conventions Status: Blocked Type: Bug
@trexfeathers
Contributor Author

Summary from working group conversations

2023-02-02, 2023-02-14, 2023-03-22

Note that this issue is not intended as a debate, which is why it is not posted as a discussion. The conversations below took place in real time, with a group deliberately sized to aid decision making.

Outcome - our ideal implementation

When loading NetCDF files, Iris will load all CF-compliant elements. A container of non-compliant variables and attributes will be attached to the Cube(s).

Encourage users:

If this causes you problems, please reach out to us to see if we can collaborate on a solution.

Implementation considerations

  • How to contain things that can't be represented properly?
  • Associate things with Cubes or isolated in own list?
  • Activate behaviour with a FUTURE flag?
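
To make the 'container' idea concrete, here is a minimal sketch of one possible shape, using only the standard library. Every name in it (`LoadedCube`, `NonCompliantItem`, `is_compliant`, `load`) is hypothetical and none of them exist in Iris. The compliance check is a toy rule for illustration, not a real CF rule.

```python
from dataclasses import dataclass, field


@dataclass
class NonCompliantItem:
    """Hypothetical record of something that could not be interpreted."""
    name: str
    reason: str
    raw: dict


@dataclass
class LoadedCube:
    """Hypothetical stand-in for a Cube; not the real Iris class."""
    name: str
    attributes: dict = field(default_factory=dict)
    # The container of non-compliant elements travels with the object.
    non_compliant: list = field(default_factory=list)


def is_compliant(variable: dict) -> bool:
    # Toy rule, for illustration only: expect a string "units" entry.
    return isinstance(variable.get("units"), str)


def load(primary: str, variables: dict) -> LoadedCube:
    cube = LoadedCube(name=primary)
    for var_name, var in variables.items():
        if is_compliant(var):
            cube.attributes[var_name] = var
        else:
            # Park the element instead of refusing to load the file.
            cube.non_compliant.append(
                NonCompliantItem(var_name, "missing or invalid 'units'", var)
            )
    return cube


cube = load(
    "air_temperature",
    {"height": {"units": "m"}, "mystery": {"flag": 1}},
)
```

In this sketch the parked items are attached to the object itself; keeping them in a separate list would be the other option raised in the bullets above.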

Working group summary comments

  • @trexfeathers: embrace imperfection, skipping non-compliances sounds good if warnings work.
  • @stephenworsley: CF compliance is a good aim, but can't always be expected.
  • @pp-mo: CF offers optional ways of doing things, Iris ought to do its best, but not insist. Discourage 'bad CF'.
  • @bjlittle: KISS. Make users' lives simple, don't be awkward.
  • @lbdreyer: we'll always break someone's workflow. Need a plan to help those who are left behind.
  • @scottrobinson02: spirit of compromise. Accept that going in.
  • @tkknight: KISS. Informative messages when things don't work.
  • @HGWright: if we can do something we should do something. Don't throw toys from pram. Make our actions clear.
  • @ESadek-MO: no easy solution, communicate well, focus on warnings.

Discussion topics

Encouraging compliance in the community

  • We know examples where Iris' strictness has resulted in more compliant - more interoperable - files.
  • CF is a convention, not a standard.
  • CF is the only available convention and is therefore used by anyone looking for help making files interoperable.
  • Iris' scope is wider than CF, and Iris doesn't implement all of CF.
    Need to avoid inventing our own rules.
  • CF's longevity is relevant.

Files changing from acceptable to unacceptable

  • While CF is intended to be backwards compatible, checks (within Iris, cf-checker, whatever) are not a complete implementation and may evolve over time, invalidating previously acceptable files.

Ease of massaging files to be compliant

  • Always going to be somewhat difficult.
  • If Iris can't cope with non-CF, then users are forced onto another tool.
    • Could edit the file directly using ncedit or NetCDF4, but this can be challenging, and editing a copy may be unrealistic.
    • All the rich tools (Iris, Xarray, cf-python) have their own opinions.
    • ncdata has the potential to make this much easier.
  • Should Iris include a non-CF layer, lower than a Cube, to help with fixing?

User experience (UX)

  • Must not be underestimated.
  • Undesirable to flatly refuse to load.
  • Need clarity on what Iris expects.
  • Need user education.
  • Warnings are an opportunity to encourage compliance and help, without 'being awkward'.
    • Really important not to ruin UX with even more warnings.
    • Classify warnings? This would give users granularity over what they care about / ignore.
  • CF brings some inevitable complexity, some user effort required.
  • Compromises are necessary.
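
The 'classify warnings' idea maps onto Python's standard `warnings` module, where filters can target a warning category. A minimal sketch follows; the category names (`LoadWarning`, `CfNonComplianceWarning`, `MissingMetadataWarning`) and `load_with_warnings` are made up for illustration and are not a statement of what Iris provides.

```python
import warnings


class LoadWarning(UserWarning):
    """Hypothetical base class for anything raised while loading."""


class CfNonComplianceWarning(LoadWarning):
    """Hypothetical: a file element was skipped as non-compliant."""


class MissingMetadataWarning(LoadWarning):
    """Hypothetical: optional metadata was absent."""


def load_with_warnings():
    # Stand-in for a load call that hits two kinds of problem.
    warnings.warn("variable 'mystery' skipped", CfNonComplianceWarning)
    warnings.warn("no 'long_name' on 'height'", MissingMetadataWarning)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # start from a known state
    # The user silences one category and keeps the rest.
    warnings.simplefilter("ignore", category=MissingMetadataWarning)
    load_with_warnings()

categories = [w.category.__name__ for w in caught]
```

Because the categories share a base class, a user could also filter on `LoadWarning` to silence or escalate everything load-related in one go.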

Iris' place in the world

  • Interoperability allows using other, more tolerant tools.
  • Learning/adopting other tools is nevertheless not as good as getting everything from one place.
  • We should aim to avoid duplication within the geoscience community.

Ease of software development

  • Defensive code takes extra effort.
  • Iris could be written to work with things it doesn't explicitly understand.
  • API changes could make things easier:
    • Interchange between Cube and _DimensionalMetadata.
    • Easier construction of Cubes from scratch.
  • Might be easier to include user-level fixing tools in Iris, rather than making Iris cope better.

Preferred approaches

Determined via voting.

  1. Iris loads only the CF-compliant parts of a file, skipping the non-compliant parts (maybe raising a warning?).
  2. Iris allows the user to configure how it will interpret a malformed file.

@edmundhenley-mo

edmundhenley-mo commented Jul 1, 2024

Oooh just discovered this issue via DragonTaming board @trexfeathers.

Sounds like you've got a fair bit of input from working group already; please shout though if useful to have more, as this is a particularly painful area for space weather - and we've got a good amount of requirements (ionosphere and lower) in the iris-o-sphere of traditional geographic lat/lon coords!

More context on why CF non-compliance an issue for space weather

Highly interested: space weather is not represented in CF conventions, so data wrangling is a key issue for us.

There's a few times where I've consciously decided not to go with iris due to anticipating "ugh, lots of pain handling I/O at boundaries due to data being inherently non-CF-compliant"

In retrospect, often this decision was bad:

  • I've ended up writing (and then having to support!) custom code - e.g. pseudo-geo-aware dataclasses & methods for ionospheric data - which ends up being a poorer version of iris.
  • I'd have been better served going for the real deal, and biting the boundaries-pain bullet.

Self-interestedly v happy to give more input if useful - help you help me!

@trexfeathers
Contributor Author

My personal proposal, after some loose discussion with @bjlittle and @pp-mo:

  • CFVariableMixin gets a new member called something like loading_problems - this is either a str or list of str.
  • We introduce a global object somewhere that has a matching name e.g. LOADING_PROBLEMS. This is a dict of lists.
  • Any operations which attempt to create Iris objects from file contents (whether that's a Cube, AuxCoord, whatever) get wrapped in a try-except block. In the except block:
    • Instead create a basic Cube with minimal parsing - the array (if present) goes into Cube.data and everything else goes into Cube.attributes.
    • The problem is recorded in Cube.loading_problems.
    • The new 'problem cube' is added to LOADING_PROBLEMS - the dict key is the file name.
  • Any operations which attempt to add Iris objects to a Cube during the loading process get wrapped in a try-except block. In the except block:
    • The problem is recorded in the .loading_problems member of this object.
    • This object is added to LOADING_PROBLEMS - the dict key is the file name.
  • At the end of the loading process: LOADING_PROBLEMS is checked to see if it has been populated (or if it has grown, depending whether we clear it out between load calls), and a single warning is issued suggesting that the user check the contents.
  • Need to make it as easy as possible to convert Cubes to other classes. We already have the from_metadata() method. Maybe all we need to add is decent documentation examples, but maybe we need new conveniences too?

This should allow loading to continue under as many circumstances as possible, and provide the user with recourse to fix up problem objects post-loading. It should be reasonably simple to scour the loading code for likely places to add try-except. This feature need not be limited to CF-parsing in NetCDF, although that is presumably the source of most of the problems.
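
A rough, standard-library-only sketch of the flow proposed above, to show how the pieces could fit together. `LOADING_PROBLEMS` and `loading_problems` are the names floated in the proposal; `ProblemCube`, `build_object` and `load` are hypothetical stand-ins, and the 'no units' failure is a toy trigger rather than a real CF check.

```python
import warnings
from collections import defaultdict

# Global registry from the proposal: file name -> list of problem objects.
LOADING_PROBLEMS = defaultdict(list)


class ProblemCube:
    """Hypothetical minimal 'problem cube': data plus raw attributes."""

    def __init__(self, data, attributes, problem):
        self.data = data
        self.attributes = attributes
        self.loading_problems = [problem]


def build_object(raw):
    """Toy builder standing in for real Cube/AuxCoord construction."""
    if "units" not in raw:
        raise ValueError(f"{raw['name']!r} has no units")
    return dict(raw)


def load(filename, raw_variables):
    before = len(LOADING_PROBLEMS[filename])
    loaded = []
    for raw in raw_variables:
        try:
            loaded.append(build_object(raw))
        except Exception as err:  # deliberately broad, as in the proposal
            # Minimal parsing: array into .data, everything else into
            # .attributes, and the problem recorded on the object.
            attrs = {k: v for k, v in raw.items() if k != "data"}
            LOADING_PROBLEMS[filename].append(
                ProblemCube(raw.get("data"), attrs, str(err))
            )
    # A single warning at the end, only if the registry has grown.
    if len(LOADING_PROBLEMS[filename]) > before:
        warnings.warn(
            f"Problems loading {filename}; check LOADING_PROBLEMS.",
            UserWarning,
        )
    return loaded
```

This version never clears the registry between calls and compares its length before and after, which is one of the two options mentioned in the proposal.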

@stephenworsley
Copy link
Contributor

stephenworsley commented Sep 25, 2024

From @SciTools/peloton : consider an option to fail/warn fast.
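
One way a fail-fast option could work, assuming loading problems are reported through a dedicated warning category: the standard `warnings` module can escalate a category to an exception. `LoadProblemWarning` and the `strict` argument below are hypothetical, purely to illustrate the idea.

```python
import warnings


class LoadProblemWarning(UserWarning):
    """Hypothetical category for problems found while loading."""


def load(strict=False):
    # catch_warnings restores the caller's filters on exit.
    with warnings.catch_warnings():
        if strict:
            # Fail fast: the first problem raises instead of warning.
            warnings.simplefilter("error", category=LoadProblemWarning)
        warnings.warn(
            "variable 'mystery' is not CF-compliant", LoadProblemWarning
        )
        return "cube"  # placeholder for the loaded result


try:
    load(strict=True)
    failed_fast = False
except LoadProblemWarning:
    failed_fast = True
```

Users could get the same effect without any new Iris option by applying the `"error"` filter themselves around a load call.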

@trexfeathers trexfeathers moved this from 📌 Prioritised to 🛡 Championed in 🐉 Dragon Taming Sep 26, 2024