Unexpected behaviour comparing two views of the same data: pp and netcdf #775

Open
bnlawrence opened this issue May 28, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@bnlawrence

Use case: I read some PP data and look at what I have. I then write the same data out to netCDF and read it back in. I expect the list of CF fields to be identical, but they are not.

import cf
ff = cf.read('myfile.pp')
ff
[<CF Field: geopotential_height(time(40), air_pressure(9), latitude(1921), longitude(2560)) m>,
 <CF Field: id%UM_m01s30i301_vn1106(time(40), air_pressure(6), latitude(1921), longitude(2560))>,
 <CF Field: id%UM_m01s30i407_vn1106(time(40), latitude(1920), longitude(2560))>,
 <CF Field: id%UM_m01s30i408_vn1106(time(40), latitude(1920), longitude(2560))>]

Compare with the same operation after writing that list of fields out to a netCDF file:

cf.write(ff, 'myfile.nc')  # write the fields read from the PP file
ff = cf.read('myfile.nc')
ff
[<CF Field: geopotential_height(time(40), air_pressure(9), latitude(1921), longitude(2560)) m>,
 <CF Field: long_name=HEAVYSIDE FN ON P LEV/UV GRID(time(40), air_pressure(6), latitude(1921), longitude(2560))>,
 <CF Field: long_name=TOTAL MOISTURE FLUX U  RHO GRID(time(40), latitude(1920), longitude(2560))>,
 <CF Field: long_name=TOTAL MOISTURE FLUX V  RHO GRID(time(40), latitude(1920), longitude(2560))>]

This is cf.__version__ = 3.16.2

From my point of view the file format should not affect the logical view of the contents. I understand there may be some historical reasons for this behaviour, but maybe they should be reviewed.

@bnlawrence bnlawrence added the bug Something isn't working label May 28, 2024
@davidhassell
Collaborator

Hi Bryan,

The CF logical contents of the Fields are the same (the Fields read from PP do have long_names). It's just the repr view that differs.

A bit of context here: when you read from PP files, the Fields have their id attribute set (https://ncas-cms.github.io/cf-python/attribute/cf.Field.id.html). This is because we need to unambiguously define the PP fields for aggregation, and also because not all PP fields will have a standard name or long name, so in the absence of netCDF variable names we need a mechanism to identify them unambiguously. The repr function has a hierarchy of identities from which it chooses what to display. This hierarchy goes standard_name, id, long_name, netCDF variable name. The first of these to be set gets displayed, so in the PP case, when there is no standard name, the id gets shown, because it is definitive (unlike the general long_name). When reading from netCDF files, no id is set because there is no obvious value to set it to, and no use for it.
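
To make the hierarchy concrete, here is a minimal pure-Python sketch of the lookup described above. It is not cf-python's actual implementation: `repr_identity`, the `SimpleNamespace` stand-ins for Fields, and the `ncvar%` prefix are assumptions for illustration, while the `id%` and `long_name=` prefixes are taken from the reprs shown in this issue.

```python
from types import SimpleNamespace

# Order described in the comment above: standard_name, id, long_name,
# then the netCDF variable name.
HIERARCHY = ("standard_name", "id", "long_name", "ncvar")

# Display prefixes. "id%" and "long_name=" match the reprs in this issue;
# "ncvar%" is an assumption.
PREFIXES = {"standard_name": "", "id": "id%", "long_name": "long_name=", "ncvar": "ncvar%"}


def repr_identity(field):
    """Return the first identity in HIERARCHY that is set on `field`."""
    for key in HIERARCHY:
        value = getattr(field, key, None)
        if value is not None:
            return PREFIXES[key] + value
    return ""


LONG_NAME = "HEAVYSIDE FN ON P LEV/UV GRID"

# Stand-in for a field read from PP: no standard name, but an id is set.
pp_field = SimpleNamespace(standard_name=None, id="UM_m01s30i301_vn1106", long_name=LONG_NAME)

# Stand-in for the same field read back from netCDF: no id is set.
nc_field = SimpleNamespace(standard_name=None, id=None, long_name=LONG_NAME)

print(repr_identity(pp_field))  # the id wins over the long_name
print(repr_identity(nc_field))  # falls through to the long_name
```

Both stand-ins carry the same long_name, so the difference in what is displayed comes only from whether an id is present.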

Options:

  1. Do nothing.
  2. Remove the id attribute from fields read from PP (after aggregation).
  3. Change the repr identity hierarchy.

All three have pros and cons - I suspect one size does not fit all, here :)
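
As a rough illustration of how the three options would change what is displayed, the sketch below applies each one to a stand-in object. Nothing here is cf-python API: `shown_identity`, its `order` parameter and the `SimpleNamespace` stand-in are hypothetical, and the netCDF variable name is left out of the hierarchy for brevity.

```python
from types import SimpleNamespace

DEFAULT_ORDER = ("standard_name", "id", "long_name")
PREFIXES = {"standard_name": "", "id": "id%", "long_name": "long_name="}


def shown_identity(field, order=DEFAULT_ORDER):
    """Return the first identity in `order` that is set on `field`."""
    for key in order:
        value = getattr(field, key, None)
        if value is not None:
            return PREFIXES[key] + value
    return ""


LONG_NAME = "TOTAL MOISTURE FLUX U  RHO GRID"
pp_field = SimpleNamespace(standard_name=None, id="UM_m01s30i407_vn1106", long_name=LONG_NAME)

# Option 1: do nothing, so the id is displayed.
option1 = shown_identity(pp_field)

# Option 3: change the hierarchy so that long_name outranks id.
option3 = shown_identity(pp_field, order=("standard_name", "long_name", "id"))

# Option 2: drop the id (here, after any aggregation has already happened).
pp_field.id = None
option2 = shown_identity(pp_field)

print(option1, option2, option3, sep="\n")
```

In this sketch options 2 and 3 display the same thing for this field, but option 2 discards the id itself while option 3 only changes which identity is shown first.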

@bnlawrence
Author

OK, that makes sense, but in cf-python we have the same logical content in both cases, so I expected to see the same thing. How horrible do you think the outcome would be if you went with option 3? I guess I'm wondering about the pros and cons (maybe this is a cf version 4 change, if at all).

@sadielbartholomew
Member

sadielbartholomew commented May 29, 2024

Thanks for the clarification of the context, David. I for one (two) was not aware of that.

> All three have pros and cons - I suspect one size does not fit all, here :)

Indeed, I think the best solution would be to make it configurable, so that a user such as Bryan who wants identical representations of the contents can get that, while others can choose to keep the id attribute on fields read from PP, at appropriate points in each case.

So for me the decision is how to best support that with the API, and what the default behaviour would be, assuming we're happy to do the work to enable it.
