Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Be able to stop CONCISE from adding a new dimension to coordinate variables #131

Open
danielfromearth opened this issue Aug 26, 2024 · 5 comments

Comments

@danielfromearth
Copy link
Collaborator

danielfromearth commented Aug 26, 2024

Getting a new error in the regression test notebook for the SAMBAH service chain.... An error when trying to open an output dataset using xarray — I think this is likely because of how CONCISE is producing the result (i.e., stacking layers):

MissingDimensionsError: 'xtrack' has more than 1-dimension and the same name as one of its dimensions ('subset_index', 'xtrack'). xarray disallows such variables because they conflict with the coordinates used to label dimensions.

Perhaps we can add a way to tell CONCISE that a variable or variables—in this case, xtrack— is a coordinate variable that should not have a new subset_index dimension added to it. In other words, keep the 1D coordinate variable as a 1D variable in CONCISE's output.

@podaac podaac deleted a comment Aug 26, 2024
@podaac podaac deleted a comment Aug 26, 2024
@joshgarde
Copy link
Member

joshgarde commented Aug 26, 2024

The way CONCISE works is that it performs merging amongst multiple granules by introducing the subset_index dimension to all variables to allow access to the individual merged granules' values. That way the original granule data is fully preserved. Making an exception for 1D variables would break that design

It is known that xarray doesn't always play nice with all CONCISE-merged datasets, but I always viewed that more as a limitation of xarray than CONCISE itself. For an example of xarray limitations, we had full support for NetCDF groups before xarray had their support for them

Is there a way to work around xarray on SAMBAH? I'd prefer to see a downchain solution over needing to change how CONCISE works since its been around for a fairly long time for users

@joshgarde
Copy link
Member

@danielfromearth Oh wait, is this an issue with the SAMBAH chain itself or just an issue with a regression test because xarray can't open the dataset?

@danielfromearth
Copy link
Collaborator Author

danielfromearth commented Aug 26, 2024

We're currently determining whether to consider these outputs an unacceptable output from the chain, or whether to consider it an output to work around.

The original data structure depends on two coordinate variables (as defined by CF/COARDS here) called xtrack and mirror_step. For instance, a derived geophysical variable like "vertical_column" has dimensions of ('mirror_step', 'xtrack'), "latitude" has dimensions of ('mirror_step', 'xtrack'), and "xtrack" has dimensions of ('xtrack'). So, xtrack satisfies this definition of a coordinate variable: "1-dimensional netCDF variables whose dimension names are identical to their variable names are regarded as "coordinate variables" (axes of the underlying grid structure of other variables defined on this dimension)."

When the data is processed by CONCISE, it ends up creating a netCDF with vertical_column('subset_index', 'mirror_step', 'xtrack'), latitude('subset_index', 'mirror_step', 'xtrack'), and xtrack('subset_index', 'xtrack'). It seems okay from one perspective, but since xtrack becomes 2D, it no longer satisfies the CF/COARDS definition of a coordinate variable.

@joshgarde
Copy link
Member

The only way I can see CONCISE passing through the 1D variable untouched is if CONCISE can verify that every granule its merging contains the same data in that variable otherwise there would be a loss of data during merging. But then I would question whether or not the existing user base is okay with changing CONCISE's behavior at this point while it's in maintenance mode

My opinion is leaning more towards that CONCISE should remain fairly consistent in its behavior so far and its more of a downstream tooling issue to support its output

@danielfromearth
Copy link
Collaborator Author

@joshgarde, thanks for the comments on this!

After digging into this further with @ank1m, it looks like there might already be a resolution to this difficulty in xarray (see this issue pydata/xarray#2233 and pull request pydata/xarray#7989), when using a version newer than currently in the harmony regression test notebook. Seems promising, and will update in this thread more as we determine if the issue will be solved with dependency updates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants
@joshgarde @danielfromearth and others