Is your feature request related to a problem? Please describe.
Currently, the ACCESS-NRI Intake Catalog doesn't allow for searching of coordinate variables: for example, searching for st_edges_ocean will return 0 datasets. This can make searching for coordinate variables difficult, with 2 main pain points:
If the coordinate variable is known to be stored in a netCDF file with specific data variables:
The user needs to know that these data & coordinate variables are found in the same files, and then search for the data variable in order to access the coordinate variables.
In some instances, the user can directly access the coordinate variables by searching the data variables. In others, they need to perform something like
Although this doesn't require the user to (semi) manually work out what file to open, it's still messy as it requires passing round file names.
In other instances, coordinate variables are stored completely separately. For example, ocean_grid.nc files only contain coordinate variables, and so cannot be found using the catalogue. Currently, the only way to access these files is to search the catalogue to get a handle on the directory structure, and then construct a file path and load it, e.g.:
This requires the user to start poking round in directory structures to try to work out where to load their data from - which is the problem intake is trying to solve. This has caused some pain points migrating COSIMA recipes from cosima_cookbook => intake.
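As a rough illustration of the path-construction workaround described above, here is a minimal sketch. The path and the helper name `sibling_grid_file` are hypothetical, and the sketch assumes the coordinate-only file sits in the same directory as a data file that the catalogue does return, which may not hold for every experiment.

```python
from pathlib import PurePosixPath


def sibling_grid_file(data_file: str, grid_name: str = "ocean_grid.nc") -> str:
    """Guess where a coordinate-only file lives, given a catalogued data file.

    Assumes (hypothetically) that the grid file shares a directory with the
    data file; nothing in the catalogue guarantees this.
    """
    return str(PurePosixPath(data_file).parent / grid_name)


# Hypothetical path, standing in for one returned by a catalogue search.
catalogued = "/path/to/experiment/output000/ocean/ocean.nc"

# The user has to build this path by hand before loading the file.
grid_path = sibling_grid_file(catalogued)
print(grid_path)
```

The guess about the directory layout lives in user code here, which is the kind of knowledge a searchable catalogue would otherwise hold.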
I also think this might be the same issue as discussed in #63? @aidanheerdegen - seem to be some concerns about coordinates being listed as variables when they shouldn't be there?
Describe the feature you'd like
Searchable coordinates: in the same way that the catalog currently lets you perform searches over variables, it would be useful to be able to do the same on coordinates:
Doing this is subject to a couple of constraints:
The catalog needs to know that coordinates & data variables aren't the same & need to be treated differently - xr.combine_by_coords will fail if passed a coordinate variable.
Proposed Solution
Add separate coordinate variable fields to the ACCESS-NRI Intake Catalog, rather than just making the same change as in Intake-ESM (data_vars => variables), since that change alone would confuse coordinates & variables in the ACCESS-NRI Intake Catalog and cause concatenation issues. This is implemented on branch 660-coordinate-variables.
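To make the idea of separate fields concrete, here is a toy, standard-library-only sketch. It is not the implementation on the branch: the `DatastoreRow` class, the `coord_variable` field name, the `search` helper and the file contents are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DatastoreRow:
    """One file in a toy datastore. All field names here are hypothetical."""

    path: str
    variable: list[str] = field(default_factory=list)        # data variables
    coord_variable: list[str] = field(default_factory=list)  # coordinates, kept separate


def search(rows, *, variable=None, coord_variable=None):
    """Return rows containing the requested data variable and/or coordinate."""
    hits = []
    for row in rows:
        if variable is not None and variable not in row.variable:
            continue
        if coord_variable is not None and coord_variable not in row.coord_variable:
            continue
        hits.append(row)
    return hits


# Made-up file contents, for illustration only.
rows = [
    DatastoreRow("output000/ocean/ocean.nc", ["temp"], ["st_ocean", "st_edges_ocean"]),
    DatastoreRow("output000/ocean/ocean_grid.nc", [], ["xt_ocean", "yt_ocean"]),
]

# Coordinates never appear in the data-variable field, so this finds nothing...
no_hits = search(rows, variable="st_edges_ocean")
# ...while the separate coordinate field makes the same name searchable.
coord_hits = search(rows, coord_variable="st_edges_ocean")
# Files that hold only coordinates become discoverable too.
grid_hits = search(rows, coord_variable="xt_ocean")
```

Keeping the two fields apart means a search can target coordinates explicitly, while anything that concatenates datasets can keep working from the data-variable field alone.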
Additional Info
Due to the release cycle of Intake-ESM, this solution will probably require us to maintain a fork - at least for some time.
I've performance tested the proposed solution: changes in catalogue build times are small (typically ~5%), catalogue read times are similar (typically 5-10% different, sometimes faster), and the datastore.csv.gz files written by builder.save() are typically approximately doubled in size.
> I also think this might be the same issue as discussed in #63? @aidanheerdegen - seem to be some concerns about coordinates being listed as variables when they shouldn't be there?
That was specifically for a project where we were using the intake catalogue as a source for an "experiment explorer", which exposes the variables saved in an experiment in a timeline, to help users understand what variables are available at different times in an experiment. For this purpose we really only wanted diagnostic model variables that have a time-varying component.
> Add separate coordinate variable fields to the ACCESS-NRI Intake Catalog
I'm confused. Does this mean
Have a "coordinate" flag (field)?
Move coordinates into a separate catalogue?
or neither?
BTW this is a somewhat related issue I think about encoding grid information: