Added a new documentation page for faster GRIB aggregations #495
Conversation
GRIB Aggregations
-----------------

This new method for reference aggregation, developed by **Camus Energy**, is based on GRIB2 files. Utilizing
It won't be "new" for long, so drop this.
I would put the restrictions first:
- must have .idx files
- specialised for time-series data, each file having identical message structure
@emfdavid : an opinion on whether Camus wants to be referenced here and how that would look.
@emfdavid Should I mention Camus Energy like @martindurant asked here?
I checked in with the team and it would be great if you could link https://www.camus.energy/ and if there is a place for attribution you can include my github handle @emfdavid, otherwise the contributors list is fine.
The index in the ``idx`` file indexes the GRIB messages, whereas the ``k_index`` (kerchunk index)
we build as part of this workflow indexes the variables in those messages.
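As an aside (not proposing this exact snippet for the docs page): a minimal sketch of what the text ``.idx`` file actually contains, assuming the standard NODD-style format. The URL and column names below are illustrative only, not part of this PR.

```python
# Sketch: the .idx file only gives per-message byte ranges; the kerchunk index
# additionally records which variable each message holds, so whole GRIB files
# never need to be re-scanned. The URL below is illustrative.
import fsspec
import pandas as pd

idx_url = "s3://noaa-hrrr-bdp-pds/hrrr.20230928/conus/hrrr.t12z.wrfsfcf06.grib2.idx"

with fsspec.open(idx_url, "r", anon=True) as f:
    rows = [line.strip().split(":", 5) for line in f if line.strip()]

idx = pd.DataFrame(rows, columns=["message", "offset", "date", "varname", "level", "forecast"])
idx["offset"] = idx["offset"].astype(int)
# byte length of each message = gap to the next offset (the last one needs the file size)
idx["length"] = idx["offset"].shift(-1) - idx["offset"]
print(idx.head())
```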
This note explains my question above, but is not very clear. Map out the steps we need to do before launching into the details.
Steps for how we build the index? Should I include the code for building the index?
No code, just brief points.
Great start - one suggestion re limitations.
- The ``.idx`` file must be of *text* type.
- Only specialised for time-series data, where GRIB files
  have *identical* structure.
- Aggregation only works for files of a specific **forecast horizon**.
The reference index can be combined across many horizons but each horizon must be indexed separately.
Looking forward to seeing what you make of the reinflate api... there you can see all of the FMRC slices are supported against a collection of indexed data from many horizons, runtimes and valid times.
> the reinflate api

ooh, what is this?
The method to turn the k_index and the metadata back into a ref_spec that you can use in zarr/xarray:
https://github.com/asascience-open/nextgen-dmac/blob/main/grib_index_aggregation/dynamic_zarr_store.py#L198
I think @Anu-Ra-g is already working on adding it into kerchunk?
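For context, a rough sketch of what such a ref_spec is then used for once it has been reinflated; the file name is a placeholder and the storage options are assumptions, not taken from that code:

```python
# Sketch: open a kerchunk-style reference spec (a dict or JSON with "refs")
# the usual way, via fsspec's "reference" filesystem and the zarr engine.
import fsspec
import xarray as xr

refs = "hrrr_horizon06.json"  # placeholder: path to a reinflated/combined reference spec

fs = fsspec.filesystem(
    "reference",
    fo=refs,
    remote_protocol="s3",
    remote_options={"anon": True},
)
ds = xr.open_dataset(
    fs.get_mapper(""),
    engine="zarr",
    backend_kwargs={"consolidated": False},
)
```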
Oh, in that case I suspect it works already, right @Anu-Ra-g : but you can only work on one set of horizons OR one set of timepoints, not both at once? Something like that.
I think you can return an array with multiple dimensions.
I didn't have a strong use for this so struggled to do something general and practical.
For instance, if you request by horizon, you can provide multiple horizon axes and your dimensions should include 'horizon' and 'valid_time'. Similarly, you can request multiple runtimes and then your dimensions should include 'runtime' and 'step'.
Honestly not sure if this is helpful or overcomplicated.
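To make that concrete without leaning on the actual API, a purely illustrative pandas sketch of the two views over the same flat index; every column name and value here is invented:

```python
# Conceptual sketch only, not the reinflate API: one flat index of messages,
# viewed either per runtime (dims runtime x step) or per horizon (dim valid_time).
import pandas as pd

k_index = pd.DataFrame({
    "runtime": pd.to_datetime(["2023-09-28 00:00"] * 2 + ["2023-09-28 06:00"] * 2),
    "step": pd.to_timedelta(["0h", "6h", "0h", "6h"]),
    "offset": [0, 990_253, 0, 987_112],
    "length": [990_253, 1_001_553, 987_112, 998_731],
})
k_index["valid_time"] = k_index["runtime"] + k_index["step"]

# request by runtime: dimensions are (runtime, step)
by_runtime = k_index.pivot(index="runtime", columns="step", values="offset")

# request by horizon: fix one step, index the result by valid_time
by_horizon = k_index[k_index["step"] == pd.Timedelta("6h")].set_index("valid_time")
```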
@martindurant I tried it out with one set of horizons with the original code. Actually, I'm still figuring out the reinflating part of the code, aggregation types and the new indexes.
I noticed that reinflating can also work with a grib_tree model made from a single grib file.
@emfdavid can you confirm this in this notebook that I made?
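For reference, the single-file case looks roughly like this; the URL is illustrative and the storage options are assumptions, not taken from the notebook:

```python
# Sketch: build a hierarchical grib_tree store from one GRIB2 file, then (per
# the discussion above) reinflate/select from it the same way as for many files.
from kerchunk.grib2 import scan_grib, grib_tree

url = "s3://noaa-hrrr-bdp-pds/hrrr.20230928/conus/hrrr.t00z.wrfsfcf06.grib2"
message_groups = scan_grib(url, storage_options={"anon": True})  # one reference set per message
store = grib_tree(message_groups)  # merged, hierarchical zarr-style references
```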
Let me know when this PR is ready for another look.
@martindurant I've made the changes you suggested. It is ready for review.
This PR adds a new page in the kerchunk documentation for faster reference consolidation for GRIB files.