Added a new documentation page for faster GRIB aggregations #495
Conversation
GRIB Aggregations
-----------------

This new method for reference aggregation, developed by **Camus Energy**, is based on GRIB2 files. Utilizing
It won't be "new" for long, so drop this.
I would put the restrictions first:
- must have .idx files
- specialised for time-series data, each file having identical message structure
@emfdavid : an opinion on whether Camus wants to be referenced here and how that would look.
@emfdavid Should I mention Camus Energy like @martindurant asked here?
I checked in with the team and it would be great if you could link https://www.camus.energy/ and if there is a place for attribution you can include my github handle @emfdavid, otherwise the contributors list is fine.
The index in the ``idx`` file indexes the GRIB messages, whereas the ``k_index`` (kerchunk index)
we build as part of this workflow indexes the variables in those messages.
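As an aside (not proposing this exact snippet for the docs page): a minimal sketch of what the text ``.idx`` file actually contains, assuming the standard NODD-style format. The URL and column names below are illustrative only, not part of this PR.

```python
# Sketch: the .idx file only gives per-message byte ranges; the kerchunk index
# additionally records which variable each message holds, so whole GRIB files
# never need to be re-scanned. The URL below is illustrative.
import fsspec
import pandas as pd

idx_url = "s3://noaa-hrrr-bdp-pds/hrrr.20230928/conus/hrrr.t12z.wrfsfcf06.grib2.idx"

with fsspec.open(idx_url, "r", anon=True) as f:
    rows = [line.strip().split(":", 5) for line in f if line.strip()]

idx = pd.DataFrame(rows, columns=["message", "offset", "date", "varname", "level", "forecast"])
idx["offset"] = idx["offset"].astype(int)
# byte length of each message = gap to the next offset (the last one needs the file size)
idx["length"] = idx["offset"].shift(-1) - idx["offset"]
print(idx.head())
```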
This note explains my question above, but is not very clear. Map out the steps we need to do before launching into the details.
Steps for how we build the index? Should I include the code for building the index?
No code, just brief points.
Great start - one suggestion re limitations.
- The ``.idx`` file must be of *text* type.
- Only specialised for time-series data, where GRIB files
  have *identical* structure.
- Aggregation only works for files of a specific **forecast horizon**.
The reference index can be combined across many horizons but each horizon must be indexed separately.
Looking forward to seeing what you make of the reinflate api... there you can see all of the FMRC slices are supported against a collection of indexed data from many horizons, runtimes and valid times.
> the reinflate api

ooh, what is this?
The method to turn the k_index and the metadata back into a ref_spec that you can use in zarr/xarray:
https://github.com/asascience-open/nextgen-dmac/blob/main/grib_index_aggregation/dynamic_zarr_store.py#L198
I think @Anu-Ra-g is already working on adding it into kerchunk?
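For context, a rough sketch of what such a ref_spec is then used for once it has been reinflated; the file name is a placeholder and the storage options are assumptions, not taken from that code:

```python
# Sketch: open a kerchunk-style reference spec (a dict or JSON with "refs")
# the usual way, via fsspec's "reference" filesystem and the zarr engine.
import fsspec
import xarray as xr

refs = "hrrr_horizon06.json"  # placeholder: path to a reinflated/combined reference spec

fs = fsspec.filesystem(
    "reference",
    fo=refs,
    remote_protocol="s3",
    remote_options={"anon": True},
)
ds = xr.open_dataset(
    fs.get_mapper(""),
    engine="zarr",
    backend_kwargs={"consolidated": False},
)
```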
Oh, in that case I suspect it works already, right @Anu-Ra-g : but you can only work on one set of horizons OR one set of timepoints, not both at once? Something like that.
I think you can return an array with multiple dimensions.
I didn't have a strong use for this so struggled to do something general and practical.
For instance, if you request by horizon, you can provide multiple horizon axes and your dimensions should include 'horizon' and 'valid_time'. Similarly, you can request multiple runtimes and then your dimensions should include 'runtime' and 'step'.
Honestly not sure if this is helpful or overcomplicated.
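To make that concrete without leaning on the actual API, a purely illustrative pandas sketch of the two views over the same flat index; every column name and value here is invented:

```python
# Conceptual sketch only, not the reinflate API: one flat index of messages,
# viewed either per runtime (dims runtime x step) or per horizon (dim valid_time).
import pandas as pd

k_index = pd.DataFrame({
    "runtime": pd.to_datetime(["2023-09-28 00:00"] * 2 + ["2023-09-28 06:00"] * 2),
    "step": pd.to_timedelta(["0h", "6h", "0h", "6h"]),
    "offset": [0, 990_253, 0, 987_112],
    "length": [990_253, 1_001_553, 987_112, 998_731],
})
k_index["valid_time"] = k_index["runtime"] + k_index["step"]

# request by runtime: dimensions are (runtime, step)
by_runtime = k_index.pivot(index="runtime", columns="step", values="offset")

# request by horizon: fix one step, index the result by valid_time
by_horizon = k_index[k_index["step"] == pd.Timedelta("6h")].set_index("valid_time")
```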
@martindurant I tried it out with one set of horizons with the original code. Actually, I'm still figuring out the reinflating part of the code, aggregation types and the new indexes.
I noticed that reinflating can also work with a grib_tree model made from a single grib file.
@emfdavid can you confirm this in this notebook that I made?
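For reference, the single-file case looks roughly like this; the URL is illustrative and the storage options are assumptions, not taken from the notebook:

```python
# Sketch: build a hierarchical grib_tree store from one GRIB2 file, then (per
# the discussion above) reinflate/select from it the same way as for many files.
from kerchunk.grib2 import scan_grib, grib_tree

url = "s3://noaa-hrrr-bdp-pds/hrrr.20230928/conus/hrrr.t00z.wrfsfcf06.grib2"
message_groups = scan_grib(url, storage_options={"anon": True})  # one reference set per message
store = grib_tree(message_groups)  # merged, hierarchical zarr-style references
```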
Let me know when this PR is ready for another look.
@martindurant I've made the changes you suggested. It is ready for review.
This PR adds a new page in the kerchunk documentation for faster reference consolidation for GRIB files.