-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: add zonal_stats #52
ENH: add zonal_stats #52
Conversation
Thanks! I'll have a look later but note that you need to add new deps to https://github.com/xarray-contrib/xvec/tree/main/ci |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #52 +/- ##
==========================================
+ Coverage 99.06% 99.20% +0.13%
==========================================
Files 3 4 +1
Lines 321 377 +56
==========================================
+ Hits 318 374 +56
Misses 3 3 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, a first pass, so far without running the code. A few notes below and some others in the code.
Do we need dask
here? I mean, optimally we make use of it if available but don't require it for smaller use cases.
If possible, I would like to avoid storing data in the repository here. We can use geodatasets
to fetch us counties and xarray example data to get some raster cubes.
Can you ensure that pre-commit checks pass?
I'll go through the notes and work on them. |
Hey, I went through the notes. Now the code passed the pre-commit. And I made it an optional to use dask and number of jobs if the data is big as the following:
And I think it's possible to use geodatasets and xarray example data to get some raster cubes instead of storing the data in the repo. I'll look into that and let u know. |
I've tested it using counties fetched from geodatasets and 'air_temperature' data cube. It works but the order of (x, y ) dimensions is not consistent with what implemented now (similar to data cube structure in openEO) so I had to modify the order manually. We can add function to make sure the dimension order is as expected. What do u think ?
|
In |
I'll take a look at that. But for now I made (lon, lat) as default dims names to aggregate over them. And added a function to rename any other possible names (in a dictionary) to (lon, lat) to be consistent with the function requirement. So, now it works for both data cubes from openEO and data cubes from xarray. |
Co-authored-by: Martin Fleischmann <[email protected]>
…into spatial_aggregation
Thanks for your review and feedback. I reviewed the suggestions and made the necessary modifications to the code accordingly. |
Hi , I've adjusted the code to be more flexible and dimension-agnostic, as you can see below. I've just hard-coded the name of one coordinate as 'data_variables' to save the data variable names temporary, but we can make it a user input. However, I believe this is not critical, as it's an internal step that doesn't affect the output, unless we intend to have the output as an xarray.DataArray. |
Hey, I did a deep dive into this which resulted in refactoring of the code. I opted to use more xarray logic on the process than before, which simplifies the code quite a bit. I also noticed that the implementation was providing wrong results when I compared it to the one based on I have removed chunking as I believe that is something a user can do if they run into issues but it was more of a bottleneck in most use cases I tried. Plus some other changes you can see in the code. I plan on adding tests etc and then merge. I will follow-up with the geocube version, user guide notebook and a release. |
Looking at it in more detail, we can probably use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to merge this now and will continue in separate PRs with the remaining stuff.
@masawdah thanks for starting this, my refactor was way easier than if I'd be starting from scratch! Let me know if there are some specific requirements on the openEO side you'd like to see included before a release.
Sounds great, looking forward to the new release :) |
Thanks for providing this valuable package. The idea of this PR to aggregate raster data cube over a set of polygon geometries #35 as it needed to use within openEO process Open-EO/openeo-processes-dask#124 . It is based on dask array and parallelization which make it efficient and fast. It has been tested to aggregate polygons over data on continental level and shows a good performance.
I'd like to know your thoughts about that.
Here is an example: