Implement ERA5 data ingestion #11

Merged
geek-yang merged 25 commits into main from implement-era5-ingestion
Jul 31, 2023

geek-yang
Member

@geek-yang geek-yang commented Jul 11, 2023

This PR adds ERA5 to the dataset collection:

  • CDS API download
  • ERA5 data downloader
  • ERA5 data ingestion
  • Add converter for ERA5 data and update ALMA convention
  • Add tests

This PR also closes #12.

@geek-yang geek-yang marked this pull request as ready for review July 16, 2023 13:39
@geek-yang
Member Author

Note that the failing tests on Windows are almost the same as those in #10. They will be fixed in #10 first, so please ignore them when reviewing this PR.

Member Author

@geek-yang geek-yang left a comment

To review this PR, the reviewer can simply test the functionality of the code using the demo notebook era5_dataset_demo.ipynb.

Two things are also worth noting:

  • The implementation only covers the variables needed to run the STEMMUS_SCOPE model (taken from https://github.com/EcoExtreML/STEMMUS_SCOPE_Processing/blob/main/global_data/downloading_global_data.md). Given the wide range of fields available in ERA5 and our plan to make the tool more generic, more variables will need to be added to the reference variable list later.
  • A missing license agreement will trigger an error during downloading (see the example below). Since the error message raised via cdsapi is very clear, it is not necessary to add an extra layer (an error-message catcher) in zampy to handle this.

For example, downloading the land-cover data from CDS without accepting the agreement raises:

Exception: Client has not agreed to the required terms and conditions. To access this resource, you first need to accept the termsof 'ESA CCI licence' at https://cds.climate.copernicus.eu/cdsapp/#!/terms/satellite-land-cover.

Contributor

@BSchilperoort BSchilperoort left a comment

Hi Yang, nice work! It works well for most variables, and the code and tests look quite clean.

I tried downloading the "surface_thermal_radiation" variable, but the conversion is broken. The name is not consistent in the ALMA convention definition "surface_thermal_radiation" vs "net_longwave_radiation".

Additionally, the ERA5 values are (frustratingly) accumulated values. In PyStemmusScope we use the ERA5-land variables which are not accumulated, but fluxes.

I think deaccumulation might be simplest using np.diff, and a xr.where(ds.time.dt.hour==0, ...) operation to get the 00:00 data correct. This can be done during ingestion.
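
A minimal sketch of the np.diff idea suggested above, using plain NumPy on synthetic data (np.where stands in for xr.where here). The function name `deaccumulate`, the `reset_hour` parameter, and the assumption that the accumulation restarts at 00:00 are illustrative only. They are not taken from zampy, and the actual reset convention of a given dataset should be checked before using something like this.

```python
import numpy as np


def deaccumulate(acc, hours, reset_hour=0):
    """Convert values accumulated since the last reset into per-step amounts.

    Assumption (hypothetical, not confirmed by the PR): the accumulation
    restarts at `reset_hour`, so the value stored at that hour is already
    a single-step amount.
    """
    acc = np.asarray(acc, dtype=float)
    hours = np.asarray(hours)
    # Difference between consecutive steps. The first step has no
    # predecessor, so NaN is prepended to keep the output length unchanged.
    step = np.diff(acc, prepend=np.nan)
    # At the reset hour the difference would span two accumulation periods,
    # so the accumulated value itself is used instead.
    return np.where(hours == reset_hour, acc, step)


# Synthetic data: a constant 2 units per hour over two days,
# with the accumulation restarting at 00:00 each day.
hours = np.tile(np.arange(24), 2)
acc = 2.0 * (hours + 1)
per_step = deaccumulate(acc, hours)
```

With xarray, the same selection could be expressed on `ds.time.dt.hour`, as in the comment above. The NumPy version is used here only to keep the sketch self-contained.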

src/zampy/datasets/utils.py (outdated, resolved)
src/zampy/datasets/utils.py (outdated, resolved)
src/zampy/datasets/utils.py (resolved)
src/zampy/datasets/utils.py (resolved)
src/zampy/reference/variables.py (outdated, resolved)
src/zampy/reference/variables.py (outdated, resolved)
tests/test_datasets/test_era5.py (outdated, resolved)
@geek-yang
Member Author

> Hi Yang, nice work! It works well for most variables, and the code and tests look quite clean.

Thanks for your review @BSchilperoort !

> I tried downloading the "surface_thermal_radiation" variable, but the conversion is broken. The name is not consistent in the ALMA convention definition "surface_thermal_radiation" vs "net_longwave_radiation".

I think "surface_thermal_radiation" and "net_longwave_radiation" are two different things ("net_longwave_radiation" is the net of the downward and upward radiation, while "surface_thermal_radiation" is the downward component only). But I have added the corresponding "surface_thermal_radiation" and "surface_solar_radiation" variables to the ALMA convention list. Thanks for spotting this.

> Additionally, the ERA5 values are (frustratingly) accumulated values. In PyStemmusScope we use the ERA5-land variables which are not accumulated, but fluxes.
>
> I think deaccumulation might be simplest using np.diff, and a xr.where(ds.time.dt.hour==0, ...) operation to get the 00:00 data correct. This can be done during ingestion.

Following our discussion, these surface radiation variables are instantaneous (😌 luckily), so we don't need to bother with the "deaccumulation".

Contributor

@BSchilperoort BSchilperoort left a comment

Hi Yang, just some relatively small comments from me :)

If you have any questions just let me know.

README.md (outdated, resolved)
README.md (outdated, resolved)
README.md (outdated, resolved)
src/zampy/conventions/ALMA.json (outdated, resolved)
src/zampy/conventions/ALMA.json (outdated, resolved)
src/zampy/conventions/ALMA.json (outdated, resolved)
src/zampy/datasets/era5.py (outdated, resolved)
src/zampy/reference/variables.py (resolved)
tests/test_datasets/test_era5.py (outdated, resolved)
src/zampy/datasets/era5.py (outdated, resolved)
@sonarcloud

sonarcloud bot commented Jul 31, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
3 Code Smells (rating A)

97.5% Coverage
0.0% Duplication

@geek-yang geek-yang merged commit d0fec00 into main Jul 31, 2023
14 checks passed
@geek-yang geek-yang deleted the implement-era5-ingestion branch July 31, 2023 12:33
Development

Successfully merging this pull request may close these issues.

File size check before downloading via cdsapi
2 participants