ERA5 data inputs #3

Open
chunhsusu opened this issue Oct 27, 2022 · 16 comments

@chunhsusu

Request implementation to allow use with ERA5 data in rt52:

  • hourly screen temperature data in /g/data/rt52/era5/single-levels/reanalysis/2t/$year/2t_*.nc (variable name is t2m in units of K)
  • hourly total precipitation data in /g/data/rt52/era5/single-levels/reanalysis/mtpr/$year/mtpr_*.nc (variable name is mtpr in units of kg m**-2 s**-1)

I assume icclim does not handle hourly data, so an additional argument could be passed to run_icclim.py that tells xarray how to preprocess the hourly data first, e.g. resample('1D').mean().

@DamienIrving
Member

DamienIrving commented Oct 28, 2022

Happy to implement this.

I guess for temperature we'll need to implement the option of picking resample('1D').mean() for indices that require daily mean temperature, resample('1D').max() for indices that require daily maximum temperature and resample('1D').min() for indices that require daily minimum temperature?
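A minimal sketch of the three aggregations discussed above, using synthetic hourly data. The helper name `to_daily` and its `time_agg` argument are assumptions made for illustration and are not taken from run_icclim.py.

```python
import numpy as np
import pandas as pd
import xarray as xr


def to_daily(da, time_agg):
    """Aggregate an hourly DataArray to daily values (hypothetical helper)."""
    resampler = da.resample(time="1D")
    if time_agg == "mean":
        return resampler.mean()
    if time_agg == "max":
        return resampler.max()
    if time_agg == "min":
        return resampler.min()
    raise ValueError(f"unsupported time_agg: {time_agg}")


# Two days of synthetic hourly 2 m temperature in K (280, 281, ..., 327)
times = pd.Timestamp("2021-01-01") + pd.to_timedelta(np.arange(48), unit="h")
t2m = xr.DataArray(
    280.0 + np.arange(48), coords={"time": times}, dims=["time"], name="t2m"
)

tmean = to_daily(t2m, "mean")  # input for indices based on daily mean temperature
tmax = to_daily(t2m, "max")    # input for indices based on daily maximum temperature
tmin = to_daily(t2m, "min")    # input for indices based on daily minimum temperature
```

Each result has one value per calendar day, taken over the 24 hourly values whose timestamps fall on that day.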

@chunhsusu
Author

That's right. Note that the ERA5 data have the latitude order flipped, and the longitude goes from -180 to 180. I don't think it affects your implementation, but it is something to watch when we compare between data sets.
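When comparing against a data set that uses 0 to 360 longitudes or south-to-north latitudes, the coordinates can be brought onto a common convention first. The sketch below is plain Python with hypothetical function names, and it assumes that "flipped" means latitude is stored from north to south.

```python
def to_lon_180(lon):
    """Map a longitude in degrees east onto the [-180, 180) convention."""
    return ((lon + 180.0) % 360.0) - 180.0


def to_lon_360(lon):
    """Map a longitude in degrees east onto the [0, 360) convention."""
    return lon % 360.0


def is_north_to_south(lats):
    """True when the latitude values are strictly decreasing."""
    return all(a > b for a, b in zip(lats, lats[1:]))


# A point at 170 degrees west expressed in both conventions
lon_360 = to_lon_360(-170.0)
lon_180 = to_lon_180(190.0)

# A short north-to-south latitude axis, reordered to south-to-north
era5_like_lats = [-9.75, -10.0, -10.25]
ascending_lats = sorted(era5_like_lats)
```

Any data array has to be reordered along with its coordinate, so in practice this is done with the array library's own sort or reindex call rather than on a bare list.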

@DamienIrving
Member

@chunhsusu - One more question. Do you want to calculate indices for the entire globe using the ERA5 data or just the Australian region? (If the latter I can add spatial subsetting functionality.)

@chunhsusu
Author

@DamienIrving For efficiency, yes I think the functionality should be added. Our evaluation work will only be over the Australian region.

@DamienIrving
Member

Subsetting option added: e9a6985
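For readers who want to see what such a subset involves, here is a rough sketch with xarray. It is not the code from the commit above: the helper name `subset_region` and the coordinate names `latitude` and `longitude` are assumptions. The point it illustrates is that a label-based slice must follow the direction of the coordinate, so a north-to-south latitude axis needs the bounds reversed.

```python
import numpy as np
import xarray as xr


def subset_region(da, lon_bnds, lat_bnds):
    """Select a lon/lat box whichever way the latitude axis runs (hypothetical helper)."""
    lat_min, lat_max = sorted(lat_bnds)
    lon_min, lon_max = sorted(lon_bnds)
    lats = da["latitude"].values
    # Label-based slices must follow the order of the coordinate values.
    if lats[0] > lats[-1]:
        lat_slice = slice(lat_max, lat_min)  # axis runs north to south
    else:
        lat_slice = slice(lat_min, lat_max)  # axis runs south to north
    return da.sel(latitude=lat_slice, longitude=slice(lon_min, lon_max))


# Synthetic 1-degree global grid with latitude stored north to south
lat = np.arange(90.0, -91.0, -1.0)
lon = np.arange(-180.0, 180.0, 1.0)
field = xr.DataArray(
    np.zeros((lat.size, lon.size)),
    coords={"latitude": lat, "longitude": lon},
    dims=["latitude", "longitude"],
)

# Australian region bounds as used elsewhere in this thread
aus = subset_region(field, lon_bnds=(111.975, 156.275), lat_bnds=(-44.525, -9.975))
```

Passing `slice(lat_min, lat_max)` to a descending axis would silently return an empty selection, which is why the direction check is there.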

@ngben
Contributor

ngben commented Nov 17, 2022

Hi @DamienIrving, thanks for implementing this, I have some questions.

  1. How does time_agg apply for bivariate indices?
  2. Does resample take into account the required shift in the hourly data? e.g. 2021-01-01 00:00 is actually for 2020-12-31, as mentioned in the ECMWF guide in point 3
  3. Also what's the difference between tp and mtpr? ECMWF provides this guide which I have used before to calculate daily rainfall using tp.

@chunhsusu
Author

Hi @ngben, on tp vs mtpr: we are using the mtpr data archive in /g/data/rt52/era5/single-levels/reanalysis/mtpr/, where mtpr is defined as "Mean total precipitation rate". That makes sense for computing daily totals.

How is "tp" defined? rt52 does not seem to have it.

@ngben
Contributor

ngben commented Nov 17, 2022

Hi @chunhsusu

tp is defined as total precipitation (https://apps.ecmwf.int/codes/grib/param-db/?id=228)

I guess if using mtpr the time_agg in run_icclim.py would be "mean"?
For tp when using cdo to calculate the daily amount I used cdo daysum

I downloaded hourly tp last year, I'm not sure why rt52 doesn't have it
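The relationship between the two routes to a daily total can be checked with simple arithmetic. The sketch below assumes mtpr is a mean rate in kg m**-2 s**-1 (1 kg m**-2 of water is 1 mm depth) and tp is an hourly accumulation in metres. The function names are hypothetical, and whether run_icclim.py applies the seconds-per-day factor itself is not stated in this thread.

```python
SECONDS_PER_DAY = 86400


def daily_total_from_mean_rate(hourly_rates):
    """Daily total in mm from 24 hourly mean rates in kg m**-2 s**-1."""
    daily_mean_rate = sum(hourly_rates) / len(hourly_rates)
    return daily_mean_rate * SECONDS_PER_DAY


def daily_total_from_accumulations(hourly_tp_m):
    """Daily total in mm from 24 hourly accumulations in metres."""
    return sum(hourly_tp_m) * 1000.0


# A constant rate of 1e-4 kg m**-2 s**-1 deposits 0.36 mm, i.e. 0.00036 m, per hour
rates = [1.0e-4] * 24
accumulations = [0.00036] * 24

total_from_mtpr = daily_total_from_mean_rate(rates)
total_from_tp = daily_total_from_accumulations(accumulations)
```

Both routes give about 8.64 mm for the day, so a daily mean of mtpr and a daily sum of tp describe the same quantity once the units are converted.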

@DamienIrving
Member

DamienIrving commented Nov 17, 2022

@ngben:

  1. For the bivariate indices you'll need to use the time_agg flag twice. For example, diurnal temperature range for ERA5 data would look like the following with the --input_files, --variable and --time_agg flags used twice, the first time with the information needed to create the tmax data and the second time to create the tmin data:
     /g/data/xv83/dbi599/miniconda3/envs/icclim/bin/python run_icclim.py dtr \
       /g/data/ia39/australian-climate-service/test-data/CORDEX-CMIP6/indices/AUS-gn/none/ECMWF-ERA5/evaluation/r1i1p1f1/none/none/climdex/dtr/dtr_AUS-gn_ECMWF-ERA5_evaluation_r1i1p1f1_year_195901-202112.nc \
       --input_files /g/data/rt52/era5/single-levels/reanalysis/2t/*/*.nc --variable t2m --time_agg max \
       --input_files /g/data/rt52/era5/single-levels/reanalysis/2t/*/*.nc --variable t2m --time_agg min \
       --start_date 1959-01-01 --end_date 2021-12-31 \
       --lon_bnds 111.975 156.275 --lat_bnds -44.525 -9.975 \
       --verbose
  2. The resampling doesn't take into account any required shift in the time values. We could implement an --hour_shift option? What shift needs to be applied?
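On the repeated flags for bivariate indices: one common way to support that pattern is argparse's `append` action, which collects each occurrence of a flag in order. The following is an illustrative sketch only and makes no claim about how run_icclim.py is actually written. The file names are placeholders, and only the flag names are taken from the command above.

```python
import argparse


def build_parser():
    """Parser sketch in which per-variable flags may be given more than once."""
    parser = argparse.ArgumentParser(description="Sketch of repeated per-variable flags")
    parser.add_argument("index_name")
    parser.add_argument("output_file")
    # Each use of --input_files contributes one list of paths.
    parser.add_argument("--input_files", nargs="*", action="append", default=[])
    parser.add_argument("--variable", action="append", default=[])
    parser.add_argument(
        "--time_agg", action="append", choices=["min", "mean", "max"], default=[]
    )
    return parser


# Diurnal temperature range: first group builds tmax, second group builds tmin
args = build_parser().parse_args([
    "dtr", "dtr.nc",
    "--input_files", "2t_a.nc", "2t_b.nc", "--variable", "t2m", "--time_agg", "max",
    "--input_files", "2t_a.nc", "2t_b.nc", "--variable", "t2m", "--time_agg", "min",
])

# Pair the n-th occurrence of each flag to describe the n-th input variable.
inputs = list(zip(args.input_files, args.variable, args.time_agg))
```

Because the pairing is positional, the order in which the flag groups appear on the command line determines which group becomes the first input variable.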

@DamienIrving reopened this Nov 17, 2022

@ngben
Contributor

ngben commented Nov 17, 2022

Thanks @DamienIrving

  1. An hourly shift would need to be applied, e.g. when using cdo the command is as follows: cdo daysum -shifttime,-1hour
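To see why the shift matters for a daily sum, consider hourly accumulations stamped at the end of each hour. The sketch below uses only the standard library with synthetic values. The helper `daily_sums` is hypothetical and is meant to mirror the effect of the cdo command above, not its implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_sums(hourly, shift_hours=0):
    """Sum (timestamp, value) pairs by calendar day after shifting the timestamps."""
    totals = defaultdict(float)
    for stamp, value in hourly:
        shifted = stamp + timedelta(hours=shift_hours)
        totals[shifted.date()] += value
    return dict(totals)


# 24 accumulations covering 31 Dec 2020, stamped 01:00 on 31 Dec through 00:00 on 1 Jan
start = datetime(2020, 12, 31, 1)
hourly = [(start + timedelta(hours=i), 1.0) for i in range(24)]

unshifted = daily_sums(hourly)
shifted = daily_sums(hourly, shift_hours=-1)
```

Without the shift, the value stamped 2021-01-01 00:00 is counted on 1 January and 31 December receives only 23 of its 24 hours. With a shift of -1 hour all 24 values land on 31 December.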

Also I'm not sure if it's an issue but ERA5 is in short data format which can cause problems when concatenating as the offsets and scale factors may be different between files. For cdo this requires the use of -b F64

@ngben
Contributor

ngben commented Nov 17, 2022

A quick glance through the ECMWF ERA5 wiki suggests that 2t/t2m is instantaneous but mx2t and mn2t are the maximum/minimum "since previous post-processing" https://confluence.ecmwf.int/display/CKB/ERA5%3A+2+metre+temperature

I think mx2t and mn2t would need to be shifted along with tp but I don't know if mtpr also needs it.

@DamienIrving
Member

I've added a --hshift option that moves the time axis back 1 hour: 6924362

In terms of different scale and offset factors for different files, I'm pretty sure the mask_and_scale=True argument that is passed to xr.open_mfdataset and xr.open_dataset in the script handles that.
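The concern about packed data can be illustrated without any NetCDF library. Under the CF packing convention, stored integers are unpacked as `packed * scale_factor + add_offset`, so the same integer means different things in files with different attributes. The values below are made up, and the sketch only shows why decoding has to happen per file before concatenation. It does not verify what xarray does internally.

```python
def unpack(packed, scale_factor, add_offset):
    """Apply the CF packing convention: unpacked = packed * scale_factor + add_offset."""
    return [p * scale_factor + add_offset for p in packed]


# Two hypothetical files with identical packed integers but different packing attributes
file_a = {"packed": [0, 1000], "scale_factor": 0.001, "add_offset": 280.0}
file_b = {"packed": [0, 1000], "scale_factor": 0.002, "add_offset": 290.0}

# Problematic: concatenate the packed integers and decode everything with file_a's attributes.
naive = unpack(
    file_a["packed"] + file_b["packed"], file_a["scale_factor"], file_a["add_offset"]
)

# Safe: decode each file with its own attributes, then concatenate the decoded values.
decoded = unpack(**file_a) + unpack(**file_b)
```

Here `decoded` recovers roughly 280, 281, 290 and 292 K, while `naive` wrongly repeats 280 and 281 K for the second file. Converting to floating point per file, as cdo does with -b F64, avoids the same problem.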

@ngben
Contributor

ngben commented Nov 17, 2022

Thanks Damien!

For mtpr it should also be shifted as it's for the previous hour

reanalysis: accumulations are over the hour (the accumulation/processing period) ending at the validity date/time

https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation#ERA5:datadocumentation-Meanrates/fluxesandaccumulations

@chunhsusu
Author

@ngben @DamienIrving thank you for raising this. I have been using mtpr and 2t in the icclim calculation. It appears that I should use mx2t and mn2t instead. I can re-run icclim on ERA5, i.e., TX based (TN-based) indicators with mx2t (mn2t) and with --hshift. And precip indicator with mtpr with --hshift.

Please advise if that sounds correct?

@ngben
Contributor

ngben commented Nov 18, 2022

Perhaps it might be best to use 2t for tx/tn? ERA5 documentation states:

Given this inconsistency in these three parameters on the CDS, we recommend, in general, that the hourly (analysed) "2 metre temperature" be used to construct the minimum and maximum over longer periods, such as a day.

I think --hshift should be used for mtpr

@chunhsusu
Author

Thank you Ben, I will follow this.
