Recipe yml structure #3

BSchilperoort · 2023-04-06T08:47:41Z

I have added an example recipe structure to the repository:

download:
  folder: /home/bart/Data/lsmdata/test/
  years: [1980, 2020]
  bbox: [3, 50, 6, 54]

  datasets:
    era5-land:
      frequency: hourly
      variables:
        - air_temperature:  # will map to 2m_temperature...
            height_m: 2  # optional extra argument
        - dewpoint_temperature:
            height_m: 2

convert:
  standard: ALMA
  flavor: PLUMBER2  # More specified than ALMA.
  folder: /home/bart/Data/lsmdata/output/
  frequency: 1H  # outputs at 1 hour frequency. Pandas-like freq-keyword.
  resolution: 0.01  # output resolution in degrees.
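
As a rough illustration of the "will map to 2m_temperature" comment above, the lookup could be a small table keyed on the recipe variable name and its optional height_m argument. This is only a sketch: the table name, the function name, and the dewpoint entry are assumptions for illustration, not part of the repository.

```python
from __future__ import annotations

# Hypothetical lookup table: (recipe variable, height in metres) -> ERA5-Land request name.
# Only the air_temperature entry comes from the recipe comment; the rest is assumed.
ERA5_LAND_NAMES = {
    ("air_temperature", 2): "2m_temperature",
    ("dewpoint_temperature", 2): "2m_dewpoint_temperature",
}


def to_era5_land_name(variable: str, height_m: int | None = None) -> str:
    """Translate a recipe variable (plus optional height) to a download name."""
    try:
        return ERA5_LAND_NAMES[(variable, height_m)]
    except KeyError:
        # Fail loudly so a typo in the recipe is caught before any download starts.
        raise ValueError(
            f"No ERA5-Land mapping for {variable!r} with height_m={height_m}"
        ) from None


name = to_era5_land_name("air_temperature", height_m=2)
```

Keeping the height as part of the key means the recipe can stay dataset-agnostic while each dataset decides how to interpret the extra argument.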

Any thoughts, @SarahAlidoost, @geek-yang?

SarahAlidoost · 2023-04-06T14:06:37Z

@BSchilperoort Thanks, it looks good and contains the minimum required information. I think the sections can be reorganized. For example, I suggest using the same structure as the springtime recipes for the 'datasets' section. Let's also have a separate section for configurations, so that other configs such as system settings can be added in the future. I like having a documentation part as well (see the ESMValTool recipes and config-user for more). Here is my suggestion:

configurations:
  run_directory: /home/bart/Data/lsmdata/test
  download: True # /home/bart/Data/lsmdata/test/download_dir will be created

documentation:
  description:
    Example recipe that downloads two variables from era5_land data and converts
    them to ALMA format.

datasets:
  test:
    dataset: era5-land
    frequency: hourly
    years: [1980, 2020]
    area:
      name: test
      bbox: [3, 50, 6, 54]
    variables:
      - air_temperature:  # will map to 2m_temperature...
          height_m: 2  # optional extra argument
      - dewpoint_temperature:
          height_m: 2

converter: # /home/bart/Data/lsmdata/test/processed will be created
  convention: ALMA
  flavor: PLUMBER2  # More specified than ALMA.
  frequency: 1H  # outputs at 1 hour frequency. Pandas-like freq-keyword.
  resolution: 0.01  # output resolution in degrees.
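
The two "will be created" comments in this suggestion imply that the download and output folders are derived from run_directory. A minimal sketch of that derivation, assuming the subfolder names from the comments (download_dir and processed) and a hypothetical helper name:

```python
from __future__ import annotations

from pathlib import Path


def derive_directories(run_directory: str, download: bool = True) -> dict[str, Path]:
    """Return the working folders implied by the 'configurations' section.

    Hypothetical helper: it only computes the paths; creating them
    (for example with Path.mkdir(parents=True, exist_ok=True)) is left to the caller.
    """
    root = Path(run_directory)
    dirs = {"processed": root / "processed"}
    if download:
        # Only needed when the recipe asks for data to be downloaded.
        dirs["download"] = root / "download_dir"
    return dirs


dirs = derive_directories("/home/bart/Data/lsmdata/test", download=True)
```

With a single run_directory the recipe no longer needs separate folder keys in the download and convert sections.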

BSchilperoort · 2023-04-06T14:30:36Z

Thanks for the ideas. I do like the documentation part.

I find having "datasets" and "dataset" a bit confusing. How about calling the first one a collection?

Additionally, as the goal is to prepare input data for land surface models, the area and years will be the same for most datasets. So by default the area and years should be set on the collection level, with the possibility of deviating from this for a specific dataset. For example:

collections:
  stemmus_scope_NL:
    years: [1980, 2020]
    area: [3, 50, 8, 54]
  
    dataset: era5-land
      frequency: hourly
      variables:
        - air_temperature

    dataset: CAMS
      years: [2004, 2020]  # overrides 'years' from collection level
      variables:
        - co2

    dataset: dummy_data
      years: [1980, 2003] # No data available for these years.
      variables:
        co2:
          unit: ppm
          value: 350
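
The override rule described above (collection-level years and area, unless a dataset sets its own) can be sketched as a simple dictionary merge. This is an illustration under assumptions: the function name is made up, and the datasets are held in a list under a hypothetical "sources" key, because a YAML mapping is expected to have unique keys and so cannot repeat "dataset" as written in the example.

```python
from __future__ import annotations

from typing import Any


def resolve_sources(collection: dict[str, Any]) -> list[dict[str, Any]]:
    """Fill in collection-level 'years' and 'area' for every source dataset."""
    defaults = {key: collection[key] for key in ("years", "area") if key in collection}
    resolved = []
    for source in collection.get("sources", []):
        # Keys set on the dataset itself take precedence over the collection defaults.
        resolved.append({**defaults, **source})
    return resolved


# The stemmus_scope_NL example, already parsed into plain Python objects.
collection = {
    "years": [1980, 2020],
    "area": [3, 50, 8, 54],
    "sources": [
        {"dataset": "era5-land", "frequency": "hourly", "variables": ["air_temperature"]},
        {"dataset": "CAMS", "years": [2004, 2020], "variables": ["co2"]},
    ],
}
resolved = resolve_sources(collection)
```

Here era5-land inherits both years and area, while CAMS keeps its own years and inherits only the area.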

geek-yang · 2023-04-06T14:39:10Z

> I find having "datasets" and "dataset" a bit confusing. How about calling the first one a collection?

Or maybe more specific, "datasets" and "source"?

BSchilperoort · 2023-04-06T15:00:19Z

Well, they are also source datasets. As we're making a superset of those, the term "collection" feels most apt to me. Alternatively, we could use only "collections" and "sources" to avoid the word "dataset" altogether.

But it would probably be best to avoid calling the result a new dataset, as the result should not be shared. Redistribution will probably violate some of the license agreements. We should be careful with licenses and properly attribute the sources; see also #4.

sverhoeven · 2023-04-06T15:02:25Z

What about "catalog" as a container of datasets? See https://schema.org/DataCatalog

BSchilperoort added the brainstorming label Apr 6, 2023
