Create a list of models and tools that can be used to test workflow managers #197

Open
13 tasks
kinow opened this issue Apr 8, 2023 · 1 comment
kinow commented Apr 8, 2023

The idea here is to find at least a couple, maybe three or four, models and/or tools that can be used to create the same workflow in Cylc, ecFlow, Autosubmit, Steep WMS (cyclic), StreamFlow (cyclic w/ CWL dev loops), etc., and in the process note what can be improved in each workflow manager.

At the same time, one of these will be used to produce RO-Crates and validate the Autosubmit RO-Crate implementation, and it will be uploaded to WorkflowHub.eu (ResearchObject/ro-crate-py#148).

The notes on implementing the workflow in different WMSs may help identify features that are missing or could be improved in those WMSs. They may also serve as a resource for WMS maintainers who choose to support other use cases (e.g. some WMSs may not be suitable for climate models with ensembles that require restarting/re-running, or for cyclic NWP models with critical operational needs), or who decide to support RO-Crate.

Requirements

  • Public and open source
  • Publicly available input data
  • Produces output in some format like GRIB, NetCDF, plots
  • Can be automated by a WMS (compiling, calibrating, running, or starting containers)
  • The workflow produced can be uploaded to WorkflowHub

Bonus points for the use case that:

  • Can be used to produce a simple RO-Crate with Autosubmit
  • Requires chunking and the use of start-dates (to have a more interesting workflow graph with Autosubmit)
  • Can be easily replicated in another WMS

Models and tools

Wave models

  • ecmwf-ifs/ecwam
    • The ECMWF Ocean Wave Model (ecWAM) describes the development and evolution of wind generated surface waves and their height, direction and period. ecWAM is solely concerned with ocean wave forecasting and does not model the ocean itself: dynamical modelling of the ocean can be done by an ocean model such as NEMO.
  • NOAA-EMC/WW3
    • WAVEWATCH III® is a community wave modeling framework that includes the latest scientific advancements in the field of wind-wave modeling and dynamics.

Earth System models

  • E3SM-Project/E3SM
    • E3SM is a state-of-the-art fully coupled model of the Earth's climate including important biogeochemical and cryospheric processes. It is intended to address the most challenging and demanding climate-change research problems and Department of Energy mission needs while efficiently using DOE Leadership Computing Facilities.

Hydrology

  • UFZ/mhm
    • mHM is based on accepted hydrological conceptualizations and is able to reproduce as accurately as possible not only observed discharge hydrographs at any point within a basin but also the distribution of soil moisture among other state variables and fluxes. To achieve these goals and to ensure a reliable performance in ungauged basins, this model employs a multiscale parameter regionalization technique to obtain effective parameters at the scale of interest.

Software related to models

  • NCAR/PyCECT
    • The Community Earth System Model Ensemble Consistency Test (CESM-ECT) suite is an alternative to requiring bitwise identical output for quality assurance. This objective test provides a statistical measurement of consistency between an accepted ensemble and a test set of CESM simulations.

Links

  • pangeo-data/awesome-open-climate-science
    • This is a curated list of open source software packages that make our lives as scientists, hackers and data wranglers easier or just more awesome. This list is intended to be the fluid-earth counterpart of awesome open geoscience, although there is inevitably some overlap. It is not just climate science! We use the word "climate" in the repo name just as shorthand for the fluid part of the earth. Packages from atmospheric science, oceanography, climate science, and hydrology are all welcome.

RO-Crates

While integrating these models and tools into workflows for different workflow managers, it's possible to take notes on how easy it would be for these workflows to be archived as an RO-Crate.

It's clear now that:

  1. Some workflow managers won't have all the necessary (or useful, like authors) data in their configuration and might require extra work to get that information into crates
    1.1. That can be solved now with a custom JSON file containing entries compatible with the JSON-LD used to add/update entries in the RO-Crate file - Add methods for adding and updating JSON-LD directly (partials for WMS) ResearchObject/ro-crate-py#149
  2. Some workflow managers won't have a list of inputs and outputs used in the process encapsulated by the workflow (e.g. Cylc, ecFlow, Autosubmit).
    2.1. In cases like this, the approach above might be useful when combined with entries that provide a list of inputs/outputs, maybe using glob patterns like **/*.nc.
    2.2. It might be hard or nearly impossible to use BioSchemas FormalParameters the way CWL/Galaxy/StreamFlow do (these mainly rely on CWL, I think): Document how to create a Workflow Run Crate file ResearchObject/ro-crate-py#148 (comment). So in these cases we can just have a list of inputs & outputs as File and Dataset.
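
A minimal sketch of points 1.1 and 2.1 above, assuming nothing beyond the Python standard library: glob patterns such as `**/*.nc` are expanded into plain `File` entities, and a custom list of JSON-LD entries is merged into an existing graph by `@id`. The function names `file_entities` and `merge_partial` are hypothetical and are not part of ro-crate-py or any WMS.

```python
from pathlib import Path


def file_entities(root, patterns):
    """Expand glob patterns under `root` into JSON-LD style File entities.

    Hypothetical helper: each matching file becomes {"@id": <relative path>,
    "@type": "File"}; directories and duplicate matches are skipped.
    """
    root = Path(root)
    entities = []
    seen = set()
    for pattern in patterns:
        for path in sorted(root.glob(pattern)):
            if not path.is_file():
                continue
            rel = path.relative_to(root).as_posix()
            if rel in seen:
                continue
            seen.add(rel)
            entities.append({"@id": rel, "@type": "File"})
    return entities


def merge_partial(graph, partial):
    """Add or update entities in `graph` using `partial` entries keyed by @id.

    Hypothetical helper: an entry whose @id already exists updates that
    entity's properties; any other entry is appended as a new entity.
    """
    by_id = {entity["@id"]: dict(entity) for entity in graph}
    for entry in partial:
        by_id.setdefault(entry["@id"], {}).update(entry)
    return list(by_id.values())
```

For a WMS that does not track inputs and outputs, `file_entities(run_dir, ["**/*.nc"])` would give a list that can be attached to the crate as plain `File` entries, while `merge_partial` covers data such as authors that the workflow configuration does not hold.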
@kinow kinow changed the title Create a list of models that can be used to test workflow managers Create a list of models and tools that can be used to test workflow managers Apr 8, 2023
@kinow kinow added the question label Apr 8, 2023
kinow commented Apr 8, 2023

mHM

Will start with mHM since it has great docs, the source is simple & clear, and, not being a complete coupled ESM, it should be easier to run (:crossed_fingers:). They provide two “test domains” that can be executed after mHM is installed and that produce some NetCDF files that can be plotted with ncview.

So for the RO-Crate file, maybe an Autosubmit + mHM workflow could work. It'd be better if the workflow also prepared data for mHM based on the selected days for the workflow, thus using at least start dates in Autosubmit (no chunking, but not a blocker, I think).

The easiest test scenario would be somewhere in Germany or Europe (as the data mentioned comes from EU agencies). But maybe it'd be possible to use somewhere else like Tamana-shi, Kumamoto, Japan, or Noumea, New Caledonia (or these two).


2023-04-08

The GIS data preparation step is a bit hard to follow, especially if ArcGIS Map is really needed (would be easier with QGIS). So creating the data for another basin looks like a task that demands more time than a few hours every other weekend. Let's see if there's some data ready to be used, and that can be used with different days.


2023-04-09

So, using their test domains, the mhm.nml file has "periods". One appears to be for the training, and the other one for running the model (inference?). The training period must be within the domain of the input data (1980 to 2000, but I think only 1990-2000 can be used).

That can be used, then, to create a workflow that takes as input the dates for these periods (or maybe just for running the model). The output of the workflow would be the outputs of the mHM model (NetCDF files and a text file). Perhaps we could also have an extra task to run ncview and export a plot, also used as output.
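
As a rough sketch of how a workflow task could validate the dates it receives before running mHM, assuming the usable range is 1990-2000 as guessed above (the exact first and last days are assumptions, and `check_period` is a hypothetical helper, not part of mHM or Autosubmit):

```python
from datetime import date

# Assumption: usable range of the test-domain input data (see the note above).
DATA_START = date(1990, 1, 1)
DATA_END = date(2000, 12, 31)


def check_period(start, end, data_start=DATA_START, data_end=DATA_END):
    """Parse ISO dates and check that the period lies within the data range.

    Returns the parsed (start, end) dates, or raises ValueError when the
    period is reversed or falls outside the available input data.
    """
    start_date = date.fromisoformat(start)
    end_date = date.fromisoformat(end)
    if start_date > end_date:
        raise ValueError(f"period starts after it ends: {start} > {end}")
    if start_date < data_start or end_date > data_end:
        raise ValueError(
            f"period {start}..{end} is outside the input data "
            f"({data_start}..{data_end})"
        )
    return start_date, end_date
```

Failing early in a setup task like this would keep a bad start date from reaching the model run itself.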

All of this can be packed as an RO-Crate (without using FormalParameters), and it should run on any of these WMSs.


2023-04-11

Created a repository for an Autosubmit workflow to run mHM: https://github.com/kinow/auto-mhm-test-domains

It includes the test domain data from v5.12.0, but that will be replaced by a task that clones the repository at v5.12.0 instead, to avoid including data with a different license in the git repo. This will be a good test for an RO-Crate with an Autosubmit Project of type Git (that needs to be an input in the workflow).

The LOCAL_SETUP part of the workflow is complete; will continue tomorrow between meetings. It's looking good, and is probably a good example for RO-Crate (and for the automated documentation, a future feature).
