diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml
index d6b009be..6c070d37 100644
--- a/.github/workflows/lint.yml
+++ b/.github/workflows/lint.yml
@@ -17,4 +17,4 @@ jobs:
       - name: Deps
         run: pip install -r requirements.txt
       - name: Lint Ansible Playbook
-        run: ansible-lint --profile min --force-color research-cloud-plugin.yml shared-data-disk.yml
+        run: ansible-lint --profile min --force-color research-cloud-plugin.yml populate-samba.yml
diff --git a/.gitignore b/.gitignore
index a76531b1..053c80ae 100644
--- a/.gitignore
+++ b/.gitignore
@@ -15,3 +15,4 @@ env
 research-cloud-plugin.vagrant.vars
 jupyterhub.launcher.token
 launcher.jwt.secret
+.venv
\ No newline at end of file
diff --git a/DATA.md b/DATA.md
new file mode 100644
index 00000000..cb05536a
--- /dev/null
+++ b/DATA.md
@@ -0,0 +1,121 @@
+# Shared data
+
+- [Shared data](#shared-data)
+  - [Configured paths](#configured-paths)
+  - [Populating Samba file server](#populating-samba-file-server)
+  - [Populating dcache](#populating-dcache)
+  - [Sync dcache with existing folder elsewhere](#sync-dcache-with-existing-folder-elsewhere)
+  - [Mount dcache on local machine](#mount-dcache-on-local-machine)
+
+This document is dedicated to the application data preparer.
+
+## Configured paths
+
+The eWatercycle Python package (`/etc/ewatercycle.yaml`) and ESMValTool (`~/.esmvaltool/config-user.yml`) have been configured to use the following paths:
+
+- Root is `/data/shared`
+  - Climate data is in `/data/shared/climate-data`, used to generate forcings.
+    - ESMValTool auxiliary data is in `/data/shared/climate-data/aux`
+    - OBS6 data is in `/data/shared/climate-data/obs6`
+  - Parameter sets are in `/data/shared/parameter-sets`, used to run models.
+  - Apptainer images are in `/data/shared/singularity-images`, used to run containerized models.
+  - GRDC observations are in `/data/shared/observation/grdc/dailies`, used by the `ewatercycle.observation.grdc.get_grdc_data()` function.
+
+## Populating Samba file server
+
+The `/data/volume_2/samba-share/` directory on the Samba file server can be populated with an Ansible playbook using the following commands.
+
+```shell
+sudo -i
+git clone -b dcache-or-samba https://github.com/eWaterCycle/infra.git /opt/infra
+cd /opt/infra
+ansible-galaxy role install mambaorg.micromamba
+# Playbook will run for a long time, so run it in a detachable shell
+screen
+# Get cds user id (uid) and api key from cds profile page
+ansible-playbook populate-samba.yml -e cds_uid=... -e cds_api_key=...
+# If you do not want to download ERA5 data then leave out the cds_uid and cds_api_key arguments.
+# Detach screen with Ctrl+A, D
+# Reattach screen with screen -r
+```
+
+This will:
+
+1. Harden the share, so that only root can write in `/data/volume_2/samba-share/` and it is read-only for everyone else
+2. Download Apptainer images for models
+3. Set up era5cli to download ERA5 data
+4. Download raw ERA5 data with era5cli
+5. Aggregate, CMORize and compress the ERA5 data with a custom ESMValTool script
+6. Set up rclone for copying data from dcache to the file server
+7. Create an ewatercycle.yaml which can be used on the Jupyter machines.
+   1. Create an empty directory `/data/shared/observation/grdc/dailies` where GRDC data can be stored.
+8. Create an ESMValTool config file which can be used on the Jupyter machines.
+
+If you have data elsewhere you can sync it to this file server with
+
+```shell
+rsync -av --progress <user>@<host>:<dir>/ /data/volume_2/samba-share/
+```
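+
+To check the share from another machine on the private network, a quick smoke
+test could look like this (a sketch; `<server-ip>` and `<user>` are placeholders,
+and smbclient/cifs-utils must be installed):
+
+```shell
+# List the exported shares on the file server
+smbclient -L //<server-ip> -U <user>
+# Mount the share read-only and inspect its contents
+sudo mount -t cifs //<server-ip>/samba-share /mnt -o ro,username=<user>
+ls /mnt
+```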
+
+## Populating dcache
+
+First gather all your data together on a server (like snellius or spider).
+You can use parts of the [populate-samba.yml](populate-samba.yml) playbook to download data.
+
+Populating the dcache can be done from a server (like snellius or spider)
+with the following command
+
+```shell
+# cd to directory with data
+# have a rclone config with dcache macaroon
+rclone copy . dcache:ewatercycle
+```
+
+## Sync dcache with existing folder elsewhere
+
+The steps above fetch the data from original sources. If you want to sync some files from
+another location, say, Snellius, you can use rclone directly. In our experience, it works
+better to sync entire directories than to try and copy single files.
+
+Create the file `~/.config/rclone/rclone.conf` and add the following content:
+
+```ini
+[dcache]
+type = webdav
+url = https://webdav.grid.surfsara.nl:2880
+vendor = other
+user =
+pass =
+bearer_token = <macaroon>
+```
+
+You can verify your access by running an innocent `rclone ls dcache:parameter-sets`.
+The command to sync directories is `rclone copy somedir dcache:parameter-sets/somedir`.
+Beware that this will overwrite any existing files, if different!
+
+Note: a password manager can be used for exchanging macaroons.
+
+## Mount dcache on local machine
+
+Create the file `~/.config/rclone/rclone.conf` and add the following content:
+
+```ini
+[dcache]
+type = webdav
+url = https://webdav.grid.surfsara.nl:2880
+vendor = other
+user =
+pass =
+bearer_token = <macaroon>
+```
+
+Install [rclone](https://rclone.org/) and run the following command to mount dcache at the `~/dcache` directory.
+
+```shell
+mkdir ~/dcache
+rclone mount --read-only --cache-dir /tmp/rclone-cache --vfs-cache-max-size 30G --vfs-cache-mode full dcache:/ ~/dcache
+```
+
+In ESMValTool config files you can use `~/dcache/climate-data/obs6` for `rootpath:OBS6`.
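+
+For example, a minimal fragment of `~/.esmvaltool/config-user.yml` pointing OBS6
+at this mount could look like this (a sketch; other rootpath entries are omitted):
+
+```shell
+cat >> ~/.esmvaltool/config-user.yml <<'EOF'
+rootpath:
+  OBS6: ~/dcache/climate-data/obs6
+EOF
+```
\ No newline at end of file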
diff --git a/README.md b/README.md
index 7ff00651..92908e62 100644
--- a/README.md
+++ b/README.md
@@ -3,8 +3,25 @@
 ![Ansible Lint](https://github.com/eWaterCycle/infra/workflows/Ansible%20Lint/badge.svg)
 [![Concept DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1462548.svg)](https://doi.org/10.5281/zenodo.1462548)
 
+- [Instructions for system administrators to deploy the eWaterCycle platform](#instructions-for-system-administrators-to-deploy-the-ewatercycle-platform)
+  - [Setup of eWaterCycle platform on the SURF Research cloud](#setup-of-ewatercycle-platform-on-the-surf-research-cloud)
+  - [Setup of eWaterCycle platform on a local test VM](#setup-of-ewatercycle-platform-on-a-local-test-vm)
+  - [SURF Research cloud catalog item registration](#surf-research-cloud-catalog-item-registration)
+  - [SURF Research cloud workspace](#surf-research-cloud-workspace)
+    - [Shared data source](#shared-data-source)
+    - [Preparations](#preparations)
+    - [File Server](#file-server)
+    - [Workspace creation with dcache as shared data source](#workspace-creation-with-dcache-as-shared-data-source)
+    - [Workspace creation with samba as shared data source](#workspace-creation-with-samba-as-shared-data-source)
+    - [Students](#students)
+    - [Example notebooks](#example-notebooks)
+  - [Docker images](#docker-images)
+  - [AI Disclaimer](#ai-disclaimer)
+
 This repo contains (codified) instructions for deploying the eWaterCycle platform.
 The target audience of these instructions are system administrators.
 For more information on the eWaterCycle platform (and how to deploy it) see the [eWaterCycle documentation](https://ewatercycle.readthedocs.io/).
 
+Grading is set up as [one class, one grader](https://nbgrader.readthedocs.io/en/stable/configuration/jupyterhub_config.html#example-use-case-one-class-one-grader).
+
 For instructions on how to use the machine as deployed by this repo see the [User guide](USER.md).
 
 These instructions assume you have some basic knowledge of [vagrant](https://vagrantup.com) and
@@ -16,246 +33,158 @@
 The hardware environment used by the eWaterCycle platform development team is the SURF Research cloud.
 
 The setup instructions in this repo will create an eWaterCycle application(a sort-of VM template) that when started will create a machine with:
 
-- Explorer: web visualization of available models / parameter sets combinations and a way to generate Jupyter notebooks
 - Jupyter Hub: to interactivly generate forcings and perform experiments on hydrological models using the [eWatercycle Python package](https://ewatercycle.readthedocs.io/)
+  - [nbgrader](https://nbgrader.readthedocs.io/en/stable/) for grading
+  - [nbgitpuller](https://jupyterhub.github.io/nbgitpuller/) to open a cloned git repository in Jupyter Lab from a [URL](https://nbgitpuller.readthedocs.io/en/latest/link.html)
 - ERA5 and ERA-Interim global climate data, which can be used to generate forcings
 - Installed models and their example parameter sets
 
 An application on the SURF Research cloud is provisioned by running an Ansible playbook (research-cloud-plugin.yml).
 
-In addition to the standard VM storage, additional read-only datasets are mounted at `/mnt/data` from dCache using rclone. They may contain things like:
+In addition to the standard VM storage, additional read-only datasets are mounted at `/data/shared` from a file server, such as a Samba server or a dCache server. They may contain things like:
 
 - climate data, see
 - observation
 - parameter-sets
 - singularity-images of hydrological models wrapped in grpc4bmi servers
 
+See the [File Server chapter](#file-server) for more information on the file server.
+
 Previously the eWatercycle platform consisted of multiple VM on SURF HPC cloud, see [v0.1.2 release](https://github.com/eWaterCycle/infra/releases/tag/v0.1.2) for that code.
 
 ## Setup of eWaterCycle platform on a local test VM
 
-Deploying a local test VM is mostly useful for developing the SURF Research Cloud applications. This vagrant setup creates a virtual machine with 8Gb memory, 4 virtual cores, and 70Gb storage. This should work on any Linux or Windows machine.
-
-To set up an Explorer/Jupyter server on your local machine with [vagrant](https://vagrantup.com) and
-[Ansible](https://docs.ansible.com/ansible/latest/index.html)
-
-Create config file `research-cloud-plugin.vagrant.vars` with
-
-```yaml
----
-dcache_ro_token:
-rclone_cache_dir: /data/volume_2
-# Directory where /home should point to
-alt_home_location: /data/volume_3
-```
-
-The token can be found in the eWaterCycle password manager.
-
-```shell
-vagrant --version
-# Vagrant 2.4.1
-vagrant plugin install vagrant-vbguest
-# Installed the plugin 'vagrant-vbguest (0.32.0)'
-vagrant up
-```
-
-Visit site
-
-```shell
-# Get ip of server with
-vagrant ssh -c 'ifconfig eth1'
-```
-
-Go to `http://` and login with `vagrant:vagrant`.
-
-You will get some complaints about unsecure serving, this is OK for local testing and this will not happen on Research Cloud.
-
-### Test on Windows Subsystem for Linux 2
-
-WSL2 users should follow steps on [https://www.vagrantup.com/docs/other/wsl](https://www.vagrantup.com/docs/other/wsl).
-
-Importantly:
-
-- Work on a folder on the windows file system.
-- Export VAGRANT_WSL_WINDOWS_ACCESS_USER_HOME_PATH="/mnt/c/.../infra"
-- `export PATH="$PATH:C:\Program Files\Oracle\VirtualBox"`
-- ` vagrant up --provider virtualbox`
-- Approve the firewall popup
-
-## Catalog item registration
-
-This chapter is dedicated for catalog item developers.
-
-On the Research cloud the [developer](https://servicedesk.surf.nl/wiki/display/WIKI/Appoint+a+CO-member+a+developer) can add an catalog item for other people to use.
-The generic steps to do this are documented [here](https://servicedesk.surf.nl/wiki/display/WIKI/Create+your+own+catalog+items).
-
-For eWatercycle component following specialization was done
-
-- Use Ansible playbook as component script type
-  - Use `https://github.com/eWaterCycle/infra.git` as repository URL
-  - Use `research-cloud-plugin.yml` as script path
-  - Use `main` as tag
-- Component parameters, all fixed source type and non-overwitable unless otherwise stated
-  - Add `dcache_ro_token` parameter for dcache read-only token aka macaroon.
-    The token can be found in the eWaterCycle password manager.
-    This token has an expiration date, so it needs to be updated every now and then.
-  - Add `alt_home_location` parameter with value `/data/volume_2`.
-    For mount point of the storage item which should hold homes mounted.
-  - Add `rclone_cache_dir` parameter with value `/data/volume_3`.
-    For directory where rclone can store its cache.
-  - Add `rclone_max_gsize` with value `45`.
-    For maximum size of cache on `rclone_cache_dir` volume. In Gb.
-- Set documentation URL to `https://github.com/eWaterCycle/infra`
-- Do not allow every org to use this component. Data on the dcache should not be made public.
-- Select the organizations (CO) that are allowed to use the component.
-
-For eWatercycle catalog item following specialization was done
-
-- Select the following components:
-  1. SRC-OS
-  2. SRC-CO
-  3. SRC-Nginx
-  4. SRC-External plugin
-  5. eWatercycle
-- Set documentation URL to `https://github.com/eWaterCycle/infra`
-- Add `SURF HPC Cloud` as cloud provider
-  - Set Operating Systems to Ubuntu 22.04
-  - Set Sizes to all non-gpu and non-disabled sizes
-- In parameter settings step keep all values as is except
-  - Set `co_irods` to `false` as we do not use irods
-  - Set `co_research_drive` to `false` as we do not use research drive
-- Set boot disk size to 150Gb,
-  as default size will be mostly used by the conda environment and will trigger out of space warnings.
-- Set workspace acces button behavior to `Webinterface (https:)`,
-  so clicking on `ACCESS` button will open up the eWatercycle experiment explorer web interface
-- Select the organizations (CO) that are allowed to use the catalog item.
+For developing the SURF Research Cloud applications locally you can use the [Vagrant instructions](VAGRANT.md).
 
-To become root on a VM the user needs to be member of the `src_co_admin` group on [SRAM](https://sram.surf.nl/).
-See [docs](https://servicedesk.surf.nl/wiki/display/WIKI/Workspace+roles%3A+Appoint+a+CO-member+a+SRC+administrator).
+## SURF Research cloud catalog item registration
 
-## SURF Research cloud VM deployment
+To register the eWaterCycle platform on the SURF Research cloud, follow the instructions in the [SURF Research cloud developer document](SRC-DEVEL.md).
 
-This chapter is dedicated for application deployers.
+## SURF Research cloud workspace
 
-1. Log into Research Cloud
-1. Create new storage item for home directories
-   - To store user files
-   - Use 50Gb size for simple experiments or bigger when required for experiment.
-   - As each storage item can only be used by a single workspace, give it a name and description so you know which workspace and storage items go together.
-1. Create new storage item for cache
-   - To store cached files from dCache by rclone
-   - Use 50GB size as size
-   - As each storage item can only be used by a single workspace, give it a name and description so you know which workspace and storage items go together.
-1. Create a new workspace
-1. Select eWaterCycle application
-1. Select collaborative organisation (CO) for example `ewatercycle-nlesc`
-1. Select size of VM (cpus/memory) based on use case
-1. Select home storage item.
-   - Order in which the storage items are select is important, make sure to select home before cache storage item.
-1. Select cache storage item
-1. Wait for machine to be running
-1. Visit URL/IP
-1. When done delete machine
+This chapter is dedicated to application deployers. A workspace is the name for a Virtual Machine (VM) on the SURF Research cloud. The workspace is created with the eWaterCycle application from the catalog.
 
-For a new CO make sure
+### Shared data source
 
-- application is allowed to be used by CO. See [Sharing catalog items](https://servicedesk.surfsara.nl/wiki/display/WIKI/Sharing+catalog+items)
-- data storage item and home dir are created for the CO
+The [eWatercycle system setup](https://ewatercycle.readthedocs.io/en/latest/system_setup.html) requires a lot of data files.
 
-End user should be invited to CO so they can login.
+Two eWaterCycle catalog items have been created:
 
-See [User guide](USER.md) to see what users have to do to login or use GitHub repository.
+1. eWaterCycle dcache, which uses dcache as shared data source: high capacity, but high latency storage accessible via WebDAV from anywhere on the Internet. Useful for research.
+2. eWaterCycle samba, which uses samba as shared data source: a low capacity, low latency file server that is only accessible from the private network of the SURF Research cloud. Useful for teaching.
 
-### Example notebooks
+The shared data is mounted read-only at `/data/shared` on the workspaces.
+In the following chapters you will need to choose which catalog item to use; the required steps depend on that choice.
 
-To get example notebooks end users should use following URL (with `` with your currently running workspace)
+### Preparations
 
-```html
-https://.workspaces.live.surfresearchcloud.nl/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2FeWaterCycle%2Fewatercycle&urlpath=lab%2Ftree%2Fewatercycle%2Fdocs%2Fexamples%2FMarrmotM01.ipynb&branch=main
-```
+Before you can create a workspace, several preparation steps need to be done first.
 
+1. Log into [SURF Research Cloud](https://portal.live.surfresearchcloud.nl/)
+2. Make sure you are [allowed to use the eWaterCycle catalog item](https://servicedesk.surf.nl/wiki/display/WIKI/Sharing+components+and+catalog+items)
+3. Create a new storage item for home directories
+   - To store user files
+   - Use a 50GB size for simple experiments, or bigger when required for the experiment.
+   - As each storage item can only be used by a single workspace, give it a name and description so you know which workspace and storage items go together.
+4. If the shared data source is dcache then create a new storage item for the dcache cache
+   - To store files from dCache cached by rclone
+   - Use a 50GB size
+   - As each storage item can only be used by a single workspace, give it a name and description so you know which workspace and storage items go together.
+5. If the shared data source is samba then create a new storage item for data
+   - To store training material like parameter sets, ready-to-use forcings, raw forcings and apptainer sif files for models.
+   - This storage item should be used later in the Samba file server.
+6. If the shared data source is samba then create a private network
+   - Name: `file-storage-network`
+7. On the https://portal.live.surfresearchcloud.nl/profile page, under Collaborative organizations
+   - Create a secret named `samba_password` with a strong random password as value
+   - Create a secret named `dcache_ro_token` with a dcache read-only token as value
 
-TODO add this link to home page of server at
+To become root on a VM the user needs to be member of the `src_co_admin` group on [SRAM](https://sram.surf.nl/).
+See [docs](https://servicedesk.surf.nl/wiki/display/WIKI/Workspace+roles%3A+Appoint+a+CO-member+a+SRC+administrator).
 
-This link uses [nbgitpuller](https://jupyterhub.github.io/nbgitpuller/) to sync a git repo and open a notebook in it.
+### File Server
 
-## Fill shared data disk
+If you want to create an eWaterCycle machine (aka workspace) that uses a Samba file server (aka shared data source is samba), you need to create the Samba file server first.
 
-This chapter is dedicated for application data preparer.
+Each collaborative organization should run a single file server. This file server will be used to store read-only shared data. The file server should be created with the following steps:
 
-The [eWatercycle system setup](https://ewatercycle.readthedocs.io/en/latest/system_setup.html) requires a lot of data files.
-For the Research cloud virtual machines we will mount a dcache bucket.
+1. Create a new workspace
+2. Select the `Samba Server` application
+3. Select a size with 2 cores and 16 GB RAM
+4. Select the data storage item, created in the previous section
+5. Select the private network
+6. Wait for the machine to be running
+7. Log in to the machine with ssh
+   1. Become root with `sudo -i`
+   2. Edit /etc/samba/smb.conf and in the `[samba-share]` section replace `read only = no` with `read only = yes`
+   3. Restart the samba server with `systemctl restart smbd`
+8. Populate the `/data/volume_2/samba-share/` directory with training material. This directory will be shared with the other machines.
 
-To fill the dcache bucket you can run
+See the [data documentation](DATA.md#populating-samba-file-server) on how to populate the file server.
 
-```shell
-ansible-playbook \
-    -e cds_uid=1234 -e cds_api_key \
-    -e dcache_rw_token= \
-    shared-data-disk.yml
-```
+### Workspace creation with dcache as shared data source
 
-Runnig this script will download all data files to /mnt/data and upload them to dcache.
+Steps to create an eWaterCycle workspace:
 
+1. Create a new workspace
+2. Select a collaborative organisation (CO), for example `ewatercycle-nlesc`
+3. Select the `eWaterCycle dcache` catalog item
+4. Select the size of the VM (cpus/memory) based on the use case
+5. Select the storage item for home directories. Remember the item you picked as you will need it in the workspace parameters.
+6. Select the storage item for the dcache cache. Remember the item you picked as you will need it in the workspace parameters.
+7. Fill **all** the workspace parameters. They should look something like
+   ![workspace-parameters](workspace-parameters.png)
+   If you are not interested in grading then the following parameters can be left unchanged: 'Course repository', 'Course version', 'Grader user' and 'Students'.
+8. Wait for the machine to be running
+9. Visit the URL/IP (a quick smoke test is sketched below)
+10. When done, delete the machine
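+
+After the workspace is up, a quick way to confirm that the shared data and the
+eWaterCycle configuration are in place (a sketch; the paths follow the
+configured paths in [DATA.md](DATA.md)):
+
+```shell
+# From a terminal inside JupyterLab
+ls /data/shared/parameter-sets
+python3 -c 'import ewatercycle; print(ewatercycle.CFG)'
+```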
 
-## Sync dcache with existing folder elsewhere
+End users should be invited to the collaborative organization in [SRAM](https://sram.surf.nl/) or [created as students](#students) so they can log in.
 
-The steps above fetch the data from original sources. If you want to sync some files from
-another location, say, Snellius, you can use rclone directly. In our experience, it works
-better to sync entire directories than to try and copy single files.
+See the [User guide](USER.md) to see what users have to do to log in or use a GitHub repository.
 
-Create the file `~/.config/rclone/rclone.conf` and add the following content:
+### Workspace creation with samba as shared data source
 
-```
-[ dcache ]
-type = webdav
-url = https://webdav.grid.surfsara.nl:2880
-vendor = other
-user =
-pass =
-bearer_token =
-```
+Steps to create an eWaterCycle workspace:
 
-You can verify your access by running an innocent `rclone ls dcache:parameter-sets`.
-The command to sync directories is `rclone copy somedir dcache:parameter-sets/somedir`.
-Beware that this will overwrite any existing files, if different!
+1. Create a new workspace
+2. Select a collaborative organisation (CO), for example `ewatercycle-nlesc`
+3. Select the `eWaterCycle samba` catalog item
+4. Select the size of the VM (cpus/memory) based on the use case
+5. Select the home storage item. Remember the item you picked as you will need it in the workspace parameters.
+6. Select the private network
+7. Fill **all** the workspace parameters. They should look something like
+   ![workspace-parameters](workspace-parameters.png)
+   If you are not interested in grading then the following parameters can be left unchanged: 'Course repository', 'Course version', 'Grader user' and 'Students'.
+8. Wait for the machine to be running
+9. Visit the URL/IP
+10. When done, delete the machine
 
-Note: password manager can be used for exchanging macaroons.
+End users should be invited to the collaborative organization in [SRAM](https://sram.surf.nl/) or [created as students](#students) so they can log in.
 
-## Mount dcache on local machine
+See the [User guide](USER.md) to see what users have to do to log in or use a GitHub repository.
 
-Create the file `~/.config/rclone/rclone.conf` and add the following content:
+### Students
 
-```ini
-[dcache]
-type = webdav
-url = https://webdav.grid.surfsara.nl:2880
-vendor = other
-user =
-pass =
-bearer_token =
-```
+During creation you can set the `students` parameter to create local posix accounts for students.
+The format of the parameter value is `<username1>:<password1>,<username2>:<password2>`.
+Use an empty string for no students.
+Make sure to use strong passwords, as anyone on the internet can access the machine.
 
-Install [rclone](https://rclone.org/) and run following command to mount dcache at `~/dcache` directory.
+You can use the Python script [create_student_passwords.py](create_student_passwords.py) to generate passwords. To use it, create a file `usernames.txt` with one username on each line. Then call the script to generate passwords; they will be stored in a new file called `students.txt`. See the docs in the script for more details. The passwords generated by the script should be distributed to the students.
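+
+For example (a sketch; the usernames are hypothetical):
+
+```shell
+printf 'student1\nstudent2\n' > usernames.txt
+python3 create_student_passwords.py < usernames.txt > students.txt
+# Paste the contents of students.txt into the 'Students' workspace parameter
+cat students.txt
+```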
 
-```shell
-mkdir ~/dcache
-rclone mount --read-only --cache-dir /tmp/rclone-cache --vfs-cache-max-size 30G --vfs-cache-mode full dcache:/ ~/dcache
-```
+### Example notebooks
 
-In ESMValTool config files you can use `~/dcache/climate-data/obs6` for `rootpath:OBS6`.
+To get example notebooks, end users should go to the machine's homepage and click one of the notebook links.
+These links use [nbgitpuller](https://jupyterhub.github.io/nbgitpuller/) to sync a git repo and open a notebook in it.
 
 ## Docker images
 
-In the eWaterCycle project we make Docker images. The images are hosted on [Docker Hub](https://hub.docker.com/u/ewatercycle) . A project member can create issues here for permisison to push images to Docker Hub.
-
-## Logs
-
-All services are running with systemd. Their logs can be viewed with `journalctl`.
-The log of the Jupyter server for each user can be followed with
-
-```shell
-journalctl -f -u jupyter-vagrant-singleuser.service
-```
-
-(replace `vagrant` with own username)
+In the eWaterCycle project we make Docker images. The images are hosted on [Docker Hub](https://hub.docker.com/u/ewatercycle) and the [GitHub Container Registry](https://github.com/orgs/eWaterCycle/packages). A project member can create an issue in this repository to ask for permission to push images to Docker Hub or the GitHub Container Registry.
+
+## AI Disclaimer
+
+The documentation/software code in this repository has been generated and/or refined using
+GitHub CoPilot. All AI-output has been verified for correctness,
+accuracy and completeness, adapted where needed, and approved by the author.
diff --git a/SRC-DEVEL.md b/SRC-DEVEL.md
new file mode 100644
index 00000000..e6b57dd9
--- /dev/null
+++ b/SRC-DEVEL.md
@@ -0,0 +1,137 @@
+# SURF Research cloud developer
+
+This chapter is dedicated to catalog item and component developers.
+
+- [SURF Research cloud developer](#surf-research-cloud-developer)
+  - [Component registration](#component-registration)
+  - [Catalog item with dcache as shared data source](#catalog-item-with-dcache-as-shared-data-source)
+  - [Catalog item with Samba as shared data source](#catalog-item-with-samba-as-shared-data-source)
+
+A new workspace (aka Virtual Machine) can be made by choosing a catalog item.
+A catalog item consists of a list of components and other configuration.
+
+To register new catalog items in SURF Research cloud you
+need to [appoint a developer](https://servicedesk.surf.nl/wiki/display/WIKI/Appoint+a+CO-member+a+developer).
+
+The generic steps to make your own catalog item are documented [here](https://servicedesk.surf.nl/wiki/display/WIKI/Create+your+own+catalog+items).
+
+## Component registration
+
+On the [Components page](https://portal.live.surfresearchcloud.nl/catalog/components)
+create an eWatercycle component with the following specialization:
+
+- Component script
+  - Component script type: Ansible playbook
+  - Repository URL: https://github.com/eWaterCycle/infra.git
+  - Path: research-cloud-plugin.yml
+  - Tag: dcache-or-samba
+- Name & description
+  - Name: eWaterCycle dcache or samba
+  - Subtitle: eWaterCycle teaching platform in a box
+  - Description: Welcome page + JupyterHub + nbgitpuller + nbgrader + eWaterCycle Python packages + dcache or samba
+  - Logo: Organization avatar/logo from https://github.com/eWaterCycle/ewatercycle
+- Parameters; all configured parameters should have source type fixed and be required and overwritable, unless otherwise stated
+  - shared_data_source:
+    - description: Source of shared data. Set to `dcache` or `samba`.
+    - initial value: dcache
+  - samba_password:
+    - source_type: Co-Secret
+    - overwritable: false
+    - initial value: {"key": "samba_password"}
+  - dcache_ro_token: parameter for the dcache read-only token aka macaroon.
+    The token can be found in the eWaterCycle password manager.
+    This token has an expiration date, so it needs to be updated every now and then.
+    - source_type: Co-Secret
+    - description: Macaroon with read permission for dcache.
+    - initial value: {"key": "dcache_ro_token"}
+    - overwritable: false
+  - rclone_cache_dir:
+    - description: Path where the rclone cache is stored. Set to `/data/<volume>`.
+    - initial value: /data/volume_3
+  - alt_home_location:
+    - description: Path where home directories are stored. Set to `/data/<volume>`.
+    - initial value: /data/volume_2
+  - grader_user:
+    - description: User who will be grading. The user should be created in SRAM. This user will also be responsible for setting up the course and assignments.
+    - initial value: ubuntu
+    - required: false
+  - students:
+    - description: List of student usernames and passwords. Format '<username1>:<password1>,<username2>:<password2>'. Use an empty string for no students. Use strong passwords, as anyone on the internet can access the machine.
+    - required: false
+  - course_repo:
+    - description: Git repository URL with the course source material.
+    - initial value: https://github.com/eWaterCycle/teaching.git
+  - course_version:
+    - description: The version, branch or tag of the course repository to use.
+    - initial value: nbgrader-quickstart
+- Owner & support
+  - Owner: ewatercycle-nlesc
+  - Documentation URL: https://github.com/eWaterCycle/infra
+- Access
+  - Allow every org to use this component.
+
+## Catalog item with dcache as shared data source
+
+On the [Catalog items page](https://portal.live.surfresearchcloud.nl/catalog/catalogItems)
+create an eWatercycle catalog item with the following specialization:
+
+- Components, select the following components (use the live version for all of them):
+  1. SRC-OS
+  2. SRC-CO
+  3. SRC-Nginx
+  4. SRC-External plugin
+  5. eWaterCycle dcache or samba
+- Name & description
+  - Name: eWaterCycle dcache
+  - Subtitle: eWaterCycle teaching platform in a box
+  - Description: Welcome page + JupyterHub + nbgitpuller + nbgrader + eWaterCycle Python packages + dcache as shared data source
+  - Logo: Organization avatar/logo from https://github.com/eWaterCycle
+- Owner & support
+  - Owner: ewatercycle-nlesc
+  - Documentation URL: https://github.com/eWaterCycle/infra
+- Access, select the organizations (CO) that are allowed to use the catalog item.
+  - Allowed Collaborative Organisations: select all organizations with eWaterCycle in the name
+- Cloud settings
+  - Add `SURF HPC Cloud` as cloud provider
+    - Operating Systems: Ubuntu 22.04
+    - Sizes: all non-gpu and non-disabled sizes
+- Parameters, keep all values as is except
+  - Set `co_irods` to `false` as we do not use irods
+  - Set `co_research_drive` to `false` as we do not use research drive
+  - Set `shared_data_source` to `dcache`
+  - As interactive parameters expose the following:
+    - rclone_cache_dir:
+      - label: Rclone cache directory
+    - alt_home_location:
+      - label: Homes path
+    - grader_user:
+      - label: Username of grader
+    - students:
+      - label: Students
+    - course_repo:
+      - label: Course repository
+    - course_version:
+      - label: Course version
+- Workspace settings
+  - Set boot disk size to 50GB,
+    as the default size will be mostly used by the conda environment and will trigger out of space warnings.
+  - Set workspace access button behavior to `Webinterface (https:)`,
+    so clicking on the `ACCESS` button will open up the eWaterCycle web interface
+
+## Catalog item with Samba as shared data source
+
+On the [Catalog items page](https://portal.live.surfresearchcloud.nl/catalog/catalogItems)
+create an eWatercycle catalog item with the following specialization:
+
+1. Find the `eWaterCycle dcache` catalog item
+2. Click on Actions -> Clone
+3. Then re-configure the following:
+
+- Name & description
+  - Name: eWaterCycle samba
+  - Description: Welcome page + JupyterHub + nbgitpuller + nbgrader + eWaterCycle Python packages + samba as shared data source
+- Parameters
+  - rclone_cache_dir:
+    - action: keep value
+  - shared_data_source:
+    - action: overwrite
+    - initial value: samba
diff --git a/USER.md b/USER.md
index a6308ca0..101ddb32 100644
--- a/USER.md
+++ b/USER.md
@@ -1,5 +1,46 @@
 # User guide
 
+- [User guide](#user-guide)
+  - [Grading](#grading)
+    - [Students](#students)
+    - [Create assignment](#create-assignment)
+    - [Fetch assignment](#fetch-assignment)
+  - [Logging into server](#logging-into-server)
+  - [Save notebooks to GitHub](#save-notebooks-to-github)
+    - [1. Create GitHub account](#1-create-github-account)
+    - [2. Create GitHub repository](#2-create-github-repository)
+    - [3. Setup GitHub authentication on server](#3-setup-github-authentication-on-server)
+    - [4. Clone repository](#4-clone-repository)
+    - [5. Add \& commit \& push notebooks on server to GitHub](#5-add--commit--push-notebooks-on-server-to-github)
+    - [6. Pull changes from GitHub to server](#6-pull-changes-from-github-to-server)
+  - [Install own software](#install-own-software)
+  - [Jupyter server logs](#jupyter-server-logs)
+
+## Grading
+
+One user on the machine is the grader; the following section is for that user.
+
+### Students
+
+Assigning students to courses can be managed with the [nbgrader labextension & cli](https://nbgrader.readthedocs.io/en/stable/user_guide/managing_the_database.html#managing-students).
+
+The student id is a POSIX username in the VM.
+In an eWatercycle VM, users can be added during workspace creation, with [SURF Research Access Management](https://sram.surf.nl/), or later with `sudo adduser`.
+After an SRAM invite a cronjob will add the user to the VM. Log in with the TOTP as password.
+Users added via SRAM need to be added to nbgrader manually.
+
+### Create assignment
+
+1. In the menu go to Nbgrader -> Formgrader -> Manage Assignments -> Add new Assignment.
+2. Click on edit to navigate to the folder (`~/course1/source/`).
+3. Create a notebook with assignment annotations using the `Create Assignment` sidebar.
+4. In FormGrader press the Generate, Preview and Release buttons.
+
+### Fetch assignment
+
+1. In the menu go to Nbgrader -> Assignments
+2. Press the fetch button
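+
+Both flows can also be driven from a terminal with the nbgrader CLI (a sketch;
+the assignment name is hypothetical and the commands run in the course
+directory):
+
+```shell
+cd ~/course1
+nbgrader generate_assignment assignment1
+nbgrader release_assignment assignment1
+# as a student: fetch and hand in the assignment
+nbgrader fetch_assignment assignment1
+nbgrader submit assignment1
+```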
 
 ## Logging into server
 
 You should have recieved a invitation to a eWatercycle collaborative organization, please follow instructions in email.
@@ -14,12 +55,6 @@
 The Jupyter notebooks that you write should be saved outside the Jupyter server.
 Code like notebooks can be saved git repositories on [GitHub](https://github.com/).
 
-- [1. Create GitHub account](#1-create-github-account)
-- [2. Create GitHub repository](#2-create-github-repository)
-- [3. Setup authenication on server](#3-setup-authenication-on-server)
-- [4. Clone repository](#4-clone-repository)
-- [5. Add & commit & push notebooks on server to GitHub](#5-add--commit--push-notebooks-on-server-to-github)
-- [6. Pull changes from GitHub to server](#6-pull-changes-from-github-to-server)
 
 ### 1. Create GitHub account
@@ -192,3 +227,15 @@ which python
 
 5. Open a notebook and pick the new kernel that is `testewatercycle`
 6. Install additional software
+
+## Jupyter server logs
+
+Sometimes something goes wrong in the Jupyter server; for example, you expected to see an error in your notebook but it is not there.
+
+The log of the Jupyter server can be viewed with
+
+```shell
+journalctl -f -u jupyter-vagrant-singleuser.service
+```
+
+Replace `vagrant` with your own username.
diff --git a/VAGRANT.md b/VAGRANT.md
new file mode 100644
index 00000000..13ef967c
--- /dev/null
+++ b/VAGRANT.md
@@ -0,0 +1,79 @@
+# Setup of eWaterCycle platform on a local test VM
+
+Deploying a local test VM is mostly useful for developing the SURF Research Cloud applications. This vagrant setup creates a virtual machine with 8GB memory, 4 virtual cores, and 70GB storage. This should work on any Linux or Windows machine.
+
+To set up a Jupyter server on your local machine with [vagrant](https://vagrantup.com) and
+[Ansible](https://docs.ansible.com/ansible/latest/index.html),
+create a config file `research-cloud-plugin.vagrant.vars` with
+
+```yaml
+---
+shared_data_source: dcache
+# If set to samba you need to run the file server, see next chapter
+# shared_data_source: samba
+# When using source samba you need to also give the ip of the file server
+# This can be retrieved with `vagrant ssh fileserver -c 'ip addr show eth1'`
+# private_smb_server_ip:
+#   - <ip of file server>
+# When using source samba you need to also give a password,
+# it is hardcoded in vagrant-provision-file-server.yml to samba
+# samba_password: samba
+dcache_ro_token: <token>
+rclone_cache_dir: /data/volume_2
+# Directory where /home should point to
+alt_home_location: /data/volume_3
+# Vagrant user is instructor
+# The students defined below can be used to login as a student
+students: 'student1:pw1,student2:pw2'
+```
+
+The token can be found in the eWaterCycle password manager.
+
+```shell
+vagrant --version
+# Vagrant 2.4.1
+vagrant up
+```
+
+Visit site
+
+```shell
+# Get ip of server with
+vagrant ssh -c 'ifconfig eth1'
+```
+
+Go to `http://<ip>` and log in with `vagrant:vagrant`.
+
+You will get some complaints about insecure serving; this is OK for local testing and will not happen on Research Cloud.
+
+## Vagrant File server
+
+The file server can also be tested locally with Vagrant using:
+
+```shell
+vagrant up fileserver
+# Pick the same bridged network interface as for the Jupyter server
+vagrant ssh fileserver
+```
+
+And follow the steps in the [File Server](DATA.md#populating-samba-file-server) section, but instead of cloning the repo you can `cd /vagrant` and run the ansible commands, as sketched below.
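+
+For example, running the populate playbook from the synced folder could look like
+this (a sketch; the CDS credentials are only needed when downloading ERA5 data):
+
+```shell
+sudo -i
+cd /vagrant
+ansible-galaxy role install mambaorg.micromamba
+ansible-playbook populate-samba.yml -e cds_uid=... -e cds_api_key=...
+```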
+
+To clean up use
+
+```shell
+vagrant destroy fileserver
+```
+
+## Test on Windows Subsystem for Linux 2
+
+WSL2 users should follow the steps on [https://www.vagrantup.com/docs/other/wsl](https://www.vagrantup.com/docs/other/wsl).
+
+Importantly:
+
+- Work on a folder on the windows file system.
+- Install the vagrant plugin https://github.com/Karandash8/virtualbox_WSL2
+- `export VAGRANT_WSL_WINDOWS_ACCESS_USER_HOME_PATH=$PWD`
+- `export PATH="$PATH:C:\Program Files\Oracle\VirtualBox"`
+- `vagrant up --provider virtualbox`
+- Approve the firewall popup
\ No newline at end of file
diff --git a/Vagrantfile b/Vagrantfile
index f4b30003..79fe2861 100644
--- a/Vagrantfile
+++ b/Vagrantfile
@@ -1,37 +1,73 @@
 # -*- mode: ruby -*-
 # vi: set ft=ruby :
+
 Vagrant.configure("2") do |config|
-  config.vm.box = "bento/ubuntu-22.04"
-  config.vm.synced_folder ".", "/vagrant"
-
-  # Create a public network, which generally matched to bridged network.
-  # Bridged networks make the machine appear as another physical device on
-  # your network.
-  config.vm.network "public_network"
-  config.vm.hostname = "ewc-explorer-jupyterhub"
-
-  # Provider-specific configuration so you can fine-tune various
-  # backing providers for Vagrant. These expose provider-specific options.
-  # Example for VirtualBox:
-  #
-  config.vm.provider "virtualbox" do |vb|
-    # Customize the amount of memory on the VM:
-    vb.memory = 8096
-    vb.cpus = 4
-  end
+  config.vm.define "jupyter", primary: true do |jupyter|
+    config.vm.box = "bento/ubuntu-22.04"
+    config.vm.synced_folder ".", "/vagrant"
+
+    # Create a public network, which generally matched to bridged network.
+    # Bridged networks make the machine appear as another physical device on
+    # your network.
+    jupyter.vm.network "public_network"
+    jupyter.vm.hostname = "jupyter"
+
+    # Provider-specific configuration so you can fine-tune various
+    # backing providers for Vagrant. These expose provider-specific options.
+    # Example for VirtualBox:
+    #
+    jupyter.vm.provider "virtualbox" do |vb|
+      # Customize the amount of memory on the VM:
+      vb.memory = 8096
+      vb.cpus = 4
+    end
+
+    jupyter.vm.disk :disk, size: "20GB", name: "home2"
+    # When shared_data_source is set to samba, this cache disk will be unused
+    jupyter.vm.disk :disk, size: "50GB", name: "cache"
+
+    # Disable guest updates
+    jupyter.vbguest.auto_update = false
+    jupyter.vbguest.no_install = true
 
-  config.vm.disk :disk, size: "20GB", name: "home2"
-  config.vm.disk :disk, size: "50GB", name: "cache"
+    jupyter.vm.provision "ansible_local" do |ansible|
+      ansible.playbook = "vagrant-provision.yml"
+      ansible.become = true
+      ansible.compatibility_mode = "2.0"
+    end
 
-  config.vm.provision "ansible_local" do |ansible|
-    ansible.playbook = "vagrant-provision.yml"
-    ansible.become = true
+    jupyter.vm.provision "ansible_local" do |ansible|
+      ansible.playbook = "research-cloud-plugin.yml"
+      ansible.become = true
+      ansible.extra_vars = "research-cloud-plugin.vagrant.vars"
+      ansible.compatibility_mode = "2.0"
+    end
+  end
 
-  config.vm.provision "ansible_local" do |ansible|
-    ansible.playbook = "research-cloud-plugin.yml"
-    ansible.become = true
-    ansible.extra_vars = "research-cloud-plugin.vagrant.vars"
+  config.vm.define "fileserver", autostart: false do |fileserver|
+    fileserver.vm.box = "bento/ubuntu-22.04"
+    fileserver.vm.synced_folder ".", "/vagrant"
+
+    fileserver.vm.network "public_network"
+    fileserver.vm.hostname = "fileserver"
+
+    fileserver.vm.provider "virtualbox" do |vb|
+      # Customize the amount of memory on the VM:
+      vb.memory = 2048
+      vb.cpus = 1
+    end
+
+    fileserver.vm.disk :disk, size: "500GB", name: "volume_2"
+
+    # Disable guest updates
+    fileserver.vbguest.auto_update = false
+    fileserver.vbguest.no_install = true
+
+    fileserver.vm.provision "ansible_local" do |ansible|
+      ansible.playbook = "vagrant-provision-file-server.yml"
+      ansible.become = true
+      ansible.compatibility_mode = "2.0"
+    end
+  end
 end
diff --git a/create_student_passwords.py b/create_student_passwords.py
new file mode 100644
index 00000000..0910ffd5
--- /dev/null
+++ b/create_student_passwords.py
@@ -0,0 +1,34 @@
+"""Script to generate passwords for users.
+
+```shell
+cat usernames.txt
+user1
+user2
+```
+
+```
+python3 create_student_passwords.py < usernames.txt > students.txt
+```
+
+Copy the contents of students.txt to the Surf Research Cloud workspace wizard.
+
+"""
+
+from secrets import token_urlsafe
+
+# Number of random bytes per password; token_urlsafe() returns a string
+# roughly 1.3 times this length.
+password_length = 32
+
+
+def main():
+    # Read one username per line from stdin until an empty line or EOF.
+    usernames = []
+    while True:
+        try:
+            line = input()
+            if not line:
+                break
+            usernames.append(line.strip())
+        except EOFError:
+            break
+    # Emit the '<username>:<password>' pairs as one comma-separated line.
+    userpws = [':'.join([username, token_urlsafe(password_length)]) for username in usernames]
+    print(','.join(userpws))
+
+
+if __name__ == '__main__':
+    main()
diff --git a/populate-samba.yml b/populate-samba.yml
new file mode 100644
index 00000000..6e52609a
--- /dev/null
+++ b/populate-samba.yml
@@ -0,0 +1,69 @@
+- name: Download data from internet to file server
+  hosts:
+    - localhost
+  vars:
+    # Creds from https://cds.climate.copernicus.eu Climate Data Store
+    cds_uid: null  # Must be filled from other place
+    cds_api_key: null  # Must be filled from other place
+    # To add more ERA5 variables edit or overwrite roles/prep_shared_data/defaults/main.yml#era5_variables
+    # To add more apptainer images of hydrological models edit or overwrite roles/prep_shared_data/defaults/main.yml#grpc4bmi_images
+    # climate_begin_year: 1990
+    # climate_end_year: 1990
+    # dCache token for uploading files
+    dcache_rw_token: null  # Must be filled from command line
+    data_root: /data/volume_2/samba-share
+
+  tasks:
+    # any sram user can become root via passwordless sudo
+    # TODO exclude students from becoming root
+    - name: Harden share
+      ansible.builtin.file:
+        path: '/data/volume_2/samba-share/'
+        state: directory
+        mode: 'u=rwx,g=rx,o=rx'
+
+    - name: Read-only share
+      ansible.builtin.lineinfile:
+        path: /etc/samba/smb.conf
+        line: 'read only = yes'
+        regexp: '^\s+read only ='
+        insertafter: '^\[samba-share\]'
+      notify: Restart samba
+
+    # Apptainer is needed to build apptainer images from Docker
+    - name: Apptainer
+      ansible.builtin.include_role:
+        name: apptainer
+
+    - name: Mamba env
+      ansible.builtin.include_role:
+        name: mambaorg.micromamba
+      vars:
+        root_prefix: /opt/conda
+        packages:
+          - era5cli
+          - esmvalcore
+          - rclone
+          - h5netcdf
+          - hdf5plugin
+
+    - name: Prepare shared data
+      ansible.builtin.include_role:
+        name: prep_shared_data
+
+    - name: Download observation data
+      ansible.builtin.include_role:
+        name: grdc
+
+    - name: Install rclone
+      ansible.builtin.include_role:
+        name: rclone
+        tasks_from: install
+
+    # TODO upload new data back to dcache using roles/rclone/tasks/upload.yml?
+
+  handlers:
+    - name: Restart samba
+      ansible.builtin.service:
+        name: smbd
+        state: restarted
diff --git a/research-cloud-plugin.yml b/research-cloud-plugin.yml
index 9e8fc0e2..d09b67ba 100644
--- a/research-cloud-plugin.yml
+++ b/research-cloud-plugin.yml
@@ -4,9 +4,7 @@
     - all
     - localhost
   gather_facts: false
-  vars:
-    # dCache token for mounting shared data
-    dcache_ro_token: null # Must be filled from command line
+  vars: {}
   tasks:
     - name: Wait for system to become reachable
       wait_for_connection:
@@ -15,10 +13,6 @@
     - name: Gather facts for first time
      setup:
 
-    - name: Extra vols
-      debug:
-        var: external_volumes
-
     - name: Common stuff
      include_role:
        name: common
@@ -59,38 +53,11 @@
      include_role:
        name: conda
 
-    # TODO mount a home and scratch disk, see https://github.com/eWaterCycle/infra/issues/89
-    # - name: Scratch disk
-    #   mount:
-    #     path: /scratch
-    #     src: # TODO find correct value, possibly extracted from SRC API or ansible vars/facts
-    #     state: present
-    # - name: Home disk
-    #   mount:
-    #     path: /home
-    #     src: # TODO find correct value, possibly extracted from SRC API or ansible vars/facts
-    #     state: present
-
-    - name: Mount shared data dcache with rclone
-      include_role:
-        name: rclone
-        tasks_from: mount
-
     # https://lab.ewatercycle.org/ functionality
     - name: Welcome page
      include_role:
        name: labstart
 
-    # TODO remove roles instead of commenting them out or make them optional
-    # # https://explore.ewatercycle.org/ functionality
-    # - name: Experiment launcher
-    #   include_role:
-    #     name: launcher
-
-    # - name: Explorer
-    #   include_role:
-    #     name: terria
-
     # https://jupyter.ewatercycle.org/ functionality
     - name: Create eWaterCycle conda env
      include_role:
@@ -100,6 +67,10 @@
      include_role:
        name: jupyter
 
+    - name: Set up grader
+      include_role:
+        name: grader
+
     - name: Clean apt cache
      apt:
        autoclean: true
diff --git a/roles/conda/meta/main.yml b/roles/conda/meta/main.yml
index e27c8fd1..90a4a530 100644
--- a/roles/conda/meta/main.yml
+++ b/roles/conda/meta/main.yml
@@ -2,7 +2,7 @@ galaxy_info:
   author: Stefan Verhoeven
   description: Conda installation
   company: Netherlands eScience Center
-  license: Apache
+  license: Apache-2.0
   min_ansible_version: 2.4
   platforms:
     - name: Ubuntu
diff --git a/roles/ewatercycle/README.md b/roles/ewatercycle/README.md
index 0ec1b83b..bd290fe3 100644
--- a/roles/ewatercycle/README.md
+++ b/roles/ewatercycle/README.md
@@ -18,7 +18,7 @@ Role also adds /etc/ewatercycle.yaml and ~/.esmvaltool/config-user.yml config fi
 Requirements
 ------------
 
-This role expects `data_root` to be filled with files prepared by [../../shared-data-disk.yml](../../shared-data-disk.yml) playbook.
+This role expects `data_root` to be filled with files prepared by the [../../populate-samba.yml](../../populate-samba.yml) playbook.
 
 Role Variables
 --------------
@@ -34,7 +34,7 @@ conda_environment: ewatercycle
 # Path to conda environments bin directory
 conda_environment_bin: '{{ conda_root}}/envs/{{ conda_environment }}/bin'
 # Where all shared data is available
-data_root: /mnt/data
+data_root: /data/shared
 # Location of climate data
 climate_data_root_dir: '{{ data_root }}/climate-data'
 # Location where GRDC data should be downloaded
diff --git a/roles/ewatercycle/defaults/main.yml b/roles/ewatercycle/defaults/main.yml
index 433b2ebd..d72a4c42 100644
--- a/roles/ewatercycle/defaults/main.yml
+++ b/roles/ewatercycle/defaults/main.yml
@@ -6,7 +6,7 @@ conda_environment: ewatercycle2
 # Path to conda environments bin directory
 conda_environment_bin: '{{ conda_root }}/envs/{{ conda_environment }}/bin'
 # Where all shared data is available
-data_root: /mnt/data
+data_root: /data/shared
 # Location of climate data
 climate_data_root_dir: '{{ data_root }}/climate-data'
 # Location where GRDC data should be downloaded
@@ -19,4 +19,7 @@ apptainer_image_root: '{{ data_root }}/singularity-images'
 home_root: /home
 # Version of ewatercycle Python package
 # /etc/ewatercycle.yaml will be slightly different depending on version.
-pyewatercycle_version: 2.3.0
+pyewatercycle_version: v2.3.1
+# From where /data/shared should be mounted: from dcache or samba.
+# Determines which parameter sets are available, as samba has only a small subset.
+shared_data_source: dcache
diff --git a/roles/ewatercycle/templates/environment.yml.j2 b/roles/ewatercycle/templates/environment.yml.j2
index d5dad04a..fb4477eb 100644
--- a/roles/ewatercycle/templates/environment.yml.j2
+++ b/roles/ewatercycle/templates/environment.yml.j2
@@ -43,3 +43,7 @@ dependencies:
   - jupyterlab-git=0.50.1
   - nbdime=4.0.1
   - nbgitpuller=1.2.1
+  - nbgrader=0.9.3
+  # Needed for running playbooks, as after the initial provision the conda env is the default python
+  - ansible=10.4.0
+  - passlib=1.7.4
diff --git a/roles/ewatercycle/templates/ewatercycle.yaml.j2 b/roles/ewatercycle/templates/ewatercycle.yaml.j2
index 7b7c1973..28bf2953 100644
--- a/roles/ewatercycle/templates/ewatercycle.yaml.j2
+++ b/roles/ewatercycle/templates/ewatercycle.yaml.j2
@@ -1,8 +1,9 @@
 grdc_location: {{ grdc_location }}
 output_dir: .
 parameter_sets:
-  # Following parameter sets are coming from dCache.
-  # The dcache was filled once with by running shared-data-disk.yml playbook.
+  # The following parameter sets come from dCache, optionally via a Samba file server.
+  # The dcache was filled once by running something like the populate-samba.yml playbook.
+  # TODO only include parameter sets that are present in parameterset_dir
 
   # Example parameter sets
   lisflood_fraser:
@@ -24,6 +25,7 @@ parameter_sets:
     supported_model_versions: !!set {2020.1.1: null, 2020.1.2: null, 2020.1.3: null}
     target_model: wflow
 
+{% if shared_data_source == 'dcache' %}
   # From comparison study
   wflow_Rhine_ERA5-calibrated:
     directory: wflow_Rhine_ERA5-calibrated
@@ -110,14 +112,8 @@ parameter_sets:
     directory: sfincs_humber
     config: sfincs.inp
     target_model: sfincs
+{% endif %}
 
 parameterset_dir: {{ parameterset_dir }}
-{# ewatercycle<1.4.1 will use singularity instead of apptainer #}
-{% set pyewatercycle_pre_apptainer_versions = ('1.4.1', '1.4.0', '1.3.0', '1.2.0', '1.1.4', '1.1.3', '1.1.2', '1.1.1', '1.1.0', '1.0.0') %}
-{% if pyewatercycle_version in pyewatercycle_pre_apptainer_versions %}
-container_engine: singularity
-singularity_dir: {{ apptainer_image_root }}
-{% else %}
 container_engine: apptainer
 apptainer_dir: {{ apptainer_image_root }}
-{% endif %}
diff --git a/roles/grader/README.md b/roles/grader/README.md
new file mode 100644
index 00000000..913444f7
--- /dev/null
+++ b/roles/grader/README.md
@@ -0,0 +1,39 @@
+grader
+======
+
+Sets up [nbgrader](https://nbgrader.readthedocs.io/).
+
+Requirements
+------------
+
+None, beyond the `jupyter` role dependency listed below.
+
+Role Variables
+--------------
+
+Required vars:
+
+```yaml
+grader_user: vagrant
+```
+
+Optional vars:
+
+```yaml
+students: '<username1>:<password1>,<username2>:<password2>'
+```
+
+Dependencies
+------------
+
+Requires the `jupyter` role to be run first.
+
+License
+-------
+
+Apache-2.0
+
+Author Information
+------------------
+
+Stefan Verhoeven, Netherlands eScience Center.
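+
+Example Playbook
+----------------
+
+A minimal sketch of using this role (values are illustrative):
+
+```yaml
+- hosts: localhost
+  roles:
+    - role: grader
+      vars:
+        grader_user: ubuntu
+        students: 'student1:pw1,student2:pw2'
+```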
diff --git a/roles/grader/defaults/main.yml b/roles/grader/defaults/main.yml
new file mode 100644
index 00000000..296529ae
--- /dev/null
+++ b/roles/grader/defaults/main.yml
@@ -0,0 +1,12 @@
+---
+# defaults file for grader
+exchange_root: /var/local/nbgrader/exchange
+grader_user: vagrant
+course_id: course1
+course_repo: https://github.com/eWaterCycle/teaching.git
+course_version: nbgrader-quickstart
+conda_root: /opt/conda
+# Name of conda environment to use
+conda_environment: ewatercycle2
+# Path to conda environments bin directory
+conda_environment_bin: '{{ conda_root }}/envs/{{ conda_environment }}/bin'
diff --git a/roles/grader/handlers/main.yml b/roles/grader/handlers/main.yml
new file mode 100644
index 00000000..dec36db7
--- /dev/null
+++ b/roles/grader/handlers/main.yml
@@ -0,0 +1,2 @@
+---
+# handlers file for grader
diff --git a/roles/grader/meta/main.yml b/roles/grader/meta/main.yml
new file mode 100644
index 00000000..f058c68c
--- /dev/null
+++ b/roles/grader/meta/main.yml
@@ -0,0 +1,14 @@
+galaxy_info:
+  author: Stefan Verhoeven
+  description: nbgrader
+  company: Netherlands eScience Center
+  license: Apache-2.0
+
+  min_ansible_version: '2.4'
+  platforms:
+    - name: Ubuntu
+      versions:
+        - all
+  galaxy_tags: []
+
+dependencies: []
diff --git a/roles/grader/tasks/main.yml b/roles/grader/tasks/main.yml
new file mode 100644
index 00000000..5e289480
--- /dev/null
+++ b/roles/grader/tasks/main.yml
@@ -0,0 +1,113 @@
+---
+# tasks file for grader
+- name: Exchange directory
+  ansible.builtin.file:
+    path: '{{ exchange_root }}'
+    state: directory
+    mode: '0777'
+- name: /etc/jupyter
+  ansible.builtin.file:
+    path: /etc/jupyter
+    state: directory
+    mode: '0755'
+- name: Global nbgrader config
+  ansible.builtin.copy:
+    dest: /etc/jupyter/nbgrader_config.py
+    mode: '0644'
+    content: |
+      c = get_config()
+      c.Exchange.root = '{{ exchange_root }}'
+- name: Grader jupyter dir
+  ansible.builtin.file:
+    path: /home/{{ grader_user }}/.jupyter
+    state: directory
+    owner: '{{ grader_user }}'
+    group: '{{ grader_user }}'
+    mode: '0755'
+- name: Nbgrader config for grader user
+  ansible.builtin.copy:
+    dest: /home/{{ grader_user }}/.jupyter/nbgrader_config.py
+    mode: '0644'
+    owner: '{{ grader_user }}'
+    group: '{{ grader_user }}'
+    content: |
+      c = get_config()
+      c.CourseDirectory.course_id = '{{ course_id }}'
+      c.CourseDirectory.root = '/home/{{ grader_user }}/{{ course_id }}'
+- name: Course directory
+  ansible.builtin.file:
+    path: '/home/{{ grader_user }}/{{ course_id }}'
+    state: directory
+    owner: '{{ grader_user }}'
+    group: '{{ grader_user }}'
+    mode: '0755'
+- name: Clone course repo
+  ansible.builtin.git:
+    repo: '{{ course_repo }}'
+    dest: '/home/{{ grader_user }}/{{ course_id }}/source'
+    version: '{{ course_version }}'
+  become: true
+  become_user: "{{ grader_user }}"
+- name: Create student posix users
+  ansible.builtin.user:
+    name: "{{ item.split(':') | first }}"
+    password: "{{ item.split(':') | last | password_hash(hashtype='sha512') }}"
+    shell: /bin/bash
+  loop: "{{ students.strip().split(',') }}"
+  when: students is defined and students != ''
+
+- name: List all non-grader_user non-system users
+  shell: "set -o pipefail && cat /etc/passwd | grep '/home/' | cut -d: -f1 | grep -v syslog | grep -v {{ grader_user }} || /bin/true"
+  args:
+    executable: /bin/bash
+  register: non_grader_users
+  changed_when: false
+
+# TODO when user is added to sram after provisioning then
+# add user to nbgrader as student + enable assignment-list extension for that user
+
+# enable/disable labextensions like https://github.com/jupyter/nbgrader/blob/main/demos/demo_one_class_one_grader/setup_demo.sh
+- name: Enable nbgrader extensions for grader_user
+  shell: |
+    set -o pipefail
+    {{ conda_environment_bin }}/jupyter labextension disable --level=user @jupyter/nbgrader:create-assignment
+    {{ conda_environment_bin }}/jupyter labextension enable --level=user @jupyter/nbgrader:create-assignment
+    {{ conda_environment_bin }}/jupyter labextension disable --level=user @jupyter/nbgrader:formgrader
+    {{ conda_environment_bin }}/jupyter labextension enable --level=user @jupyter/nbgrader:formgrader
+    {{ conda_environment_bin }}/jupyter server extension enable --user nbgrader.server_extensions.formgrader
+    {{ conda_environment_bin }}/jupyter labextension disable --level=user @jupyter/nbgrader:assignment-list
+    {{ conda_environment_bin }}/jupyter labextension enable --level=user @jupyter/nbgrader:assignment-list
+    {{ conda_environment_bin }}/jupyter server extension enable --user nbgrader.server_extensions.assignment_list
+    {{ conda_environment_bin }}/jupyter labextension disable --level=user @jupyter/nbgrader:course-list
+  args:
+    executable: /bin/bash
+  become: true
+  become_user: "{{ grader_user }}"
+  changed_when: true
+- name: Enable lab assignment list for all non-grader users
+  loop: "{{ non_grader_users.stdout_lines | default([]) }}"
+  shell: |
+    set -o pipefail
+    {{ conda_environment_bin }}/jupyter labextension disable --level=user @jupyter/nbgrader:assignment-list
+    {{ conda_environment_bin }}/jupyter labextension enable --level=user @jupyter/nbgrader:assignment-list
+    {{ conda_environment_bin }}/jupyter server extension enable --user nbgrader.server_extensions.assignment_list
+    {{ conda_environment_bin }}/jupyter labextension disable --level=user @jupyter/nbgrader:create-assignment
+    {{ conda_environment_bin }}/jupyter labextension disable --level=user @jupyter/nbgrader:formgrader
+    {{ conda_environment_bin }}/jupyter labextension disable --level=user @jupyter/nbgrader:course-list
+  args:
+    executable: /bin/bash
+  become: true
+  become_user: "{{ item }}"
+  changed_when: true
+
+- name: Register non_grader_users as nbgrader student
+  loop: "{{ non_grader_users.stdout_lines | default([]) }}"
+  command:
+    cmd: "{{ conda_environment_bin }}/nbgrader db student add {{ item }}"
+    chdir: "/home/{{ grader_user }}/{{ course_id }}"
+  environment:
+    # nbgrader calls subprocesses which are not in the path by default
+    PATH: "{{ conda_environment_bin }}:{{ ansible_env.PATH }}"
+  become: true
+  become_user: "{{ grader_user }}"
+  changed_when: true
diff --git a/roles/grader/tests/inventory b/roles/grader/tests/inventory
new file mode 100644
index 00000000..878877b0
--- /dev/null
+++ b/roles/grader/tests/inventory
@@ -0,0 +1,2 @@
+localhost
+
diff --git a/roles/grader/tests/test.yml b/roles/grader/tests/test.yml
new file mode 100644
index 00000000..508bd8cc
--- /dev/null
+++ b/roles/grader/tests/test.yml
@@ -0,0 +1,5 @@
+---
+- hosts: localhost
+  remote_user: root
+  roles:
+    - grader
diff --git a/roles/grader/vars/main.yml b/roles/grader/vars/main.yml
new file mode 100644
index 00000000..f0b5805f
--- /dev/null
+++ b/roles/grader/vars/main.yml
@@ -0,0 +1,2 @@
+---
+# vars file for grader
diff --git a/roles/grdc/README.md b/roles/grdc/README.md
index b67f9bba..19020d26 100644
--- a/roles/grdc/README.md
+++ b/roles/grdc/README.md
@@ -2,7 +2,7 @@ Role Name
 =========
 
 Global Runoff Data Centre, https://www.bafg.de/GRDC/EN/Home/homepage_node.html.
-Downloads free datasets from GRDC site and datasets from research drive. +Downloads free datasets from GRDC site. Requirements ------------ @@ -36,14 +36,12 @@ grdc_gtnr: url: https://geoportal.bafg.de/grdc-gtnr?datasource=1&service=SOS&version=2.0&request=GetObservation&featureOfInterest=http://gemstat.bafg.de/stations/1104150,1104530,1159100,1159105,1160235,1160378,1160500,1160580,1160684,1160788,1160880,1255100,1257100,1259100,1445100,1732100,1733600&temporalFilter=phenomenonTime,1961-01-01T00:00:00.000Z/1990-12-31T00:00:00.000Z - range: 1981_2010 url: https://geoportal.bafg.de/grdc-gtnr?datasource=1&service=SOS&version=2.0&request=GetObservation&featureOfInterest=http://gemstat.bafg.de/stations/1104150,1104530,1159100,1159105,1160235,1160378,1160500,1160580,1160684,1160788,1160880,1255100,1257100,1259100,1445100,1732100,1733600&temporalFilter=phenomenonTime,1981-01-01T00:00:00.000Z/2010-12-31T00:00:00.000Z -grdc_researchdrive_archives: - - ``` Dependencies ------------ -Requires rclone and rclone research drive config to be available which is what the researchdrive role does. +None. Example Playbook ---------------- diff --git a/roles/grdc/defaults/main.yml b/roles/grdc/defaults/main.yml index 45e7bf06..18a515b9 100644 --- a/roles/grdc/defaults/main.yml +++ b/roles/grdc/defaults/main.yml @@ -12,18 +12,4 @@ grdc_station_zips: - ftp://ftp.bafg.de/pub/REFERATE/GRDC/catalogue/grdc_stations.zip # Archycos daily zip url grds_archycos_day_zip: ftp://ftp.bafg.de/pub/REFERATE/GRDC/ARC_HYCOS/arc_hycos_day.zip -# Global Terrestrial Network for River Discharge (GTN-R) datasets -# https://www.bafg.de/GRDC/EN/04_spcldtbss/44_GTNR/GTN-R%20SOS2.html -grdc_gtnr: - - id: 1 - name: WMO Region 1 (Africa) - stations: ftp://ftp.bafg.de/pub/REFERATE/GRDC/website/GTNR_ONLINE_1.txt - periods: - - range: 1931-1960 - url: https://gemstat.bafg.de/KiWISGRDC/KiWIS?datasource=1&service=SOS&version=2.0&request=GetObservation&featureOfInterest=http://gemstat.bafg.de/stations/1104150,1104530,1159100,1160235,1160378,1160500,1160580,1160684,1160788,1160880,1255100,1257100,1259100,1445100,1732100,1733600&temporalFilter=phenomenonTime,1931-01-01T00:00:00.000Z/1960-12-31T00:00:00.000Z - - range: 1961-1990 - url: https://gemstat.bafg.de/KiWISGRDC/KiWIS?datasource=1&service=SOS&version=2.0&request=GetObservation&featureOfInterest=http://gemstat.bafg.de/stations/1104150,1104530,1159100,1160235,1160378,1160500,1160580,1160684,1160788,1160880,1255100,1257100,1259100,1445100,1732100,1733600&temporalFilter=phenomenonTime,1961-01-01T00:00:00.000Z/1990-12-31T00:00:00.000Z - - range: 1981-2010 - url: https://gemstat.bafg.de/KiWISGRDC/KiWIS?datasource=1&service=SOS&version=2.0&request=GetObservation&featureOfInterest=http://gemstat.bafg.de/stations/1104150,1104530,1159100,1160235,1160378,1160500,1160580,1160684,1160788,1160880,1255100,1257100,1259100,1445100,1732100,1733600&temporalFilter=phenomenonTime,1981-01-01T00:00:00.000Z/2010-12-31T00:00:00.000Z - # TODO add all regions -grdc_researchdrive_archives: [] +# TODO fetch grdc data from dcache \ No newline at end of file diff --git a/roles/grdc/tasks/gtnr.yml b/roles/grdc/tasks/gtnr.yml deleted file mode 100644 index 22e6cac1..00000000 --- a/roles/grdc/tasks/gtnr.yml +++ /dev/null @@ -1,15 +0,0 @@ ---- -- name: Wich region - debug: - msg: 'Region {{ item.id }}: {{ item.name }}' -- name: GTN-R - Stations - get_url: - url: '{{ item.stations }}' - dest: '{{ grdc_root_dir }}/stations/' -- name: GTN-R - monthlies - get_url: - url: '{{ period.url }}' - dest: '{{ grdc_root_dir 
}}/monthlies/GTNR_{{ item.id }}.{{ period.range }}.wml2' - loop: '{{ item.periods }}' - loop_control: - loop_var: period diff --git a/roles/grdc/tasks/main.yml b/roles/grdc/tasks/main.yml index 728279c9..d83c4c14 100644 --- a/roles/grdc/tasks/main.yml +++ b/roles/grdc/tasks/main.yml @@ -37,6 +37,3 @@ loop: '{{ grdc_station_zips }}' - name: archycos include_tasks: archycos.yml -- name: researchdrive archives - include_tasks: researchdrive.yml - loop: '{{ grdc_researchdrive_archives }}' diff --git a/roles/grdc/tasks/researchdrive.yml b/roles/grdc/tasks/researchdrive.yml deleted file mode 100644 index cdd58af6..00000000 --- a/roles/grdc/tasks/researchdrive.yml +++ /dev/null @@ -1,12 +0,0 @@ ---- -- name: research drive archive - copy - command: rclone --config /home/ubuntu/.config/rclone/rclone.conf copy RD:/eWaterCycle/GRDC/{{ item }} {{ grdc_root_dir }}/archives/ - args: - chdir: '{{ grdc_root_dir }}/archives' - creates: '{{ grdc_root_dir }}/archives/{{ item }}' -- name: research drive archive - dailies - unarchive: - src: '{{ grdc_root_dir }}/archives/{{ item }}' - dest: '{{ grdc_root_dir }}/dailies' - exclude: '*_Q_Month.txt' - remote_src: yes diff --git a/roles/jupyter/tasks/main.yml b/roles/jupyter/tasks/main.yml index fa6ae2b3..ca190731 100644 --- a/roles/jupyter/tasks/main.yml +++ b/roles/jupyter/tasks/main.yml @@ -5,6 +5,18 @@ port: '8081' proto: tcp from_ip: 172.17.0.0/12 +- name: Create /etc/jupyterhub directory + file: + path: /etc/jupyterhub + state: directory + mode: '0755' +- name: jupyterhub_config + template: + src: jupyterhub_config.py.j2 + dest: /etc/jupyterhub/jupyterhub_config.py + mode: '0644' + notify: + - restart jupyterhub - name: JupyterHub systemd file template: src: jupyterhub.systemd.j2 @@ -34,18 +46,7 @@ mode: 0775 group: jupyter when: jupyterhub_spawner_environment.USGS_DATA_HOME != '/tmp' -- name: Create /etc/jupyterhub directory - file: - path: /etc/jupyterhub - state: directory - mode: '0755' -- name: jupyterhub_config - template: - src: jupyterhub_config.py.j2 - dest: /etc/jupyterhub/jupyterhub_config.py - mode: '0644' - notify: - - restart jupyterhub + - name: Remove default conda python kernel command: '{{ conda_environment_bin }}/bin/jupyter kernelspec remove -f conda' args: @@ -65,6 +66,8 @@ proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_set_header Upgrade $http_upgrade; proxy_set_header Connection $connection_upgrade; + proxy_hide_header Content-Security-Policy; + proxy_set_header Content-Security-Policy "frame-ancestors 'self'"; client_max_body_size 10G; } notify: diff --git a/roles/jupyter/templates/jupyterhub_config.py.j2 b/roles/jupyter/templates/jupyterhub_config.py.j2 index f9d9029b..af80120a 100644 --- a/roles/jupyter/templates/jupyterhub_config.py.j2 +++ b/roles/jupyter/templates/jupyterhub_config.py.j2 @@ -4,11 +4,11 @@ import sys c.JupyterHub.admin_access = True c.JupyterHub.services = [ - { - 'name': 'experiment-launcher', - 'admin': True, - 'api_token': '{{ lookup('password', 'jupyterhub.launcher.token length=32 chars=ascii_letters,digits') }}', - }, + # { + # 'name': 'experiment-launcher', + # 'admin': True, + # 'api_token': '{{ lookup('password', 'jupyterhub.launcher.token length=32 chars=ascii_letters,digits') }}', + # }, { 'name': 'idle-culler', 'admin': True, diff --git a/roles/labstart/defaults/main.yml b/roles/labstart/defaults/main.yml index 9170348c..ed97d539 100644 --- a/roles/labstart/defaults/main.yml +++ b/roles/labstart/defaults/main.yml @@ -1,7 +1 @@ --- -tool_urls: - # explore: / - jupyter: /jupyter - # 
analytics: https://analytics.ewatercycle.org - # experiments: https://experiments.ewatercycle.org - # lowresforecast: https://lowres-forecast.ewatercycle.org diff --git a/roles/labstart/templates/index.html.j2 b/roles/labstart/templates/index.html.j2 index ccd3e637..76737466 100644 --- a/roles/labstart/templates/index.html.j2 +++ b/roles/labstart/templates/index.html.j2 @@ -61,24 +61,13 @@
- {%- if 'explore' in tool_urls -%} -
- ... -
-

Explore available Hydrology models and datasets.

-
- -
- {%- endif %}
...

Experiment with models and datasets in a notebook.

@@ -96,36 +85,7 @@ Jupyter notebook
- {%- if 'analytics' in tool_urls -%} -
- ... -
-

Analytics for experiment results.

-

Coming SoonTM.

-
- -
-
- ... -
-

Experiments which are running continuously using the cylc workflow engine.

-
- -
-
- ... -
-

Visualization of global 5arcmin PCR-GLOBWB model.

-
- -
{%- endif %} - +

Configuration for teachbooks

To open books on this machine, you can add the following configuration in the _config.yml file of your teachbook repository:

diff --git a/roles/prep_shared_data/README.md b/roles/prep_shared_data/README.md
index 6e09224d..bedaa587 100644
--- a/roles/prep_shared_data/README.md
+++ b/roles/prep_shared_data/README.md
@@ -85,7 +85,7 @@ esmvaltool_aux_version: dde5fcc78398ff3208589150b52bf9dd0b3bfb30
# Location where eWatercycle example parameter sets will be downloaded to
parameter_set_root_dir: '{{ data_root }}/parameter-sets'
# Location where eWatercycle example forcings will be downloaded to
-example_forcing_root_dir: '{{ data_root }}/forcing'
+forcing_root_dir: '{{ data_root }}/forcing'
```

Dependencies
diff --git a/roles/prep_shared_data/defaults/main.yml b/roles/prep_shared_data/defaults/main.yml
index 752f38c6..afefe608 100644
--- a/roles/prep_shared_data/defaults/main.yml
+++ b/roles/prep_shared_data/defaults/main.yml
@@ -1,6 +1,6 @@
---
# Where all data should be put
-data_root: /mnt/data
+data_root: /data/volume_2/samba-share/
# Directory where Apptainer sif files can be stored and read by other
apptainer_image_root: '{{ data_root }}/singularity-images'
# Docker container images that will be downloaded and converted to apptainer image files
@@ -21,39 +21,41 @@ grpc4bmi_images:
    apptainer: ewatercycle-lisflood-grpc4bmi_20.10.sif
  - docker: ewatercycle/hype-grpc4bmi:feb2021
    apptainer: ewatercycle-hype-grpc4bmi_feb2021.sif
+  - docker: ghcr.io/ewatercycle/leakybucket-grpc4bmi:v0.0.1
+    apptainer: ewatercycle-leakybucket-grpc4bmi_v0.0.1.sif
+  - docker: ghcr.io/daafip/hbv-bmi-grpc4bmi:v1.5.0
+    apptainer: daafip-hbv-bmi-grpc4bmi_v1.5.0.sif
# Location where conda is installed
conda_root: /opt/conda
-# Name of conda environment to use
-conda_environment: ewatercycle
# Path to conda environments bin directory
-conda_environment_bin: '{{ conda_root}}/envs/{{ conda_environment }}/bin'
+conda_environment_bin: '{{ conda_root }}/bin'
# Start year for which climate data should be prepared
climate_begin_year: 1990
# End year for which climate data should be prepared
-climate_end_year: 1990
-# Which ERA5 variables to download
+climate_end_year: 2018
+# Which ERA5 variables to download, key:value as era5_name: [cmor_name, mip]
era5_variables:
-  - total_precipitation
-  - mean_sea_level_pressure
-  - 2m_temperature
-  - minimum_2m_temperature_since_previous_post_processing
-  - maximum_2m_temperature_since_previous_post_processing
-  - 2m_dewpoint_temperature
-  - 10m_u_component_of_wind
-  - 10m_v_component_of_wind
-  - surface_solar_radiation_downwards
-  - toa_incident_solar_radiation
-  - orography
+  total_precipitation: [pr,1hr]
+  mean_sea_level_pressure: [psl,1hrPt]
+  2m_temperature: [tas,1hrPt]
+  2m_dewpoint_temperature: [tdps,1hrPt]
+  surface_solar_radiation_downwards: [rsds,1hrPt]
+  toa_incident_solar_radiation: [rsdt,1hrPt]
+  # Only needed for Hype and lisflood?
+  # minimum_2m_temperature_since_previous_post_processing: [tasmin,1hrPt]
+  # maximum_2m_temperature_since_previous_post_processing: [tasmax,1hrPt]
+  # 10m_u_component_of_wind: [uas,1hrPt]
+  # 10m_v_component_of_wind: [vas,1hrPt]
# Creds from https://cds.climate.copernicus.eu Climate Data Store to download ERA5 data
cds_uid: null # Must be filled from other place
cds_api_key: null # Must be filled from other place
# Location where climate data should be placed
climate_data_root_dir: '{{ data_root }}/climate-data'
-# Location where esmvaltool aux data should be placed
-auxiliary_data_dir: '{{ climate_data_root_dir }}/aux'
-# Version of https://github.com/eWaterCycle/recipes_auxiliary_datasets to checkout to {{ auxiliary_data_dir }}
-esmvaltool_aux_version: 279e39b3815d9779e13d93376b42732495ef6330
+# Location where auxiliary data should be placed
+auxiliary_data_dir: '{{ data_root }}/aux'
+# Which version of https://github.com/eWaterCycle/recipes_auxiliary_datasets to use
+esmvaltool_aux_version: 0.1.0
# Location where eWatercycle example parameter sets will be downloaded to
parameter_set_root_dir: '{{ data_root }}/parameter-sets'
# Location where eWatercycle example forcings will be downloaded to
-example_forcing_root_dir: '{{ data_root }}/forcing'
+forcing_root_dir: '{{ data_root }}/forcing'
diff --git a/roles/prep_shared_data/files/era5landprocess.py b/roles/prep_shared_data/files/era5landprocess.py
new file mode 100644
index 00000000..673f377e
--- /dev/null
+++ b/roles/prep_shared_data/files/era5landprocess.py
@@ -0,0 +1,32 @@
+import argparse
+from esmvalcore.dataset import Dataset
+from esmvalcore.config import CFG
+
+from iris import save
+
+parser = argparse.ArgumentParser(description='Process ERA5 data')
+parser.add_argument('output', type=str, help='Output file')
+
+args = parser.parse_args()
+
+print(f'Processing ERA5 land data to {args.output}')
+
+CFG['rootpath'] = {
+    'default': '.'
+}
+
+ds = Dataset(
+    project='native6',
+    dataset='ERA5',
+    type='reanaly',
+    version='v1',
+    tier=3,
+    short_name='orog',
+    mip='fx',
+    era5_name='z',
+)
+ds.find_files()
+cube = ds.load()
+
+# Save file
+save(cube, args.output, zlib=True)
diff --git a/roles/prep_shared_data/files/era5process.py b/roles/prep_shared_data/files/era5process.py
new file mode 100644
index 00000000..d7ac425c
--- /dev/null
+++ b/roles/prep_shared_data/files/era5process.py
@@ -0,0 +1,127 @@
+"""Convert raw ERA5 data to an ESMValTool-compatible format.
+
+Usage:
+
+    python roles/prep_shared_data/files/era5process.py 1990 total_precipitation pr obs6/Tier3/ERA5/OBS6_ERA5_reanaly_1_day_pr_1990-1990.nc
+"""
+
+import argparse
+from pathlib import Path
+from esmvalcore.dataset import Dataset
+from esmvalcore.preprocessor import extract_time, daily_statistics
+import xarray as xr
+import numpy as np
+
+parser = argparse.ArgumentParser(description='Process ERA5 data')
+parser.add_argument('year', type=int, help='Current year')
+parser.add_argument('era5_name', type=str, help='ERA5 name of the variable')
+parser.add_argument('short_name', type=str, help='Short name of the variable')
+parser.add_argument('output', type=str, help='Output file')
+
+args = parser.parse_args()
+
+print(f'Processing ERA5 {args.era5_name} data from year {args.year} to {args.short_name} and saving to {args.output}')
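+# Overview: locate the raw hourly file via ESMValCore's native6/ERA5 facets,
+# load it, trim to the requested year plus one day, aggregate to daily values,
+# and write a compact scale/offset-packed netCDF to the requested output path.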
+# When raw files are in '.' then uncomment below:
+# from esmvalcore.config import CFG
+# CFG['rootpath'] = {
+#     'default': '.',
+# }
+
+current_year = args.year
+var = args.short_name
+
+eds = Dataset(
+    project='native6',
+    dataset='ERA5',
+    type='reanaly',
+    version='v1',
+    tier=3,
+    timerange=f'{current_year}/{current_year + 1}',
+    short_name=var,
+    mip='E1hr',
+    era5_name=args.era5_name,
+    era5_freq='hourly',
+)
+eds.find_files()
+orig_file = eds.files[0]
+
+def unit_and_encoding(fn: str) -> tuple[str, dict]:
+    with xr.open_dataarray(fn) as da:
+        return da.attrs['units'], da.encoding
+
+
+orig_unit, orig_encoding = unit_and_encoding(orig_file)
+
+cube = eds.load()
+
+print(f'Loaded data from {eds.files}')
+
+# Keep the current year plus the first day of the next year
+cube = extract_time(cube, start_year=current_year, start_month=1, start_day=1, end_year=current_year + 1, end_month=1, end_day=1)
+print('Extracted time')
+# Use mean, max or min depending on the variable
+operator = 'mean'
+if var.endswith('max'):
+    operator = 'max'
+if var.endswith('min'):
+    operator = 'min'
+cube = daily_statistics(cube, operator=operator)
+
+print(f'Computed daily {operator} statistics')
+
+da = xr.DataArray.from_iris(cube)
+
+print('Converted to xarray')
+
+new_unit = da.attrs['units']
+
+def compute_encoding(da: xr.DataArray) -> dict:
+    vmin = da.min().values.tolist()
+    vmax = da.max().values.tolist()
+    offset = (vmax - vmin)/2 + vmin
+    scale_factor = np.max([vmax - offset, offset - vmin])/2**15 * 1.01  # room for nans
+    return {
+        'dtype': 'int16',
+        '_FillValue': np.iinfo(np.int16).min,
+        "missing_value": np.iinfo(np.int16).min,
+        'scale_factor': scale_factor,
+        'add_offset': offset,
+    }
+
+if new_unit == orig_unit:
+    da.encoding = orig_encoding
+else:
+    da.encoding = compute_encoding(da)
+
+# Compression (zlib level 4) vs no compression:
+# write: 14 minutes vs 1 minute
+# size: 350Mb vs 760Mb
+# See https://docs.xarray.dev/en/stable/user-guide/io.html#chunk-based-compression
+da.encoding['zlib'] = True
+da.encoding['complevel'] = 4  # 4 is the default in netcdf4 library
+da.encoding['shuffle'] = True
+# Chunk with a whole year of a spatial area, as we are going to generate forcing for a certain area and a whole year
+# da.encoding['chunksizes'] = (365, 80, 160)  # chunk size ~18Mb, nr 90
+
+# TODO try blosc_lz4 and sz3 from h5plugin, but should be esmvaltool compatible
+# da.encoding['zlib'] = False
+# da.encoding['blosc_lz4'] = True
+# da.encoding['complevel'] = 5
+# da.encoding['shuffle'] = True
+# Same size and write time as uncompressed
+
+# da.encoding['zlib'] = False
+# da.encoding['blosc_lz4'] = False
+# da.encoding['sz3'] = True
+# da.encoding['complevel'] = 5
+# da.encoding['shuffle'] = True
+# Gives TypeError: Compression method must be specified
+
+print(f'Saving file to {args.output}')
+# Save file
+Path(args.output).parent.mkdir(parents=True, exist_ok=True)
+da.to_netcdf(args.output, format='NETCDF4', engine='netcdf4')
+# For fancy compression use the h5netcdf engine
+# da.to_netcdf(args.output, format='NETCDF4', engine='h5netcdf')
+
+# TODO check output can be read by esmvaltool
diff --git a/roles/prep_shared_data/tasks/apptainer-images.yml b/roles/prep_shared_data/tasks/apptainer-images.yml
index 98795c1f..217fc59e 100644
--- a/roles/prep_shared_data/tasks/apptainer-images.yml
+++ b/roles/prep_shared_data/tasks/apptainer-images.yml
@@ -4,8 +4,15 @@
    path: '{{ apptainer_image_root }}'
    state: directory
    mode: 0755
+- name: Big temp dir
+  file:
+    path: /data/volume_2/tmp
+    state: directory
+    mode: 0755
- name: grpc4bmi Apptainer images
  command: apptainer build {{ apptainer_image_root }}/{{ item.apptainer }} docker://{{ item.docker }}
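+  # apptainer build stages image layers under TMPDIR; pointing it at the big
+  # data volume created above keeps the small root disk from filling up.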
+  environment:
+    TMPDIR: /data/volume_2/tmp
  args:
    creates: '{{ apptainer_image_root }}/{{ item.apptainer }}'
  loop: '{{ grpc4bmi_images }}'
diff --git a/roles/prep_shared_data/tasks/climate-data.yml b/roles/prep_shared_data/tasks/climate-data.yml
index 9c1c7ca7..0bf5bcea 100644
--- a/roles/prep_shared_data/tasks/climate-data.yml
+++ b/roles/prep_shared_data/tasks/climate-data.yml
@@ -1,26 +1,82 @@
---
-- name: Copernicus Climate Data Service token
-  template:
-    src: csdapirc.j2
-    dest: ~/.csdapirc
-    mode: '0440'
-- name: climate data dir
+- name: Climate data dir
  file:
    path: '{{ climate_data_root_dir }}'
    state: directory
    mode: '0755'
-- name: Download ERA5 climate data
-  # TODO only run when needed
-  command: '{{ conda_environment_bin }}/era5cli hourly --startyear {{ climate_begin_year }} --endyear {{ climate_end_year }} --variables {{ item }}' # noqa no-changed-when
-  args:
-    chdir: '{{ climate_data_root_dir }}'
-  loop: '{{ era5_variables }}'
+- name: ERA5 cmorized dir
+  file:
+    path: '{{ climate_data_root_dir }}/obs6/Tier3/ERA5'
+    state: directory
+    mode: '0755'
+- name: ERA5 process script
+  copy:
+    src: era5process.py
+    dest: /tmp/era5process.py
+
+- name: ESMValTool configuration directory
+  file:
+    path: /root/.esmvaltool
+    state: directory
+    mode: '0755'
+- name: ESMValTool config-user.yml
+  template:
+    src: ../../ewatercycle/templates/config-user.yml.j2
+    dest: /root/.esmvaltool/config-user.yml
+    mode: '0644'
+
+- name: ERA5 download
+  when: cds_uid is defined and cds_api_key is defined and cds_uid != None and cds_api_key != None
+  block:
+    - name: Configure era5cli
+      command: '{{ conda_environment_bin }}/era5cli config --uid {{ cds_uid }} --key {{ cds_api_key }}'
+      args:
+        creates: '~/.config/era5cli/cds_key.txt'
+    # We need to download ERA5 hourly data with era5cli and then convert it to
+    # CMOR format with a Python script that uses ESMValCore; the script needs
+    # the current and the next year. To make efficient use of the available
+    # disk space we chunk the work by variable and by year (plus year+1).
-- name: Prep ERA5 climate data for ESMValTool processing
-  # TODO only run when needed
-  command: '{{ conda_environment_bin }}/esmvaltool run cmorizers/recipe_era5.yml' # noqa no-changed-when
+    - name: ERA5 download + hourly2daily + cmorize, loop variable
+      loop: '{{ era5_variables | dict2items }}'
+      loop_control:
+        loop_var: var_item
+      include_tasks: era5-variable.yml
-# TODO download ERA-Interim
+    - name: Orography file check
+      stat:
+        path: '{{ climate_data_root_dir }}/obs6/Tier3/ERA5/OBS6_ERA5_reanaly_1_fx_orog.nc'
+      register: orog_file
+    - name: Orography
+      when: not orog_file.stat.exists
+      block:
+        - name: ERA5 land process script
+          copy:
+            src: era5landprocess.py
+            dest: /tmp/era5landprocess.py
+        - name: Download dir
+          file:
+            path: '{{ climate_data_root_dir }}/Tier3/ERA5/v1/fx/orog'
+            state: directory
+            mode: '0755'
+        - name: Download
+          command: '{{ conda_environment_bin }}/era5cli hourly --startyear {{ climate_begin_year }} --months 1 --days 1 --hours 0 --levels surface --variables geopotential'
+          args:
+            chdir: '{{ climate_data_root_dir }}/Tier3/ERA5/v1/fx/orog'
+            creates: 'era5_geopotential_{{ climate_begin_year }}_hourly.nc'
+        - name: Cmorize
+          command:
+            argv:
+              - '{{ conda_environment_bin }}/python'
+              - /tmp/era5landprocess.py
+              - '{{ climate_data_root_dir }}/obs6/Tier3/ERA5/OBS6_ERA5_reanaly_1_fx_orog.nc'
+        - name: Remove script
+          file:
+            state: absent
+            path: /tmp/era5landprocess.py
+        - name: Remove raw data
+          file:
+            state: absent
+            path: '{{ climate_data_root_dir
}}/Tier3/ERA5/v1/fx/orog/era5_geopotential_{{ climate_begin_year }}_hourly.nc' -# TODO cmorize ERA-Interim + # Instead of downloading from source we could download from already processed data from dcache diff --git a/roles/prep_shared_data/tasks/era5-variable.yml b/roles/prep_shared_data/tasks/era5-variable.yml new file mode 100644 index 00000000..86d51eef --- /dev/null +++ b/roles/prep_shared_data/tasks/era5-variable.yml @@ -0,0 +1,8 @@ +--- +- name: Loop year + loop: '{{ range(climate_begin_year, climate_end_year + 1) | list }}' + include_tasks: era5-year.yml +# - name: Remove hourly data of last year + 1 +# file: +# state: absent +# path: '{{ climate_data_root_dir }}/Tier3/ERA5/v1/{{ var_item.value[1] }}/{{ var_item.value[0] }}/era5_{{ var_item.key }}_{{ climate_end_year + 1 }}_hourly.nc' diff --git a/roles/prep_shared_data/tasks/era5-year.yml b/roles/prep_shared_data/tasks/era5-year.yml new file mode 100644 index 00000000..734fb330 --- /dev/null +++ b/roles/prep_shared_data/tasks/era5-year.yml @@ -0,0 +1,38 @@ +--- +- name: Output file check + stat: + path: '{{ climate_data_root_dir }}/obs6/Tier3/ERA5/OBS6_ERA5_reanaly_1_day_{{ var_item.value[0] }}_{{ item }}-{{ item }}.nc' + register: output_file +- name: Generate output file + when: not output_file.stat.exists + block: + - name: 'Hourly download dir of {{ var_item.key }}: {{ var_item.value }} for year {{ item }}' + file: + path: '{{ climate_data_root_dir }}/Tier3/ERA5/v1/{{ var_item.value[1] }}/{{ var_item.value[0] }}' + state: directory + mode: '0755' + - name: Download current year + command: '{{ conda_environment_bin }}/era5cli hourly --startyear {{ item }} --endyear {{ item }} --variables {{ var_item.key }}' + args: + chdir: '{{ climate_data_root_dir }}/Tier3/ERA5/v1/{{ var_item.value[1] }}/{{ var_item.value[0] }}' + # era5_total_precipitation_1990_hourly.nc + creates: 'era5_{{ var_item.key }}_{{ item }}_hourly.nc' + - name: Download current year + 1 + command: '{{ conda_environment_bin }}/era5cli hourly --startyear {{ item + 1 }} --endyear {{ item + 1 }} --variables {{ var_item.key }}' + args: + chdir: '{{ climate_data_root_dir }}/Tier3/ERA5/v1/{{ var_item.value[1] }}/{{ var_item.value[0] }}' + creates: 'era5_{{ var_item.key }}_{{ item + 1 }}_hourly.nc' + - name: 'Convert to daily + cmorize of {{ var_item.key }}: {{ var_item.value }} for year {{ item }}' + command: + argv: + - '{{ conda_environment_bin }}/python' + - /tmp/era5process.py + - '{{ item }}' + - '{{ var_item.key }}' + - '{{ var_item.value[0] }}' + - '{{ climate_data_root_dir }}/obs6/Tier3/ERA5/OBS6_ERA5_reanaly_1_day_{{ var_item.value[0] }}_{{ item }}-{{ item }}.nc' + creates: '{{ climate_data_root_dir }}/obs6/Tier3/ERA5/OBS6_ERA5_reanaly_1_day_{{ var_item.value[0] }}_{{ item }}-{{ item }}.nc' + # - name: Remove hourly data of current year + # file: + # state: absent + # path: '{{ climate_data_root_dir }}/Tier3/ERA5/v1/{{ var_item.value[1] }}/{{ var_item.value[0] }}/era5_{{ var_item.key }}_{{ item }}_hourly.nc' \ No newline at end of file diff --git a/roles/prep_shared_data/tasks/example-forcing.yml b/roles/prep_shared_data/tasks/example-forcing.yml index f2c6c745..824752d3 100644 --- a/roles/prep_shared_data/tasks/example-forcing.yml +++ b/roles/prep_shared_data/tasks/example-forcing.yml @@ -1,11 +1,17 @@ --- +- name: Forcing root + file: + path: '{{ forcing_root_dir }}' + state: directory + mode: '0755' - name: Example forcing root for MARRMoT file: - path: '{{ example_forcing_root_dir }}/MARRMoT/BuffaloRiver' + path: '{{ forcing_root_dir 
}}/MARRMoT/marrmot-m01_example_1989-1992_buffalo-river'
    state: directory
    mode: '0755'
- name: MARRMoT example
  get_url:
-    url: https://github.com/wknoben/MARRMoT/raw/master/BMI/Config/BMI_testcase_m01_BuffaloRiver_TN_USA.mat
-    dest: '{{ example_forcing_root_dir }}/marrmot-m01_example_1989-1992_buffalo-river/BMI_testcase_m01_BuffaloRiver_TN_USA.mat'
+    url: https://github.com/wknoben/MARRMoT/raw/refs/heads/master/BMI/Config/BMI_testcase_m01_BuffaloRiver_TN_USA.mat
+    dest: '{{ forcing_root_dir }}/MARRMoT/marrmot-m01_example_1989-1992_buffalo-river/BMI_testcase_m01_BuffaloRiver_TN_USA.mat'
    mode: '0444'
+# TODO prepare forcing files using CaravanForcing
diff --git a/roles/prep_shared_data/tasks/example-parameter-sets.yml b/roles/prep_shared_data/tasks/example-parameter-sets.yml
index 164072f1..26970bc6 100644
--- a/roles/prep_shared_data/tasks/example-parameter-sets.yml
+++ b/roles/prep_shared_data/tasks/example-parameter-sets.yml
@@ -4,13 +4,26 @@
    path: '{{ parameter_set_root_dir }}'
    state: directory
    mode: '0755'
+- name: ewatercycle packages
+  ansible.builtin.pip:
+    name:
+      - ewatercycle
+      - ewatercycle-pcrglobwb
+      - ewatercycle-wflow
+      - ewatercycle-lisflood
+    state: present
+    executable: '{{ conda_root }}/bin/pip3'
- name: copy fetch.py
  template:
    src: fetch.py.j2
    dest: '{{ parameter_set_root_dir }}/fetch.py'
    mode: '0644'
+- name: ewatercycle.yaml parent dir
+  file:
+    state: directory
+    path: /root/.config/ewatercycle
- name: Download example parameter sets
-  command: '{{ conda_root }}/envs/ewatercycle/bin/python fetch.py'
+  command: '{{ conda_root }}/bin/python {{ parameter_set_root_dir }}/fetch.py'
  args:
    chdir: '{{ parameter_set_root_dir }}'
    creates: '{{ parameter_set_root_dir }}/pcrglobwb_rhinemeuse_30min'
diff --git a/roles/prep_shared_data/tasks/main.yml b/roles/prep_shared_data/tasks/main.yml
index 7e2d4252..db2ea6a4 100644
--- a/roles/prep_shared_data/tasks/main.yml
+++ b/roles/prep_shared_data/tasks/main.yml
@@ -1,6 +1,4 @@
---
-- name: Download climate data
-  ansible.builtin.include_tasks: climate-data.yml
- name: ESMValTool aux data
  ansible.builtin.include_tasks: esmvaltool-aux-data.yml
- name: Build apptainer image files (sif) for each model
@@ -9,3 +7,6 @@
  ansible.builtin.include_tasks: example-parameter-sets.yml
- name: Download example forcing
  ansible.builtin.include_tasks: example-forcing.yml
+# TODO Make smaller selection
+- name: Download climate data
+  ansible.builtin.include_tasks: climate-data.yml
diff --git a/roles/prep_shared_data/templates/csdapirc.j2 b/roles/prep_shared_data/templates/csdapirc.j2
deleted file mode 100644
index 8d79155e..00000000
--- a/roles/prep_shared_data/templates/csdapirc.j2
+++ /dev/null
@@ -1,2 +0,0 @@
-url: https://cds.climate.copernicus.eu/api/v2
-key: {{ cds_uid }}: {{ cds_api_key }}
diff --git a/roles/prep_shared_data/templates/fetch.py.j2 b/roles/prep_shared_data/templates/fetch.py.j2
index f8f45346..dedf2cc2 100644
--- a/roles/prep_shared_data/templates/fetch.py.j2
+++ b/roles/prep_shared_data/templates/fetch.py.j2
@@ -1,4 +1,2 @@
-import ewatercycle
import ewatercycle.parameter_sets
-ewatercycle.CFG['parameterset_dir'] = '{{ parameter_set_root_dir }}'
ewatercycle.parameter_sets.download_example_parameter_sets()
diff --git a/roles/rclone/defaults/main.yml b/roles/rclone/defaults/main.yml
index 162415eb..8229947d 100644
--- a/roles/rclone/defaults/main.yml
+++ b/roles/rclone/defaults/main.yml
@@ -1,6 +1,6 @@
---
# Location where data should be uploaded from or where dCache should be mounted at
-data_root: /mnt/data
+data_root: /data/shared
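+# The settings below describe the dcache WebDAV remote that the mount and upload tasks configure for rclone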
dcache_url: https://webdav.grid.surfsara.nl:2880 dcache_root: '/' dcache_rclone_name: dcache diff --git a/roles/rclone/tasks/install.yml b/roles/rclone/tasks/install.yml index 25a1c187..76e42574 100644 --- a/roles/rclone/tasks/install.yml +++ b/roles/rclone/tasks/install.yml @@ -7,6 +7,7 @@ - fuse3 state: present - name: Install rclone + # TODO pin version of rclone shell: | set -o pipefail && curl https://rclone.org/install.sh | bash args: diff --git a/roles/rclone/tasks/upload.yml b/roles/rclone/tasks/upload.yml index 32c7e5e9..9fa0b6b8 100644 --- a/roles/rclone/tasks/upload.yml +++ b/roles/rclone/tasks/upload.yml @@ -20,4 +20,4 @@ bearer_token = {{ dcache_rw_token }} - name: Upload data root to dcache # TODO only run when needed - command: rclone copy {{ data_root }} {{ dcache_rclone_name }}:{{ dcache_root }} # noqa no-changed-when + command: rclone copy /mnt/data {{ dcache_rclone_name }}:{{ dcache_root }} # noqa no-changed-when diff --git a/roles/storage/defaults/main.yml b/roles/storage/defaults/main.yml index 0c777990..0b531ba3 100644 --- a/roles/storage/defaults/main.yml +++ b/roles/storage/defaults/main.yml @@ -1,7 +1,12 @@ --- +# From where should /data/shared be mounted, from dcache or samba. +shared_data_source: dcache # defaults file for storage # Mount point for home directories home_location: /home # Directory where /home (`home_location`) should point to # It is directory where extra disk is mounted at and where existing home dirs should be copied to alt_home_location: /data/volume_3 +samba_client_folder: shared +samba_share: samba-share +smbuser: smbuser diff --git a/roles/storage/tasks/data-dcache.yml b/roles/storage/tasks/data-dcache.yml new file mode 100644 index 00000000..c32f84e2 --- /dev/null +++ b/roles/storage/tasks/data-dcache.yml @@ -0,0 +1,5 @@ +--- +- name: Mount shared data dcache with rclone + include_role: + name: rclone + tasks_from: mount diff --git a/roles/storage/tasks/data-samba.yml b/roles/storage/tasks/data-samba.yml new file mode 100644 index 00000000..b8747fc0 --- /dev/null +++ b/roles/storage/tasks/data-samba.yml @@ -0,0 +1,38 @@ +# Copied from https://gitlab.com/rsc-surf-nl/plugins/plugin-samba/-/blob/main/samba-client-linux.yml +- name: Install cifs utils package + package: + name: cifs-utils + +- name: Install nmap utils package + package: + name: nmap + +- name: nmap for the samba IP + shell: nmap -p 445 -T4 -v 10.10.10.0/24 | awk -F'[ /]' '/Discovered open port/{print $NF}' + register: nmap_run + +- name: set variable + set_fact: + private_smb_server_ip: "{{ nmap_run.stdout.split('\n')}}" + when: private_smb_server_ip is not defined + +- name: configure the client + when: private_smb_server_ip[0] | length > 0 + block: + - name: Create directory for samba share + file: + path: /data/{{ samba_client_folder }} + state: directory + + - name: Add samba share to fstab + lineinfile: + path: /etc/fstab + line: > + //{{ private_smb_server_ip[0] }}/{{ samba_share }} + /data/{{ samba_client_folder }} + cifs username={{ smbuser }},password={{ samba_password }},ro,cache=loose + + - name: Mount samba-share + shell: mount -a + ignore_errors: true + diff --git a/roles/storage/tasks/main.yml b/roles/storage/tasks/main.yml index f765d785..205a0555 100644 --- a/roles/storage/tasks/main.yml +++ b/roles/storage/tasks/main.yml @@ -2,3 +2,12 @@ # tasks file for storage - name: Move /home from root disk to other disk include_tasks: home.yml +- name: Source of shared data is + debug: + var: shared_data_source +- name: Mount data from Samba share + include_tasks: 
data-samba.yml
+  when: shared_data_source == 'samba' and samba_password is defined
+- name: Mount data from dcache
+  include_tasks: data-dcache.yml
+  when: shared_data_source == 'dcache'
diff --git a/shared-data-disk.yml b/shared-data-disk.yml
deleted file mode 100644
index a169b56d..00000000
--- a/shared-data-disk.yml
+++ /dev/null
@@ -1,39 +0,0 @@
-- name: Download data from internet and upload to dcache
-  hosts:
-    - localhost
-    - all
-  vars:
-    # Creds from https://cds.climate.copernicus.eu Climate Data Store
-    cds_uid: null # Must be filled from other place
-    cds_api_key: null # Must be filled from other place
-    # To add more ERA5 variables edit or overwrite roles/prep_shared_data/defaults/main.yml#era5_variables
-    # To add more apptainer images of hydrological models edit or overwrite roles/prep_shared_data/defaults/main.yml#era5_variables#grpc4bmi_images
-    climate_begin_year: 1990
-    climate_end_year: 1990
-    # dCache token for uploading files
-    dcache_rw_token: null # Must be filled from command line
-  tasks:
-    - name: apptainer
-      include_role:
-        name: apptainer
-
-    - name: Install conda
-      include_role:
-        name: conda
-
-    - name: Create eWaterCycle conda env
-      include_role:
-        name: ewatercycle
-
-    - name: Prepare shared data
-      include_role:
-        name: prep_shared_data
-
-    - name: Download observation data
-      include_role:
-        name: grdc
-
-    - name: Upload to dcache with rclone
-      include_role:
-        name: rclone
-        tasks_from: upload
diff --git a/vagrant-provision-file-server.yml b/vagrant-provision-file-server.yml
new file mode 100644
index 00000000..33f78678
--- /dev/null
+++ b/vagrant-provision-file-server.yml
@@ -0,0 +1,92 @@
+- name: Install and configure the software that is already available on SURF Research Cloud before the populate-samba.yml playbook is run
+  hosts: all
+  gather_facts: false
+  vars:
+    samba_password: "samba"
+  tasks:
+    - name: Wait for system to become reachable
+      wait_for_connection:
+        timeout: 300
+
+    - name: Gather facts for first time
+      setup:
+
+    - name: Update APT Cache
+      apt:
+        update_cache: yes
+
+    - name: Apt upgrade
+      apt:
+        upgrade: yes
+
+    - name: Install galaxy mamba role
+      command: ansible-galaxy role install mambaorg.micromamba
+      args:
+        creates: /root/.ansible/roles/mambaorg.micromamba
+
+    - name: Format data disk
+      filesystem:
+        dev: /dev/sdb
+        fstype: ext4
+
+    - name: Mount data disk as /data/volume_2
+      mount:
+        path: /data/volume_2
+        src: /dev/sdb
+        fstype: ext4
+        state: mounted
+
+    # From https://gitlab.com/rsc-surf-nl/plugins/plugin-samba/-/blob/main/samba-server.yml?ref_type=heads
+    - name: Create shared directory on external volume at /data/volume_2
+      file:
+        path: /data/volume_2/samba-share
+        mode: '0777'
+        state: directory
+    - name: Install Samba
+      apt:
+        name: samba
+
+    - name: Configure Samba server
+      lineinfile:
+        path: /etc/samba/smb.conf
+        line: |
+          [samba-share]
+          comment = Samba on Ubuntu
+          path = /data/volume_2/samba-share
+          read only = no
+          browsable = yes
+
+    - name: Restart Samba
+      systemd:
+        state: restarted
+        name: smbd
+
+    - name: Create Samba-user
+      user:
+        name: smbuser
+        groups: sudo
+        append: yes
+
+    - name: Set Samba-user login password
+      shell: >
+        (echo '{{ samba_password }}'; echo '{{ samba_password }}') |
+        passwd smbuser
+
+    - name: Set Samba-user Samba password
+      shell: >
+        (echo '{{ samba_password }}'; echo '{{ samba_password }}') |
+        smbpasswd -s -a smbuser
+
+    - name: Set samba-share owner
+      shell: chown smbuser:smbuser /data/volume_2/samba-share
+
+    # The nameserver configured by Vagrant could not resolve cds.climate.copernicus.eu, so use the Cloudflare nameserver instead
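+    # Note: on Ubuntu /etc/resolv.conf is usually managed by systemd-resolved,
+    # so this edit may not persist; that is acceptable for a short-lived test VM.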
+    - name: Configure Cloudflare nameserver
+      lineinfile:
+        path: /etc/resolv.conf
+        line: "nameserver 1.1.1.1"
+        insertafter: EOF
+
+    - name: Next
+      debug:
+        msg: Next is to call the ansible command to provision the eWatercycle file-server. See README.md#file-server
diff --git a/vagrant-provision.yml b/vagrant-provision.yml
index aca113a5..3d2b0543 100644
--- a/vagrant-provision.yml
+++ b/vagrant-provision.yml
@@ -79,6 +79,10 @@
        fstype: ext4
        state: mounted

+    - name: passlib
+      apt:
+        name: python3-passlib
+
    - name: Next
      debug:
        msg: Next is to call the ansible command to provision the eWatercycle platform. See README.md#local-test-vm
diff --git a/workspace-parameters.png b/workspace-parameters.png
new file mode 100644
index 00000000..a406553c
Binary files /dev/null and b/workspace-parameters.png differ
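
A note on the int16 packing done by `compute_encoding` in roles/prep_shared_data/files/era5process.py: floats are stored as 16-bit integers with a linear scale/offset, where the offset sits at the midpoint of the data range and the scale factor gets ~1% headroom so NaNs can be mapped onto the int16 fill value. A minimal round-trip sketch of the same scheme (numpy only; the temperature array and names are illustrative, not taken from the playbooks):

```python
import numpy as np

data = np.array([271.3, 280.0, 303.9])  # e.g. daily mean temperature in K

vmin, vmax = float(data.min()), float(data.max())
offset = (vmax - vmin) / 2 + vmin       # midpoint of the data range
scale = (vmax - offset) / 2**15 * 1.01  # half-range per int16 step, ~1% headroom

packed = np.round((data - offset) / scale).astype(np.int16)
restored = packed * scale + offset      # what netCDF readers do with scale_factor/add_offset

# Quantization error is at most scale/2, far below ERA5 precision for these ranges.
assert np.allclose(data, restored, atol=scale / 2)
```

netCDF stores `scale_factor` and `add_offset` as variable attributes and readers apply `packed * scale_factor + add_offset` automatically, so the packing roughly halves the file size before the zlib compression in the script even starts.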