Dcache or samba catalog item #169

Open
wants to merge 134 commits into
base: main
134 commits
377783d
Setup nbgrader in one class, one grader mode
sverhoeven Feb 13, 2024
605ff3a
Add student creation via interacve src parameter + try e2xgrader
sverhoeven Feb 13, 2024
77a8cd2
Drop e2xgrader is not a lab extension yet, removing it
sverhoeven Feb 13, 2024
214f652
Improve docs
sverhoeven Feb 13, 2024
aa9c7ef
Add sudo
sverhoeven Feb 13, 2024
52368c7
No longer need experimental env var with latest vagrant
sverhoeven Feb 14, 2024
c55b2fe
Fix license
sverhoeven Feb 14, 2024
ab489be
Fix ERROR: Failed to find specified config file: /etc/jupyterhub/jupy…
sverhoeven Feb 14, 2024
648ad57
Add course repo + refactor + fix lint errors
sverhoeven Feb 14, 2024
6d1b87d
Dont use deprecated include
sverhoeven Feb 14, 2024
1195457
nbgrader does subprocess alembic, but was not in path so activate con…
sverhoeven Feb 14, 2024
ee6ab9a
Set PATH env var so alembic can be found
sverhoeven Feb 14, 2024
37e6ab5
Update TEACH.md
sverhoeven Feb 14, 2024
1d35fb2
Test out named volumes + cluster
sverhoeven Feb 28, 2024
5b78a67
Merge branch 'grader' of github.com:eWaterCycle/infra into grader
sverhoeven Feb 28, 2024
178632d
Mirror rs dat more
sverhoeven Feb 28, 2024
eb66ff2
Expose worker_ip_addresses as plugin resource parameter
sverhoeven Feb 28, 2024
326a8a2
unneeded if
sverhoeven Feb 28, 2024
58a91ac
Skip cluster
sverhoeven Feb 28, 2024
01a0a8c
Really skip cluster, left is named storage
sverhoeven Feb 28, 2024
1938143
Expose attached storage items as interactive parameters
sverhoeven Feb 28, 2024
dbec610
re-enable all roles
sverhoeven Feb 28, 2024
dc5df4e
Better docs
sverhoeven Feb 28, 2024
5575997
Merge TEACH.md into other docs
sverhoeven Feb 28, 2024
ce4c108
Pick right branch
sverhoeven Feb 28, 2024
fbe8706
Add all non grader non system users as students + disable some nbgrad…
sverhoeven Feb 28, 2024
ac455f0
Dump worker_ip_addresses
sverhoeven Feb 28, 2024
1e50645
false is not a command
sverhoeven Feb 28, 2024
1534fec
List users in /home
sverhoeven Feb 28, 2024
012f1ac
More todos
sverhoeven Feb 28, 2024
ed39428
Debug students parameter
sverhoeven Mar 6, 2024
3bb530f
Sync README with live src config
sverhoeven Mar 6, 2024
d5a91c2
Disable explore & launcher, not needed for teaching
sverhoeven Mar 6, 2024
3870712
Try to fix list users + use nbgrader demo commands
sverhoeven Mar 6, 2024
29b5ba0
Add screenshot of parameters.
sverhoeven Mar 6, 2024
04c981d
Just show jupyter card on home page
sverhoeven Mar 6, 2024
f588496
For non_grader_users use same command as myusers + disable unwanted e…
sverhoeven Mar 6, 2024
1ce7c2a
Make even more similar
sverhoeven Mar 6, 2024
7c67bb6
Install bleeding edge nbgrader for working validate button
sverhoeven Mar 6, 2024
e8383d8
Fallback to legacy shell module
sverhoeven Mar 6, 2024
d97fa71
Use legacy command and use module parameter
sverhoeven Mar 6, 2024
7d8d938
Debug ansible version
sverhoeven Mar 6, 2024
63b4f2c
Remove debug
sverhoeven Mar 6, 2024
063ef96
The double qoutes in students json string are being swllowed by SRC.
sverhoeven Mar 14, 2024
daa27ca
Dont use json for students list
sverhoeven Mar 14, 2024
97ec9da
Do splits of students in loop and task
sverhoeven Mar 20, 2024
f1c9db1
Drop ==USERNAME== as default in catalog item
sverhoeven Mar 20, 2024
06a5f87
Call split on string
sverhoeven Mar 20, 2024
8d30bbb
Rewrite plugin for src cluster support
sverhoeven Mar 20, 2024
fdbb0f4
Empty string is treated as ['']
sverhoeven Mar 20, 2024
5b484fa
Add nfs server
sverhoeven Mar 20, 2024
151297c
Correct yaml
sverhoeven Mar 20, 2024
5381683
Dont do tasks with worker_ip_addresses when its undefined
sverhoeven Mar 21, 2024
019f029
less warnings
sverhoeven Mar 21, 2024
0ddd8a8
Use is defined
sverhoeven Mar 21, 2024
3fd8dce
Clean apt cache after last install not before
sverhoeven Mar 21, 2024
3ca61cf
only fill known hosts when there are workers
sverhoeven Mar 21, 2024
a9de677
Correct indentation
sverhoeven Mar 21, 2024
f5fac91
Fix some lint warnings
sverhoeven Mar 21, 2024
3d55855
Log versions
sverhoeven Mar 21, 2024
bf84eee
Comment out cluster stuff, too many tries bringing it up
sverhoeven Mar 21, 2024
cae24a3
Bump ewatercycle version to 2.1
BSchilperoort Mar 26, 2024
142dc94
Set Content-Security-Policy "frame-ancestors header
sverhoeven Apr 10, 2024
4e9f6f6
Format
sverhoeven Apr 10, 2024
e1b6403
Get current users with home from /etc/passwd
sverhoeven Apr 10, 2024
2183cf7
Just self
sverhoeven Apr 10, 2024
c43c4b1
Replace rclone mount of dcache with a Samba mount
sverhoeven Apr 10, 2024
9cec3c1
Improve spacing
sverhoeven Apr 10, 2024
b20640d
Creating playbook for filling file server
sverhoeven Apr 18, 2024
31b6553
Make linter happier
sverhoeven Apr 18, 2024
655fbd1
Lower bar for linter in ci
sverhoeven Apr 18, 2024
9a50fcb
Use Ansible/era5cli/esmvalcore to download ERA5
sverhoeven May 6, 2024
ab79fe9
Use scale/offset and zlib to convert era5 raw files to esmvaltool com…
sverhoeven May 14, 2024
6a99e64
Blocks cannot be looped task files can
sverhoeven May 14, 2024
daecac5
Got it running + Temporary disable removal raw files
sverhoeven May 14, 2024
f4e40d5
Multi machine vagrantfile
sverhoeven May 14, 2024
8df80d4
Got era5process.py to run, sadly killed due to out of memory
sverhoeven May 14, 2024
08316f2
Clean up raw era5 files after cmorization
sverhoeven May 15, 2024
79522a5
SRC ships with acient ansible, upgrade it
sverhoeven May 15, 2024
f048264
raw era5 is searched for by esmvaltool not always in .../1hr/...
sverhoeven May 21, 2024
df5f5cf
The grdc gt-nr links no longer work + todo to get grdc data from dcache
sverhoeven Jun 6, 2024
0c17a47
Allow logins on JupyterHub 5.0
sverhoeven Jun 27, 2024
b426db2
Allow logins on JupyterHub 5.0
sverhoeven Jun 27, 2024
3c2d10e
Update main.yml
sverhoeven Jul 8, 2024
8b15bab
Update main.yml
sverhoeven Jul 8, 2024
e18d662
Sync comment with code
sverhoeven Aug 28, 2024
6c48305
Merge remote-tracking branch 'origin/grader-samba' into grader-samba
sverhoeven Aug 28, 2024
4fb1e13
Merge remote-tracking branch 'origin/main' into grader
sverhoeven Sep 5, 2024
8201d46
Merge remote-tracking branch 'origin/grader' into grader-samba
sverhoeven Sep 5, 2024
f0655dc
Remove researchdrive for grdc download, going to use dcache as source…
sverhoeven Sep 5, 2024
6522d7c
Make index.html simpler
sverhoeven Sep 5, 2024
8029fbc
Make index.html simpler
sverhoeven Sep 5, 2024
aa4090e
Use /jupyter/hub from variable
sverhoeven Sep 5, 2024
5d2cc1c
Use right key
sverhoeven Sep 5, 2024
609c3c6
add back vagrant virtualbox wsl2 plugin
sverhoeven Sep 5, 2024
0732e8d
Improve docs
sverhoeven Sep 11, 2024
a2c99d0
Make homepage static
sverhoeven Sep 11, 2024
9af173a
Merge remote-tracking branch 'origin/grader-samba' into dcache-or-samba
sverhoeven Sep 11, 2024
3a2d818
Make shared data source configurable in theory
sverhoeven Sep 11, 2024
a6d269c
Make vagrant commands copy pastable
sverhoeven Sep 12, 2024
46d4edb
Print data_share_source
sverhoeven Sep 12, 2024
845f749
Use same os on fileserver + remove worker_pip_addresses variable
sverhoeven Sep 12, 2024
0a352bb
Use this branch
sverhoeven Sep 12, 2024
0d957c5
Make vagrant file server use less resources
sverhoeven Sep 12, 2024
055c9e6
Make jupyter and fileserver on same network + workaround for ansble v…
sverhoeven Sep 17, 2024
5672771
Tested all prep data steps except climate data download
sverhoeven Sep 17, 2024
2ff26fd
Make samba mount work with vagrant
sverhoeven Sep 17, 2024
a3e85b5
Document vars for vagrant file server + update populate data chapters
sverhoeven Sep 17, 2024
bcb8876
Make readme focus on workspace creation + moved other chapters to own…
sverhoeven Sep 17, 2024
5d3e800
Renest
sverhoeven Sep 17, 2024
4d04d4f
Skip parameter sets when shared data source is samba
sverhoeven Sep 17, 2024
cbbafcc
Pin to latest ewatercycle
sverhoeven Sep 24, 2024
ba002bb
Ignore just space in students
sverhoeven Sep 24, 2024
312e9fd
Created component and dcache catalog item
sverhoeven Sep 24, 2024
96104df
Clone dcache catalog item to samba catalog item
sverhoeven Sep 24, 2024
2d33ca8
Update workspace parameters screenshot
sverhoeven Sep 24, 2024
7f4bdb8
correct version
sverhoeven Sep 25, 2024
a297bd2
Handle ' ' as students value
sverhoeven Sep 25, 2024
04275fc
Mark students as optional
sverhoeven Oct 3, 2024
034b0d8
Dont use ==USERNAME==, does not work for parameters
sverhoeven Oct 3, 2024
481b25d
nbgrader labextensions where moved to @jupyter namespace
sverhoeven Oct 7, 2024
59ecbd3
Dont create students posix accounts when students is undefined
sverhoeven Oct 7, 2024
dea68eb
Make sure passlib is installed for vagrant up
sverhoeven Oct 7, 2024
0700062
Trying to spin up src workspace with samba as data source
sverhoeven Oct 7, 2024
0043b38
Merge remote-tracking branch 'origin/dcache-or-samba' into dcache-or-…
sverhoeven Oct 14, 2024
bf6ed1c
Add AI disclaimer according with https://doi.org/10.5281/zenodo.10363728
sverhoeven Oct 14, 2024
745264e
List paths that ewatecycle package and esmvaltool expect
sverhoeven Oct 14, 2024
c527574
Make ERA5 download optional
sverhoeven Oct 14, 2024
7395fb0
Add grdc dir so `import ewatercycle` does not complain
sverhoeven Oct 14, 2024
91f4cff
Run playbook with screen
sverhoeven Oct 14, 2024
dd374f0
Skip ERA5 download when cds vars are None
sverhoeven Oct 14, 2024
41d3b1b
Add missing var
sverhoeven Oct 14, 2024
341dd3c
No need to mkdir grdc already done in grdc role
sverhoeven Oct 14, 2024
3846143
English
sverhoeven Oct 14, 2024
2 changes: 1 addition & 1 deletion .github/workflows/lint.yml
@@ -17,4 +17,4 @@ jobs:
 - name: Deps
   run: pip install -r requirements.txt
 - name: Lint Ansible Playbook
-  run: ansible-lint --profile min --force-color research-cloud-plugin.yml shared-data-disk.yml
+  run: ansible-lint --profile min --force-color research-cloud-plugin.yml populate-samba.yml
1 change: 1 addition & 0 deletions .gitignore
@@ -15,3 +15,4 @@ env
 research-cloud-plugin.vagrant.vars
 jupyterhub.launcher.token
 launcher.jwt.secret
+.venv
121 changes: 121 additions & 0 deletions DATA.md
@@ -0,0 +1,121 @@
# Shared data
- [Shared data](#shared-data)
- [Configured paths](#configured-paths)
- [Populating Samba file server](#populating-samba-file-server)
- [Populating dcache](#populating-dcache)
- [Sync dcache with existing folder elsewhere](#sync-dcache-with-existing-folder-elsewhere)
- [Mount dcache on local machine](#mount-dcache-on-local-machine)

This document is intended for people who prepare the application data.

## Configured paths

The eWaterCycle Python package (`/etc/ewatercycle.yaml`) and ESMValTool (`~/.esmvaltool/config-user.yml`) have been configured to use the following paths:

- Root is `/data/shared`
- Climate data is in `/data/shared/climate-data`, used to generate forcings.
- ESMValTool auxiliary data is in `/data/shared/climate-data/aux`
- OBS6 data is in `/data/shared/climate-data/obs6`
- Parameter sets are in `/data/shared/parameter-sets`, used to run models.
- Apptainer images are in `/data/shared/singularity-images`, used to run containerized models.
- GRDC observations are in `/data/shared/observation/grdc/dailies`, used by the `ewatercycle.observation.grdc.get_grdc_data()` function.
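
To confirm that a machine actually has these directories, a check along the following lines could be used. This is a minimal sketch, assuming the directory layout listed above; `check_shared_paths` is a hypothetical helper and is not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): report which of the
# configured shared data directories are missing under a given root.
check_shared_paths() {
    root="${1:-/data/shared}"
    missing=0
    for sub in \
        climate-data \
        climate-data/aux \
        climate-data/obs6 \
        parameter-sets \
        singularity-images \
        observation/grdc/dailies
    do
        if [ -d "$root/$sub" ]; then
            echo "ok       $root/$sub"
        else
            echo "MISSING  $root/$sub"
            missing=1
        fi
    done
    # Exit status is 0 when every directory exists, 1 otherwise.
    return "$missing"
}
```

Calling `check_shared_paths` without arguments checks `/data/shared`; pass another root as the first argument to check a different location, for example the share directory on the file server.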

## Populating Samba file server

Populating the `/data/volume_2/samba-share/` directory on the Samba file server can be done with an Ansible playbook using the following commands.
<!--
This could be run during workspace creation, but the downloads are flaky and time-consuming, and it would require maintaining another SRC component and catalog item, so it is done manually after the workspace is up. -->

```shell
sudo -i
git clone -b dcache-or-samba https://github.com/eWaterCycle/infra.git /opt/infra
cd /opt/infra
ansible-galaxy role install mambaorg.micromamba
# Playbook will run for a long time, so run it in a detachable shell
screen
# Get cds user id (uid) and api key from cds profile page
ansible-playbook populate-samba.yml -e cds_uid=... -e cds_api_key=...
# If you do not want to download ERA5 data then leave out cds_uid and cds_api_key arguments.
# Detach screen with Ctrl+A, D
# Reattach screen with screen -r
```

This will:

1. Harden the share, so that only root can write to `/data/volume_2/samba-share/` and the share is read-only.
2. Download Apptainer images for models.
3. Set up era5cli to download ERA5 data.
4. Download raw ERA5 data with era5cli.
5. Aggregate, cmorize and compress ERA5 data with a custom ESMValTool script.
6. Set up rclone for copying data from dcache to the file server.
7. Create an `ewatercycle.yaml` that can be used on the Jupyter machines.
8. Create the empty directory `/data/shared/observation/grdc/dailies` where GRDC data can be stored.
9. Create an ESMValTool config file that can be used on the Jupyter machines.

If you already have data elsewhere, you can sync it to this file server with:

```shell
rsync -av --progress <remote user>@<remote host>:<remote location> /data/volume_2/samba-share/
```

## Populating dcache

This chapter is intended for people who prepare the application data.

First gather all your data together on a server (like Snellius or Spider).
You can use parts of the [populate-samba.yml](populate-samba.yml) playbook to download data.

Populating the dcache can then be done from that server
with the following command:

```shell
# cd to directory with data
# have a rclone config with dcache macaroon
rclone copy . dcache:ewatercycle
```

## Sync dcache with existing folder elsewhere

The steps above fetch the data from original sources. If you want to sync some files from
another location, say, Snellius, you can use rclone directly. In our experience, it works
better to sync entire directories than to try and copy single files.

Create the file `~/.config/rclone/rclone.conf` and add the following content:

```ini
[dcache]
type = webdav
url = https://webdav.grid.surfsara.nl:2880
vendor = other
user =
pass =
bearer_token = <dcache macaroon with read/write permissions>
```

You can verify your access by running a harmless `rclone ls dcache:parameter-sets`.
The command to sync directories is `rclone copy somedir dcache:parameter-sets/somedir`.
Beware that this overwrites existing files on the destination if they differ!
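
Because the copy can overwrite files, it may help to preview it first. The following is a minimal sketch that uses rclone's `--dry-run` flag to list what would be transferred and then asks for confirmation. `copy_with_preview` and the `RCLONE` override variable are hypothetical names introduced here for illustration; they are not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): preview an rclone copy
# with --dry-run, then ask before running the real copy.
# Set RCLONE to use a different rclone binary.
copy_with_preview() {
    src="$1"
    dest="$2"
    rclone_bin="${RCLONE:-rclone}"
    # --dry-run reports what would be transferred without changing the remote.
    "$rclone_bin" copy --dry-run "$src" "$dest" || return 1
    printf 'Run the real copy to %s? [y/N] ' "$dest"
    read -r answer
    case "$answer" in
        y|Y) "$rclone_bin" copy "$src" "$dest" ;;
        *) echo "Skipped." ;;
    esac
}
```

For example, `copy_with_preview somedir dcache:parameter-sets/somedir` shows the planned transfers and only copies after you answer `y`.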

Note: a password manager can be used to exchange macaroons.

## Mount dcache on local machine

Create the file `~/.config/rclone/rclone.conf` and add the following content:

```ini
[dcache]
type = webdav
url = https://webdav.grid.surfsara.nl:2880
vendor = other
user =
pass =
bearer_token = <dcache macaroon with read permissions>
```

Install [rclone](https://rclone.org/) and run the following commands to mount dcache at the `~/dcache` directory.

```shell
mkdir ~/dcache
rclone mount --read-only --cache-dir /tmp/rclone-cache --vfs-cache-max-size 30G --vfs-cache-mode full dcache:/ ~/dcache
```

In ESMValTool config files you can use `~/dcache/climate-data/obs6` for `rootpath:OBS6`.