Replace rclone mount of dcache with a Samba mount #162

Closed

wants to merge 22 commits into from

Commits (22)

c43c4b1
Replace rclone mount of dcache with a Samba mount
sverhoeven Apr 10, 2024
9cec3c1
Improve spacing
sverhoeven Apr 10, 2024
b20640d
Creating playbook for filling file server
sverhoeven Apr 18, 2024
31b6553
Make linter happier
sverhoeven Apr 18, 2024
655fbd1
Lower bar for linter in ci
sverhoeven Apr 18, 2024
9a50fcb
Use Ansible/era5cli/esmvalcore to download ERA5
sverhoeven May 6, 2024
ab79fe9
Use scale/offset and zlib to convert era5 raw files to esmvaltool com…
sverhoeven May 14, 2024
6a99e64
Blocks cannot be looped task files can
sverhoeven May 14, 2024
daecac5
Got it running + Temporary disable removal raw files
sverhoeven May 14, 2024
f4e40d5
Multi machine vagrantfile
sverhoeven May 14, 2024
8df80d4
Got era5process.py to run, sadly killed due to out of memory
sverhoeven May 14, 2024
08316f2
Clean up raw era5 files after cmorization
sverhoeven May 15, 2024
79522a5
SRC ships with acient ansible, upgrade it
sverhoeven May 15, 2024
f048264
raw era5 is searched for by esmvaltool not always in .../1hr/...
sverhoeven May 21, 2024
df5f5cf
The grdc gt-nr links no longer work + todo to get grdc data from dcache
sverhoeven Jun 6, 2024
0c17a47
Allow logins on JupyterHub 5.0
sverhoeven Jun 27, 2024
3c2d10e
Update main.yml
sverhoeven Jul 8, 2024
e18d662
Sync comment with code
sverhoeven Aug 28, 2024
6c48305
Merge remote-tracking branch 'origin/grader-samba' into grader-samba
sverhoeven Aug 28, 2024
8201d46
Merge remote-tracking branch 'origin/grader' into grader-samba
sverhoeven Sep 5, 2024
f0655dc
Remove researchdrive for grdc download, going to use dcache as source…
sverhoeven Sep 5, 2024
6522d7c
Make index.html simpler
sverhoeven Sep 5, 2024
README.md: 131 changes (89 additions, 42 deletions)
@@ -25,7 +25,7 @@ The setup instructions in this repo will create an eWaterCycle application (a sor

An application on the SURF Research cloud is provisioned by running an Ansible playbook (research-cloud-plugin.yml).

In addition to the standard VM storage, additional read-only datasets are mounted at `/mnt/data` from dCache using rclone. They may contain things like:
In addition to the standard VM storage, additional read-only datasets are mounted at `/data/shared` from a file server. They may contain things like:

- climate data, see <https://ewatercycle.readthedocs.io/en/latest/system_setup.html#download-climate-data>
- observations
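
On the workspaces this becomes a mount of the Samba share, presumably via CIFS. A minimal sketch of the equivalent manual mount, where the server address, share name, and credentials are illustrative assumptions rather than values from this repo:

```shell
# Hypothetical example: mount the Samba share read-only at /data/shared
sudo apt-get install -y cifs-utils
sudo mkdir -p /data/shared
sudo mount -t cifs //fileserver.example.org/samba-share /data/shared \
  -o ro,username=samba,password="<samba_password>"
```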
@@ -45,8 +45,6 @@ Create config file `research-cloud-plugin.vagrant.vars` with

```yaml
---
dcache_ro_token: <dcache macaroon with read permission>
rclone_cache_dir: /data/volume_2
# Directory where /home should point to
alt_home_location: /data/volume_3
# Vagrant user is instructor
@@ -74,6 +72,23 @@ Go to `http://<ip of eth1>` and login with `vagrant:vagrant`.

You will get some complaints about insecure serving; this is OK for local testing and will not happen on Research Cloud.

### Vagrant File server

The file server can also be tested locally with Vagrant using:

```shell
vagrant up fileserver
vagrant ssh fileserver
```

And follow the steps in the [File Server](#file-server) section.

To clean up, use

```shell
vagrant destroy fileserver
```

### Test on Windows Subsystem for Linux 2

WSL2 users should follow steps on [https://www.vagrantup.com/docs/other/wsl](https://www.vagrantup.com/docs/other/wsl).
@@ -103,19 +118,9 @@ For the eWaterCycle component, the following specialization was done
- SURF HPC Cloud, with all non-gpu sizes selected
- SURF HPC Cloud cluster, with all non-gpu sizes selected
- Component parameters, all fixed source type, required and overwritable unless otherwise stated
- dcache_ro_token: parameter for dcache read-only token aka macaroon.
The token can be found in the eWaterCycle password manager.
This token has an expiration date, so it needs to be updated every now and then.
- description: Macaroon with read permission for dcache
- alt_home_location:
- default: /data/volume_2
- description: Path where home directories are stored. Set to `/data/<storage item name for homes>`.
- rclone_cache_dir:
- default: /data/volume_3
- description: Path where rclone cache is stored. Set to `/data/<storage item name for rclone cache>`.
- rclone_max_gsize:
- default: 45
- description: For maximum size of cache on `rclone_cache_dir` volume. In Gb.
- grader_user:
- description: User who will be grading. User should be created on SRAM. This user will also be responsible for setting up the course and assignments.
- default: ==USERNAME==
@@ -133,8 +138,11 @@ For the eWaterCycle component, the following specialization was done
- source type: Resource
- default: worker_ip_addresses
- description: Makes addresses of workers available to Ansible playbook. Only used when cloud provider `SURF HPC Cloud cluster` is selected.
- samba_password:
- source_type: Co-Secret
- value: {"key": "samba_password","sensitive": 1}
- Set documentation URL to `https://github.com/eWaterCycle/infra`
- Do not allow every org to use this component. Data on the dcache should not be made public.
- Do not allow every org to use this component.
- Select the organizations (CO) that are allowed to use the component.

For the eWaterCycle catalog item, the following specialization was done
@@ -144,23 +152,17 @@
2. SRC-CO
3. SRC-Nginx
4. SRC-External plugin
5. eWatercycle
5. eWatercycle teaching samba
- Set documentation URL to `https://github.com/eWaterCycle/infra`
- Select the organizations (CO) that are allowed to use the catalog item.
- In cloud provider and settings step:
- Add `SURF HPC Cloud` as cloud provider
- Set Operating Systems to Ubuntu 22.04
- Set Sizes to all non-gpu and non-disabled sizes
- Add `SURF HPC Cloud cluster` as cloud provider
- Set Operating Systems to Ubuntu 22.04
- Set Sizes to all non-gpu and non-disabled sizes
- In the parameter settings step, keep all values as is except
- Set `co_irods` to `false` as we do not use irods
- Set `co_research_drive` to `false` as we do not use research drive
- As interactive parameters, expose the following:
- rclone_cache_dir:
- label: Rclone cache directory
- description: Path where rclone cache is stored. Set to `/data/<storage item name for rclone cache>`.
- alt_home_location:
- label: Homes path
- description: Path where home directories are stored. Set to `/data/<storage item name for homes>`.
@@ -171,10 +173,6 @@ For the eWaterCycle catalog item, the following specialization was done
- students
- label: Students
- description: List of student user names and passwords. Format '<username1>:<password1>,<username2>:<password2>'. Use '' for no students. Use secure passwords, as anyone on the internet can access the machine.
- num_nodes
- label: Number of nodes
- description: Only used when cloud provider `SURF HPC Cloud cluster` is selected.
- default: 2
- Set boot disk size to 150 GB,
as the default size will be mostly consumed by the conda environment and will trigger out-of-space warnings.
- Set workspace access button behavior to `Webinterface (https:)`,
@@ -187,30 +185,79 @@ See [docs](https://servicedesk.surf.nl/wiki/display/WIKI/Workspace+roles%3A+Appo

This chapter is dedicated to application deployers.

For a new CO, make sure

- the application is allowed to be used by the CO. See [Sharing catalog items](https://servicedesk.surfsara.nl/wiki/display/WIKI/Sharing+catalog+items)

1. Log into Research Cloud
1. Create new storage item for home directories
- To store user files
- Use a 50 GB size for simple experiments, or bigger when required by the experiment.
- As each storage item can only be used by a single workspace, give it a name and description so you know which workspace and storage items go together.
1. Create new storage item for cache
- To store cached files from dCache by rclone
- Use a 50 GB size
- As each storage item can only be used by a single workspace, give it a name and description so you know which workspace and storage items go together.
1. Create new storage item for data
- To store training material like parameter sets, ready-to-use forcings, raw forcings and apptainer sif files for models.
2. Create private network
- Name: `file-storage-network`
3. In Collaborative organizations
- Create a secret named `samba_password` with a strong random password as its value (one can be generated as shown below)
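
For example, with OpenSSL (a suggestion; any password generator works):

```shell
# Generate a 32-byte random password, base64 encoded
openssl rand -base64 32
```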

### File Server

Each collaborative organization should run a single file server, which stores the shared data. Create the file server with the following steps:

1. Create a new workspace
1. Select eWaterCycle application
1. Select collaborative organisation (CO) for example `ewatercycle-nlesc`
1. Select size of VM (cpus/memory) based on use case
1. Select home storage item and cache storage item. Remember items you picked as you will need them in the workspace parameters.
1. Fill **all** the workspace parameters. They should look something like
![workspace-parameters](workspace-parameters.png)
2. Wait for machine to be running
3. Visit URL/IP
4. When done delete machine
2. Select `Samba Server` application
3. Select size with 2 cores and 16 GB RAM
4. Select data storage item
5. Select private network
6. Wait for machine to be running
7. Log in to the machine with ssh (the edit below can be scripted, see the sketch after this list)
   1. Become root with `sudo`
   2. Edit `/etc/samba/smb.conf` and replace `read only = no` with `read only = yes`
   3. Restart the Samba server with `systemctl restart smbd`
8. Populate the `/data/volume_2/samba-share/` directory with training material. This directory will be shared with other machines.
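
The edit and restart in step 7 can also be done non-interactively (a sketch; it assumes the stock smb.conf still contains `read only = no`):

```shell
# Flip the Samba share to read-only and restart the daemon
sudo sed -i 's/read only = no/read only = yes/' /etc/samba/smb.conf
sudo systemctl restart smbd
```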

Populating can be done with an Ansible playbook (this could be run during workspace creation, but the downloads are very flaky and time consuming).

For a new CO make sure
```shell
sudo -i
pip install -U ansible ansible-core
git clone -b grader-samba https://github.com/eWaterCycle/infra.git /opt/infra
cd /opt/infra
ansible-galaxy role install mambaorg.micromamba
# Get cds user id (uid) and api key from cds profile page
ansible-playbook /opt/infra/shared-data-disk.yml -e cds_uid=... -e cds_api_key=...
```

- application is allowed to be used by CO. See [Sharing catalog items](https://servicedesk.surfsara.nl/wiki/display/WIKI/Sharing+catalog+items)
- data storage item and home dir are created for the CO
This will:
1. Harden the share, so only root can write in `/data/volume_2/samba-share/` and it is read-only
2. Download Apptainer images for models
3. Set up era5cli to download ERA5 data
4. Download raw ERA5 data with era5cli
5. Aggregate, cmorize and compress the ERA5 data with a custom esmvaltool script
6. Set up rclone for copying data from dcache to the file server (see the sketch below)
7. Create an ewatercycle.yaml which can be used on the Jupyter machines
8. Create an esmvaltool config file which can be used on the Jupyter machines
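
The rclone remote for dCache in step 6 is typically a WebDAV remote authenticated with a macaroon. A minimal sketch, where the remote name, endpoint URL, and token are illustrative assumptions rather than values from this repo:

```shell
# Hypothetical example: register a dCache WebDAV remote that
# authenticates with a macaroon used as bearer token
rclone config create dcache webdav \
  url https://webdav.grid.surfsara.nl:2880 \
  vendor other \
  bearer_token "<macaroon>"
# List top-level directories to verify the remote works
rclone lsd dcache:
```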

If you have another file server that already has the data, you can sync it to this file server with

```shell
rsync -av --progress <remote user>@<remote host>:<remote location> /data/volume_2/samba-share/
```

## eWaterCycle machine

1. Create a new workspace
2. Select `eWaterCycle teaching samba` application
3. Select collaborative organisation (CO), for example `ewatercycle-nlesc`
4. Select size of VM (cpus/memory) based on use case
5. Select home storage item. Remember which item you picked, as you will need it in the workspace parameters.
6. Select private network
7. Fill **all** the workspace parameters. They should look something like
![workspace-parameters](workspace-parameters.png)
8. Wait for machine to be running
9. Visit URL/IP
10. When done, delete the machine

End users should be invited to the CO so they can log in.

Expand Down Expand Up @@ -240,7 +287,7 @@ This link uses [nbgitpuller](https://jupyterhub.github.io/nbgitpuller/) to sync
This chapter is dedicated to application data preparers.

The [eWatercycle system setup](https://ewatercycle.readthedocs.io/en/latest/system_setup.html) requires a lot of data files.
For the Research cloud virtual machines we will mount a dcache bucket.
For the Research cloud virtual machines we will copy data from a dcache bucket to a Samba file server also running on Research Cloud.
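
The copy itself can then be a single rclone invocation on the file server (a sketch; it assumes a dCache remote like the one sketched in the File Server section, and the bucket path is an illustrative assumption):

```shell
# Hypothetical example: copy a dCache bucket into the Samba share
rclone copy --progress dcache:ewatercycle /data/volume_2/samba-share/
```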

To fill the dcache bucket you can run

Vagrantfile: 85 changes (55 additions, 30 deletions)
@@ -1,41 +1,66 @@
# -*- mode: ruby -*-
# vi: set ft=ruby :


Vagrant.configure("2") do |config|
config.vm.box = "bento/ubuntu-22.04"
config.vm.synced_folder ".", "/vagrant"

# Create a public network, which generally matched to bridged network.
# Bridged networks make the machine appear as another physical device on
# your network.
config.vm.network "public_network"
config.vm.hostname = "ewc-explorer-jupyterhub"

# Provider-specific configuration so you can fine-tune various
# backing providers for Vagrant. These expose provider-specific options.
# Example for VirtualBox:
#
config.vm.provider "virtualbox" do |vb|
# Customize the amount of memory on the VM:
vb.memory = 8096
vb.cpus = 4
end
config.vm.define "jupyter", primary: true do |jupyter|
config.vm.box = "bento/ubuntu-22.04"
config.vm.synced_folder ".", "/vagrant"

# Create a public network, which generally matched to bridged network.
# Bridged networks make the machine appear as another physical device on
# your network.
jupyter.vm.network "public_network"
jupyter.vm.hostname = "ewc-explorer-jupyterhub"

# Provider-specific configuration so you can fine-tune various
# backing providers for Vagrant. These expose provider-specific options.
# Example for VirtualBox:
#
jupyter.vm.provider "virtualbox" do |vb|
# Customize the amount of memory on the VM:
vb.memory = 8096
vb.cpus = 4
end

jupyter.vm.disk :disk, size: "20GB", name: "home2"
jupyter.vm.disk :disk, size: "50GB", name: "cache"

config.vm.disk :disk, size: "20GB", name: "home2"
config.vm.disk :disk, size: "50GB", name: "cache"
# Disable guest updates
jupyter.vbguest.auto_update = false
jupyter.vbguest.no_install = true

# Disable guest updates
config.vbguest.auto_update = false
config.vbguest.no_install = true
jupyter.vm.provision "ansible_local" do |ansible|
ansible.playbook = "vagrant-provision.yml"
ansible.become = true
end

config.vm.provision "ansible_local" do |ansible|
ansible.playbook = "vagrant-provision.yml"
ansible.become = true
jupyter.vm.provision "ansible_local" do |ansible|
ansible.playbook = "research-cloud-plugin.yml"
ansible.become = true
ansible.extra_vars = "research-cloud-plugin.vagrant.vars"
end
end

config.vm.provision "ansible_local" do |ansible|
ansible.playbook = "research-cloud-plugin.yml"
ansible.become = true
ansible.extra_vars = "research-cloud-plugin.vagrant.vars"
config.vm.define "fileserver", autostart: false do |fileserver|
fileserver.vm.box = "generic/ubuntu2004"
fileserver.vm.synced_folder ".", "/vagrant"

fileserver.vm.provider "virtualbox" do |vb|
# Customize the amount of memory on the VM:
vb.memory = 16096
vb.cpus = 4
end

fileserver.vm.disk :disk, size: "500GB", name: "volume_2"

# Disable guest updates
fileserver.vbguest.auto_update = false
fileserver.vbguest.no_install = true

fileserver.vm.provision "ansible_local" do |ansible|
ansible.playbook = "vagrant-provision-file-server.yml"
ansible.become = true
end
end
end