Rewrite plugin for src cluster support
sverhoeven committed Mar 20, 2024
1 parent 06a5f87 commit 8d30bbb
Showing 2 changed files with 59 additions and 18 deletions.
3 changes: 1 addition & 2 deletions README.md
@@ -52,15 +52,14 @@ alt_home_location: /data/volume_3
# Vagrant user is instructor
# The students defined below can be used to login as a student
students: 'student1:pw1,student2:pw2'
worker_ip_addresses: []
```
The token can be found in the eWaterCycle password manager.
```shell
vagrant --version
# Vagrant 2.4.1
vagrant plugin install vagrant-vbguest
# Installed the plugin 'vagrant-vbguest (0.32.0)'
vagrant up
```

74 changes: 58 additions & 16 deletions research-cloud-plugin.yml
@@ -1,13 +1,12 @@
# code: language=ansible
- name: Install and configure eWaterCycle Jupyter on jumphost
- name: Configure workers
hosts:
- all
- localhost
gather_facts: false
vars:
# dCache token for mounting shared data
dcache_ro_token: null # Must be filled from command line
vars: {}
tasks:
# Heavily inspired by https://github.com/RS-DAT/JupyterDaskOnSRC/blob/main/research-cloud-plugin.yml
- name: Wait for system to become reachable

Check failure on line 10 in research-cloud-plugin.yml (GitHub Actions / build): fqcn[action-core]: Use FQCN for builtin module actions (wait_for_connection).
wait_for_connection:
timeout: 300
@@ -19,10 +18,42 @@
debug:
var: ansible_version

- name: Set up workers
debug:
var: worker_ip_addresses
- name: Alias name for jumphost

Check failure on line 21 in research-cloud-plugin.yml (GitHub Actions / build): fqcn[action-core]: Use FQCN for builtin module actions (add_host).
add_host:
name: jumphost
hostname: localhost

- name: Create group workers with workers' IPs

Check failure on line 26 in research-cloud-plugin.yml (GitHub Actions / build): fqcn[action-core]: Use FQCN for builtin module actions (add_host).
add_host:
name: "{{ item }}"
groups: workers
ansible_user: ubuntu
ansible_connection: ssh
ansible_ssh_private_key_file: ~/.ssh/id_rsa
ansible_become: yes

Check failure on line 33 in research-cloud-plugin.yml (GitHub Actions / build): yaml[truthy]: Truthy value should be one of [false, true]
loop: '{{ worker_ip_addresses }}'

- name: Scan for worker keys.

Check failure on line 36 in research-cloud-plugin.yml (GitHub Actions / build): command-instead-of-shell: Use shell only when shell functionality is required.
Check failure on line 36 in research-cloud-plugin.yml (GitHub Actions / build): fqcn[action-core]: Use FQCN for builtin module actions (shell).
Check failure on line 36 in research-cloud-plugin.yml (GitHub Actions / build): no-changed-when: Commands should not change things if nothing needs doing.
shell:
cmd: ssh-keyscan {{ item }}
register: ssh_scan
loop: '{{ worker_ip_addresses }}'

- name: Write the worker keys to known hosts

Check failure on line 42 in research-cloud-plugin.yml (GitHub Actions / build): fqcn[action-core]: Use FQCN for builtin module actions (known_hosts).
Check warning on line 42 in research-cloud-plugin.yml (GitHub Actions / build): jinja[spacing]: Jinja2 spacing could be improved: {{ ssh_scan.results | subelements('stdout_lines') | default([])}} -> {{ ssh_scan.results | subelements('stdout_lines') | default([]) }}
known_hosts:
name: "{{ item.0.item }}"
key: "{{ item.1 }}"
loop:
"{{ ssh_scan.results | subelements('stdout_lines') | default([])}}"
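
The lint annotations on this play (`fqcn[action-core]`, `yaml[truthy]`, `command-instead-of-shell`, `no-changed-when`, `jinja[spacing]`) each suggest a small rewrite. The following is a minimal sketch of how the worker setup tasks might look once those are addressed; it is not part of this commit. Two things in it are assumptions: `hostname` appears to be only an alias of `name` in `add_host`, so the sketch guesses that the intent was a local connection and uses `ansible_host` and `ansible_connection: local` instead, and `worker_ip_addresses` is given an empty-list default so the play still runs on a single machine.

```yaml
# Sketch only: a lint-clean variant of the worker setup tasks, not the committed code.
- name: Configure workers
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Alias name for jumphost
      ansible.builtin.add_host:
        name: jumphost
        # Assumption: the alias is meant to run on the machine executing the playbook.
        ansible_host: localhost
        ansible_connection: local

    - name: Create group workers with workers' IPs
      ansible.builtin.add_host:
        name: "{{ item }}"
        groups: workers
        ansible_user: ubuntu
        ansible_connection: ssh
        ansible_ssh_private_key_file: ~/.ssh/id_rsa
        ansible_become: true  # yaml[truthy]: use true/false instead of yes/no
      # Assumption: fall back to an empty list when no workers are passed in.
      loop: "{{ worker_ip_addresses | default([]) }}"

    - name: Scan for worker keys
      # command-instead-of-shell: no shell features are needed here.
      ansible.builtin.command:
        cmd: ssh-keyscan {{ item }}
      register: ssh_scan
      # no-changed-when: scanning only reads keys, it never changes the host.
      changed_when: false
      loop: "{{ worker_ip_addresses | default([]) }}"

    - name: Write the worker keys to known hosts
      ansible.builtin.known_hosts:
        name: "{{ item.0.item }}"
        key: "{{ item.1 }}"
      # Each element pairs one scan result (item.0) with one scanned key line (item.1).
      loop: "{{ ssh_scan.results | subelements('stdout_lines') }}"
```

The `default([])` is moved onto `worker_ip_addresses` because that is the variable that can be undefined; by the time `subelements` runs, `ssh_scan.results` is always a list.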

- name: Install and configure eWaterCycle Jupyter on jumphost
hosts:
- default
- jumphost
vars:
# dCache token for mounting shared data
dcache_ro_token: null # Must be filled from command line
tasks:
- name: Common stuff
include_role:
name: common
@@ -73,15 +104,6 @@
include_role:
name: labstart

# https://explore.ewatercycle.org/ functionality
# - name: Experiment launcher
# include_role:
# name: launcher

# - name: Explorer
# include_role:
# name: terria

# https://jupyter.ewatercycle.org/ functionality
- name: Create eWaterCycle conda env
include_role:
@@ -104,3 +126,23 @@
debug:
msg: The eWaterCycle Jupyter plugin has completed


- name: Install Workers
hosts: workers
tasks:
- name: Hello
debug:
msg: "Hello from worker"


# TODO for cluster setup
# - Check if playbook works on vagrant, single src machine and src cluster
# - Check sram users exist on workers with same uid
# - Check network, workers should be able to reach jumphost but not be accessible from the internet
# - Check storage, only jumphost should have home and cache storage items mounted
# - Export homes and dcache and exchange dir over nfs to workers
# - Install apptainer/conda/nbgrader/ewatercycle on workers
# - use jupyterhub spawner to have each user end up on a different worker or jumphost
# - try ssh spawner with ssh keys+authorized_keys for each user on each machine
# - try slurm batch spawner with slurm installation
# - try dockerspawner.SystemUserSpawner with docker swarm installation
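
The TODO item about users existing on the workers with the same uid could be checked from the playbook itself. The play below is a rough sketch of one way to do that, not something this commit contains. The `expected_users` list is hypothetical: the name `student1` is borrowed from the README example and the uid is a placeholder, and in practice the list would have to come from wherever the jumphost's accounts are defined.

```yaml
# Sketch only: verify that each expected user exists on every worker with a given uid.
- name: Check users on workers
  hosts: workers
  gather_facts: false
  vars:
    # Hypothetical data; the uid is a placeholder, not a value from the repository.
    expected_users:
      - name: student1
        uid: 1001
  tasks:
    - name: Read the passwd database
      # Stores entries in ansible_facts.getent_passwd, keyed by user name.
      ansible.builtin.getent:
        database: passwd

    - name: Assert each user exists with the expected uid
      ansible.builtin.assert:
        that:
          - item.name in ansible_facts.getent_passwd
          # Fields after the name are: password, uid, gid, gecos, home, shell.
          - ansible_facts.getent_passwd[item.name][1] | int == item.uid
        fail_msg: "User {{ item.name }} is missing or has an unexpected uid"
      loop: "{{ expected_users }}"
```

Running the same two tasks against the jumphost would show whether the uids really line up on both sides, which matters once home directories are exported over NFS.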
