From 312e9fd38fc587ef2ecc52a99c9c440aebb8f286 Mon Sep 17 00:00:00 2001 From: sverhoeven Date: Tue, 24 Sep 2024 12:03:51 +0200 Subject: [PATCH] Created component and dcache catalog item --- README.md | 55 +++++++++++------ SRC-ADMIN.md | 115 ------------------------------------ SRC-DEVEL.md | 163 +++++++++++++++++++++++++++++++++++++-------------- 3 files changed, 156 insertions(+), 177 deletions(-) delete mode 100644 SRC-ADMIN.md diff --git a/README.md b/README.md index 0e303071..3756d219 100644 --- a/README.md +++ b/README.md @@ -54,9 +54,9 @@ Previously the eWatercycle platform consisted of multiple VM on SURF HPC cloud, For developing the SURF Research Cloud applications locally you can use the [Vagrant instructions](VAGRANT.md) -## SURF Reseach cloud Catalog item registration +## SURF Reseach cloud catalog item registration -To register the eWaterCycle application on the SURF Research cloud, follow instructions in [SURF Research cloud developer document](SRC-DEVEL.md). +To register the eWaterCycle platform on the SURF Research cloud, follow instructions in [SURF Research cloud developer document](SRC-DEVEL.md). ## SURF Research cloud workspace @@ -66,13 +66,13 @@ This chapter is dedicated for application deployers. A workspace is name for a V The [eWatercycle system setup](https://ewatercycle.readthedocs.io/en/latest/system_setup.html) requires a lot of data files. -The shared data can come from 2 sources: -1. dcache, High capacity, but high latency storage accessible via WebDAV from anywhere on the internet. Usefull for research. -2. samba, A low capaciry, low latency file server that is only accessible from the private network of the SURF Research cloud. Usefull for teaching. +Two eWaterCycle catalog items have been created: +1. eWaterCycle dcache, uses dcache as shared data source. High capacity, but high latency storage accessible via WebDAV from anywhere on the Internet. Useful for research. +2. eWaterCycle samba, uses samba as shared data source. 
A low capacity, low latency file server that is only accessible from the private network of the SURF Research cloud. Useful for teaching. -The shared data is mounted read-only `/data/shared` on the Jupyter machines. -In the following chapters you will need to make a choice which shared data source. -Depending on the choice you make you need to do certain things. +The shared data is mounted read-only at `/data/shared` on the workspaces. +In the following chapters you will need to choose which catalog item you want to use. +Depending on the choice, you need to do certain things. ### Preparations @@ -93,8 +93,9 @@ Before you can create a workspace several steps need to be done first. - This storage item should be used later in the Samba file server. 6. If shared data source is samba then create a private network - Name: `file-storage-network` -7. In Collaborative organizations +7. On https://portal.live.surfresearchcloud.nl/profile page in Collaborative organizations - Create a secret named `samba_password` and a strong random password as value + - Create a secret named `dcache_ro_token` and a dcache read-only token as value To become root on a VM the user needs to be member of the `src_co_admin` group on [SRAM](https://sram.surf.nl/). See [docs](https://servicedesk.surf.nl/wiki/display/WIKI/Workspace+roles%3A+Appoint+a+CO-member+a+SRC+administrator). @@ -119,23 +120,43 @@ Each collaborative organization should run a single file server. This file serve See [data documentation](DATA.md#populating-samba-file-server) on how to populate the file server. -### Workspace creation +### Workspace creation with dcache as shared data source Steps to create a eWaterCycle workspace: 1. Create a new workspace 2. Select collaborative organisation (CO) for example `ewatercycle-nlesc` -3. Select `eWaterCycle` catalog item +3. Select `eWaterCycle dcache` catalog item +4. Select size of VM (cpus/memory) based on use case +5. Select storage item for home directories. 
Remember the item you picked, as you will need it in the workspace parameters. +6. Select storage item for dcache cache. Remember the item you picked, as you will need it in the workspace parameters. +7. Fill **all** the workspace parameters. They should look something like + ![workspace-parameters](workspace-parameters.png) + - TODO update screenshot that has shared_data_source parameter +8. Wait for machine to be running +9. Visit URL/IP +10. When done delete machine + +End user should be invited to Collaborative organization in [SRAM](https://sram.surf.nl/) or [created as students](#students) so they can login. + +See [User guide](USER.md) to see what users have to do to login or use GitHub repository. + +### Workspace creation with samba as shared data source + +Steps to create a eWaterCycle workspace: + +1. Create a new workspace +2. Select collaborative organisation (CO) for example `ewatercycle-nlesc` +3. Select `eWaterCycle samba` catalog item 4. Select size of VM (cpus/memory) based on use case 5. Select home storage item. Remember items you picked as you will need them in the workspace parameters. -6. If you do not have a Samba file server running then select the dcache cache storage item. -7. If you do have a Samba file server running then select the private network -8. Fill **all** the workspace parameters. They should look something like +6. Select the private network +7. Fill **all** the workspace parameters. They should look something like ![workspace-parameters](workspace-parameters.png) - TODO update screenshot that has shared_data_source parameter -9. Wait for machine to be running -10. Visit URL/IP -11. When done delete machine +8. Wait for machine to be running +9. Visit URL/IP +10. When done delete machine End user should be invited to Collaborative organization in [SRAM](https://sram.surf.nl/) or [created as students](#students) so they can login. 
diff --git a/SRC-ADMIN.md b/SRC-ADMIN.md deleted file mode 100644 index 9f49c77f..00000000 --- a/SRC-ADMIN.md +++ /dev/null @@ -1,115 +0,0 @@ -# SURF Research cloud developer - -This document is dedicated for catalog item developers. - -- [SURF Research cloud developer](#surf-research-cloud-developer) - - [Component registration](#component-registration) - - [Catalog item registration](#catalog-item-registration) - -A new workspace (aka Virtual Machine) can be made by choosing a catalog item. -A catalog item consists out of a list of components and other configuration. - -To register new components or catalog items in SURF Research cloud you -need to [appoint a developer](https://servicedesk.surf.nl/wiki/display/WIKI/Appoint+a+CO-member+a+developer). - -The generic steps to make your own catalog item are documented [here](https://servicedesk.surf.nl/wiki/display/WIKI/Create+your+own+catalog+items). - -## Component registration - -On [Components page](https://portal.live.surfresearchcloud.nl/catalog/components) -create a eWatercycle component with following specialization: - -- Use Ansible playbook as component script type - - Use `https://github.com/eWaterCycle/infra.git` as repository URL - - Use `research-cloud-plugin.yml` as script path - - Use `dcache-or-samba` as tag - - Name: eWaterCycle - - Subtitle: eWaterCycle teaching platform in a box - - Description: welcome page + jupyter + nbgrader + eWaterCycle python packages + dcache or samba - - Select cloud providers: - - SURF HPC Cloud, with all non-gpu sizes selected - - SURF HPC Cloud cluster, with all non-gpu sizes selected -- Component parameters, all fixed source type, required and overwitable unless otherwise stated - - shared_data_source: parameter for shared data source. - - default: dcache - - description: Source of shared data. Set to `dcache` or `samba`. TODO list which parameter are required for each source. - - dcache_ro_token: parameter for dcache read-only token aka macaroon. 
- The token can be found in the eWaterCycle password manager. - This token has an expiration date, so it needs to be updated every now and then. - - description: Macaroon with read permission for dcache - - alt_home_location: - - default: /data/volume_2 - - description: Path where home directories are stored. Set to `/data/`. - - rclone_cache_dir: - - default: /data/volume_3 - - description: Path where rclone cache is stored. Set to `/data/`. - - rclone_max_gsize: - - default: 45 - - description: For maximum size of cache on `rclone_cache_dir` volume. In Gb. - - grader_user: - - description: User who will be grading. User should be created on sram. This user will also be responsible for setting up the course and assignments. - - default: ==USERNAME== - (==USERNAME== which will be replaced by the actual username of the user creating the workspace) - - students: - - default: [] - - description: List of student user name and passwords. Format ':,:'. Use '' for no students. Use strong passwords as anyone on the internet can access the machine. - - course_repo: - - default: https://github.com/eWaterCycle/teaching.git - - description: Git repository url with the course source material. - - course_version - - description: The version, branch or tag of the course repository to use. - - default: nbgrader-quickstart - - samba_password: - - source_type: Co-Secret - - value: {"key": "samba_password","sensitive": 1} -- Set documentation URL to `https://github.com/eWaterCycle/infra` -- Do not allow every org to use this component. -- Select the organizations (CO) that are allowed to use the component. Data on the dcache should not be made public. - -## Catalog item registration - -On [Catalog items page](https://portal.live.surfresearchcloud.nl/catalog/catalogItems) -create an eWatercycle catalog item with following specialization: - -- Select the following components: - 1. SRC-OS - 2. SRC-CO - 3. SRC-Nginx - 4. SRC-External plugin - 5. 
eWaterCycle -- Set description fields: - - Name: eWaterCycle - - Subtitle: eWaterCycle teaching platform in a box - - Subtitle: eWaterCycle teaching platform in a box - - Description: welcome page + jupyter + nbgrader + eWaterCycle python packages + dcache or samba - - Logo: Organization avatar/logo on https://github.com/eWaterCycle -- Set documentation URL to `https://github.com/eWaterCycle/infra` -- Select the organizations (CO) that are allowed to use the catalog item. -- In cloud provider and settings step: - - Add `SURF HPC Cloud` as cloud provider - - Set Operating Systems to Ubuntu 22.04 - - Set Sizes to all non-gpu and non-disabled sizes -- In parameter settings step keep all values as is except - - Set `co_irods` to `false` as we do not use irods - - Set `co_research_drive` to `false` as we do not use research drive - - As interactive parameters expose following: - - shared_data_source: - - label: Shared data source - - description: Source of shared data. Set to `dcache` or `samba`. When samba is picked then you need to have a Samba server running inside the organization and filling rclone_cache_dir parameter is not needed. - - rclone_cache_dir: - - label: Rclone cache directory - - description: Path where rclone cache is stored. Set to `/data/`. - - alt_home_location: - - label: Homes path - - description: Path where home directories are stored. Set to `/data/`. - - grader_user: - - label: Username of grader - - description: User who will be grading. User should be created on sram. - - default: empty string - - students - - label: Students - - description: List of student user name and passwords. Format ':,:'. Use '' for no students. Use secure passwords as anyone on the internet can access the machine. -- Set boot disk size to 150Gb, - as default size will be mostly used by the conda environment and will trigger out of space warnings. 
-- Set workspace acces button behavior to `Webinterface (https:)`, - so clicking on `ACCESS` button will open up the eWatercycle experiment explorer web interface diff --git a/SRC-DEVEL.md b/SRC-DEVEL.md index fd86c520..c802dc6f 100644 --- a/SRC-DEVEL.md +++ b/SRC-DEVEL.md @@ -1,67 +1,145 @@ -## Catalog item registration +# SURF Research cloud developer -This chapter is dedicated for catalog item developers. +This chapter is dedicated to catalog item and component developers. -On the Research cloud the [developer](https://servicedesk.surf.nl/wiki/display/WIKI/Appoint+a+CO-member+a+developer) can add an catalog item for other people to use. -The generic steps to do this are documented [here](https://servicedesk.surf.nl/wiki/display/WIKI/Create+your+own+catalog+items). +- [SURF Research cloud developer](#surf-research-cloud-developer) + - [Component registration](#component-registration) + - [Catalog item registration](#catalog-item-registration) -For eWatercycle component following specialization was done +A new workspace (aka Virtual Machine) can be made by choosing a catalog item. +A catalog item consists of a list of components and other configuration. -- Use Ansible playbook as component script type - - Use `https://github.com/eWaterCycle/infra.git` as repository URL - - Use `research-cloud-plugin.yml` as script path - - Use `dcache-or-samba` as tag - - Name: eWaterCycle +To register new catalog items in SURF Research cloud you +need to [appoint a developer](https://servicedesk.surf.nl/wiki/display/WIKI/Appoint+a+CO-member+a+developer). + +The generic steps to make your own catalog item are documented [here](https://servicedesk.surf.nl/wiki/display/WIKI/Create+your+own+catalog+items). 
+ +## Component registration + +On [Components page](https://portal.live.surfresearchcloud.nl/catalog/components) +create an eWatercycle component with the following specialization: + +- Component script + - Component script type: Ansible playbook + - Repository URL: https://github.com/eWaterCycle/infra.git + - Path: research-cloud-plugin.yml + - Tag: dcache-or-samba +- Name & description + - Name: eWaterCycle dache or samba - Subtitle: eWaterCycle teaching platform in a box - - Description: welcome page + jupyter + nbgrader + eWaterCycle python packages + dcache or samba - - Select cloud providers: - - SURF HPC Cloud, with all non-gpu sizes selected - - SURF HPC Cloud cluster, with all non-gpu sizes selected -- Component parameters, all fixed source type, required and overwitable unless otherwise stated - - shared_data_source: parameter for shared data source. - - default: dcache - - description: Source of shared data. Set to `dcache` or `samba`. TODO list which parameter are required for each source. + - Description: Welcome page + JupyterHub + nbgitpuller + nbgrader + eWaterCycle Python packages + dcache or samba + - Logo: Organization avatar/logo from https://github.com/eWaterCycle/ewatercycle +- Parameters, all configured parameters should have a fixed source type and be required and overwritable unless otherwise stated + - shared_data_source: + - description: Source of shared data. Set to `dcache` or `samba`. + - initial value: dcache + - samba_password: + - source_type: Co-Secret + - overwritable: false + - initial value: {"key": "samba_password"} - dcache_ro_token: parameter for dcache read-only token aka macaroon. The token can be found in the eWaterCycle password manager. This token has an expiration date, so it needs to be updated every now and then. - - description: Macaroon with read permission for dcache - - alt_home_location: - - default: /data/volume_2 - - description: Path where home directories are stored. Set to `/data/`. 
+ - source_type: Co-Secret + - description: Macaroon with read permission for dcache. + - initial value: {"key": "dcache_ro_token"} + - overwritable: false - rclone_cache_dir: - - default: /data/volume_3 - description: Path where rclone cache is stored. Set to `/data/`. - - rclone_max_gsize: - - default: 45 - - description: For maximum size of cache on `rclone_cache_dir` volume. In Gb. + - initial value: /data/volume_3 + - alt_home_location: + - description: Path where home directories are stored. Set to `/data/`. + - initial value: /data/volume_2 - grader_user: - description: User who will be grading. User should be created on sram. This user will also be responsible for setting up the course and assignments. - - default: ==USERNAME== + - initial value: ==USERNAME== (==USERNAME== which will be replaced by the actual username of the user creating the workspace) - students: - - default: [] - - description: List of student user name and passwords. Format ':,:'. Use '' for no students. Use strong passwords as anyone on the internet can access the machine. + - description: List of student usernames and passwords. Format ':,:'. Use ' ' for no students. Use strong passwords as anyone on the internet can access the machine. + - initial value: ' ' (a space, as an empty string makes the workspace creation form invalid) - course_repo: - - default: https://github.com/eWaterCycle/teaching.git - description: Git repository url with the course source material. + - initial value: https://github.com/eWaterCycle/teaching.git - course_version - description: The version, branch or tag of the course repository to use. - - default: nbgrader-quickstart - - samba_password: - - source_type: Co-Secret - - value: {"key": "samba_password","sensitive": 1} -- Set documentation URL to `https://github.com/eWaterCycle/infra` -- Do not allow every org to use this component. -- Select the organizations (CO) that are allowed to use the component. Data on the dcache should not be made public. 
+ - initial value: nbgrader-quickstart +- Owner & support + - Owner: ewatercycle-nlesc + - Documentation URL: https://github.com/eWaterCycle/infra +- Access + - Allow every org to use this component. + +## Catalog item with dcache as shared data source + +On [Catalog items page](https://portal.live.surfresearchcloud.nl/catalog/catalogItems) +create an eWatercycle catalog item with following specialization: + +- Components, select the following components (use live version for all of them): + 1. SRC-OS + 2. SRC-CO + 3. SRC-Nginx + 4. SRC-External plugin + 5. eWaterCycle dache or samba +- Name & description + - Name: eWaterCycle dcache + - Subtitle: eWaterCycle teaching platform in a box + - Description: Welcome page + JupyterHub + nbgitpuller + nbgrader + eWaterCycle Python packages + dcache as shared data source + - Logo: Organization avatar/logo from https://github.com/eWaterCycle +- Owner & support + - Owner: ewatercycle-nlesc + - Documentation URL: https://github.com/eWaterCycle/infra +- Access, Select the organizations (CO) that are allowed to use the catalog item. 
+ - Allowed Collaborative Organisations: Select all organizations with eWaterCycle in the name +- Cloud settings + - Add `SURF HPC Cloud` as cloud provider + - Operating Systems: Ubuntu 22.04 + - Sizes: all non-gpu and non-disabled sizes +- Parameters, keep all values as is except + - Set `co_irods` to `false` as we do not use irods + - Set `co_research_drive` to `false` as we do not use research drive + - Set `shared_data_source` to `dcache` + - As interactive parameters expose following: + - rclone_cache_dir: + - label: Rclone cache directory + - alt_home_location: + - label: Homes path + - grader_user: + - label: Username of grader + - students + - label: Students + - default: ' ' (a space, as empty string is not allowed) + - course_repo + - label: Course repository + - course_version + - label: Course version +- Workspace settings + - Set boot disk size to 50Gb, + as default size will be mostly used by the conda environment and will trigger out of space warnings. + - Set workspace access button behavior to `Webinterface (https:)`, + so clicking on `ACCESS` button will open up the eWatercycle experiment explorer web interface + +## Catalog item with Samba as shared data source -For eWatercycle catalog item following specialization was done +On [Catalog items page](https://portal.live.surfresearchcloud.nl/catalog/catalogItems) +create an eWatercycle catalog item with following specialization: + +1. Find the `eWaterCycle dcache` catalog item +2. Click on Actions -> Clone +3. Then re-configure the following + +TODO - Select the following components: 1. SRC-OS 2. SRC-CO 3. SRC-Nginx 4. SRC-External plugin - 5. eWaterCycle + 5. 
eWaterCycle dache or samba +- Set description fields: + - Name: eWaterCycle samba + - Subtitle: eWaterCycle teaching platform in a box + - Description: Welcome page + JupyterHub + nbgitpuller + nbgrader + eWaterCycle Python packages + samba as shared data source + - Logo: Organization avatar/logo from https://github.com/eWaterCycle - Set documentation URL to `https://github.com/eWaterCycle/infra` - Select the organizations (CO) that are allowed to use the catalog item. - In cloud provider and settings step: @@ -71,13 +149,8 @@ For eWatercycle catalog item following specialization was done - In parameter settings step keep all values as is except - Set `co_irods` to `false` as we do not use irods - Set `co_research_drive` to `false` as we do not use research drive + - Set `shared_data_source` to `samba` - As interactive parameters expose following: - - shared_data_source: - - label: Shared data source - - description: Source of shared data. Set to `dcache` or `samba`. When samba is picked then you need to have a Samba server running inside the organization and filling rclone_cache_dir parameter is not needed. - - rclone_cache_dir: - - label: Rclone cache directory - - description: Path where rclone cache is stored. Set to `/data/`. - alt_home_location: - label: Homes path - description: Path where home directories are stored. Set to `/data/`.