Feature/cloud rtd updates #1779

Merged
merged 11 commits into from
Aug 28, 2023

Conversation

HenryRWinterbottom
Contributor

Description

This PR addresses issue #1701.

This PR contains updated documentation for the deployment of the global-workflow to the NOAA CSPs. The descriptions contained within are specific to the NOAA CSP AWS PW initiative but will be extended as more CSPs come online.

Type of change

Please delete options that are not relevant.

  • This change requires a documentation update

How Has This Been Tested?

The HTML generation results in the RTD pages appearing as expected.

Checklist

  • My code follows the style guidelines of this project
  • [] I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

@HenryRWinterbottom HenryRWinterbottom self-assigned this Aug 7, 2023
@HenryRWinterbottom HenryRWinterbottom added the documentation Improvements or additions to documentation label Aug 7, 2023
@HenryRWinterbottom
Contributor Author

@aerorahul and @WalterKolczynski-NOAA , if you have not attempted the AWS deployment of the global-workflow, please use this draft PR as your guide. This will allow it to be further refined prior to the actual PR.

@WalterKolczynski-NOAA
Contributor

I think this would read better if the images appeared after the associated instructions instead of before.

@HenryRWinterbottom
Contributor Author

@WalterKolczynski-NOAA Good suggestion. I'll make the changes. Thank you.

@WalterKolczynski-NOAA
Contributor

A few more things you may or may not be able to answer:

  • What's the difference between the (Vault) account and the normal one?
  • What's the difference between the two groups (ufs-cmd and ufs-cloud) (or is it just for cost tracking)?
  • You note the time to spin-up is variable, but it would be useful to add that it may take several minutes

Also, lmod isn't loaded (so I can't use module commands) and I'm not getting a /contrib/global-workflow or a /contrib/workflow:

[Walter.Kolczynski@awsdemo-7 ~]$ ls /contrib/
Jessica.Meixner  pw  Walter.Kolczynski

{
  "multi_user": false,
  "provider_version": "",
  "health_check": "# (optional) User-specific master node Health Check script - \n# you can use this run custom node Health Check logic  upon cluster start",
  "cluster_config": {
    "architecture": "amd64",
    "availability_zone": "us-east-1a",
    "controller_efa": false,
    "controller_image": "latest",
    "export_fs_type": "xfs",
    "image_disk_count": "1",
    "image_disk_name": "snap-04f8963f5d94148b6",
    "image_disk_size_gb": "200",
    "management_shape": "c4.8xlarge",
    "partition_config": [
      {
        "name": "compute",
        "instance_type": "c4.8xlarge",
        "max_node_num": "1",
        "elastic_image": "latest",
        "availability_zone": "us-east-1a",
        "default": "YES",
        "enable_spot": false,
        "efa": true,
        "capacity_reservation": false,
        "capacity_reservation_id": "",
        "placement_group": "",
        "architecture": "amd64"
      }
    ],
    "region": "us-east-1",
    "slurm_resume_timeout": "",
    "slurm_return_to_service": "",
    "slurm_suspend_time": "",
    "slurm_suspend_timeout": ""
  },
  "storages": [
    {
      "storage": "64b7f6e73779f42985fa0368",
      "mountPoint": "/lustre"
    }
  ]
}
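
The two symptoms reported above (no `module` command, and missing directories under /contrib) can be checked from a login shell. The following is only a minimal sketch: the helper names `missing_dirs` and `have_module_cmd` are hypothetical and are not part of the global-workflow or the PW tooling; only the directory names come from the comment above.

```shell
# Hypothetical helper: print each expected subdirectory that is missing
# under a base directory (for example /contrib). Prints nothing if all exist.
missing_dirs() {
  base_dir="$1"
  shift
  for sub_dir in "$@"; do
    if [ ! -d "${base_dir}/${sub_dir}" ]; then
      printf '%s\n' "${base_dir}/${sub_dir}"
    fi
  done
}

# Hypothetical helper: succeed only if a `module` command or shell function
# is defined in the current shell (as it would be once Lmod is initialized).
have_module_cmd() {
  command -v module >/dev/null 2>&1
}

# Example usage with the paths mentioned in the comment above:
#   missing_dirs /contrib global-workflow workflow
#   have_module_cmd || echo "module command not available"
```

On the node shown in the listing above, the first call would be expected to print both paths, since neither directory appears in the `ls /contrib/` output.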

@HenryRWinterbottom
Contributor Author

What's the difference between the (Vault) account and the normal one?

I am not sure. I have been using the Vault account recently. This is an instance of the "moving target" scenario: the Vault account just appeared one day. It may have been discussed during one of the RDHPCS CSP office hours, but I either wasn't there or missed it.

What's the difference between the two groups (ufs-cmd and ufs-cloud) (or is it just for cost tracking)?

These are for cost tracking purposes for different groups. We have access to both of these accounts for development. However, the image from one will need to be set for the other (and vice-versa). I am going to work on that this sprint.

You note the time to spin-up is variable, but would be useful to include it may be several minutes.

Thanks, I will update the PR draft accordingly.

Also, lmod isn't loaded (so I can't use module commands) and I'm not getting a /contrib/global-workflow or a /contrib/workflow.

See https://noaa-emc.slack.com/archives/C029GPJBEHE/p1691521662156049

@WalterKolczynski-NOAA
Contributor

WalterKolczynski-NOAA commented Aug 17, 2023

We were able to resolve the missing /contrib mounts with the following addition to the bootstrap:

ALLNODES
/contrib/pw/mount-epic-contrib.sh

if [[ $HOSTNAME == mgmt* ]]; then
  # head node only instructions
  :  # no-op placeholder so the otherwise empty branch is valid bash
fi

This should be added to the instructions.
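
As an illustration of the bootstrap logic above, here is a small, hedged sketch of the two pieces it relies on: detecting the head node from its host name and running the mount script defensively. The function names `is_head_node` and `run_if_present` are hypothetical; the `mgmt*` pattern and the script path are taken from the snippet above, and the `ALLNODES` line is left out because it is interpreted by the cluster bootstrap, not by the shell.

```shell
# Hypothetical helper: treat any host whose name starts with "mgmt" as the
# head node, matching the pattern used in the bootstrap snippet above.
is_head_node() {
  case "$1" in
    mgmt*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Hypothetical wrapper: run a script only if it exists and is executable,
# and report a skipped script on stderr instead of failing silently.
run_if_present() {
  if [ -x "$1" ]; then
    "$1"
  else
    echo "skipped: $1 not found or not executable" >&2
    return 1
  fi
}

# Example usage (path and host-name pattern taken from the comment above):
#   run_if_present /contrib/pw/mount-epic-contrib.sh
#   if is_head_node "$(hostname)"; then
#     echo "head node only instructions would go here"
#   fi
```

Keeping the host-name test in its own function makes it easy to check the pattern against sample names before the cluster is restarted.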

@aerorahul
Contributor

Is this still a WIP?

@HenryRWinterbottom
Contributor Author

@aerorahul Yes. If you have questions and/or something is not clear, please comment here. I will integrate the changes/updates accordingly.

@WalterKolczynski-NOAA
Contributor

@HenryWinterbottom-NOAA Can we get this PR updated and through today while we wait for space for fix and IC files to be ready?

@HenryRWinterbottom
Contributor Author

@WalterKolczynski-NOAA yes. Will do.

@github-actions

Link to ReadTheDocs sample build for this PR can be found at:
https://global-workflow--1779.org.readthedocs.build/en/1779

@HenryRWinterbottom
Contributor Author

I think this would read better if the images appeared after the associated instructions instead of before.

@WalterKolczynski-NOAA Can you specify the sections where you'd like me to rearrange the image placement?

@HenryRWinterbottom
Contributor Author

We were able to resolve the missing /contrib mounts with the following addition to the bootstrap:

ALLNODES
/contrib/pw/mount-epic-contrib.sh

if [[ $HOSTNAME == mgmt* ]]; then
  # head node only instructions
  :  # no-op placeholder so the otherwise empty branch is valid bash
fi

This should be added to the instructions.

I agree. But I am not going to add it until after we work out the bugs with compiling against that stack. At that point I will open a new issue.

@WalterKolczynski-NOAA
Contributor

Sections 7.3 and 7.4

@github-actions

Link to ReadTheDocs sample build for this PR can be found at:
https://global-workflow--1779.org.readthedocs.build/en/1779

@HenryRWinterbottom
Contributor Author

@HenryWinterbottom-NOAA Can we get this PR updated and through today while we wait for space for fix and IC files to be ready?

@WalterKolczynski-NOAA This is ready for review and/or conversion to a PR.

@WalterKolczynski-NOAA WalterKolczynski-NOAA marked this pull request as ready for review August 22, 2023 15:31
docs/source/noaa_csp.rst (outdated review thread, resolved)
@github-actions

Link to ReadTheDocs sample build for this PR can be found at:
https://global-workflow--1779.org.readthedocs.build/en/1779

@aerorahul
Contributor

I will merge this by noon ET today, unless someone tells me there are more updates needed.

@HenryRWinterbottom
Contributor Author

@aerorahul Sounds good to me. Thanks.

@aerorahul aerorahul merged commit 181d2e7 into NOAA-EMC:develop Aug 28, 2023
4 checks passed