Description

This module creates a login node for a Slurm cluster based on the SchedMD/slurm-gcp slurm_instance_template and slurm_login_instance Terraform modules. The login node is used in conjunction with the Slurm controller.

Example

- id: slurm_login
  source: community/modules/scheduler/schedmd-slurm-gcp-v5-login
  use:
  - network1
  - slurm_controller
  settings:
    machine_type: n2-standard-4

This creates a Slurm login node which is:

  • connected to the primary subnet of network1 via use
  • associated with the slurm_controller module as the Slurm controller via use
  • of VM machine type n2-standard-4

Custom Images

For more information on creating valid custom images for the login node VM instances or for custom instance templates, see our vm-images.md documentation page.
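As a minimal sketch of how a custom image might be selected, the blueprint fragment below sets instance_image together with instance_image_custom, both of which are described in the Inputs section. The image family and project names are hypothetical placeholders, not real images; substitute values for an image you have built.

```yaml
- id: slurm_login
  source: community/modules/scheduler/schedmd-slurm-gcp-v5-login
  use:
  - network1
  - slurm_controller
  settings:
    machine_type: n2-standard-4
    instance_image:
      # Hypothetical names: replace with your own image family and project.
      family: my-custom-slurm-family
      project: my-image-project
    # Acknowledges that the image is custom, so the compatibility
    # check on family and project names is skipped.
    instance_image_custom: true
```

With instance_image_custom left at its default of false, the deployment is expected to fail for any image family or name outside the compatible set.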

GPU Support

For more information on GPU support in Slurm on GCP and other Cluster Toolkit modules, see docs/gpu-support.md.
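A minimal sketch of attaching a GPU to the login node through the guest_accelerator input is shown below. The machine type and accelerator type are assumptions chosen for illustration; whether that combination is available depends on the zone and project quota. Setting on_host_maintenance to TERMINATE reflects the general Compute Engine rule that VMs with attached GPUs cannot live-migrate; the module may already handle this automatically, so treat that line as a precaution rather than a requirement of this module.

```yaml
- id: slurm_login
  source: community/modules/scheduler/schedmd-slurm-gcp-v5-login
  use:
  - network1
  - slurm_controller
  settings:
    # Assumed machine and accelerator types, for illustration only.
    machine_type: n1-standard-8
    guest_accelerator:
    - type: nvidia-tesla-t4
      count: 1
    # GPU-attached VMs cannot live-migrate during host maintenance.
    on_host_maintenance: TERMINATE
```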

Support

The Cluster Toolkit team maintains the wrapper around the slurm-on-gcp Terraform modules. For support with the underlying modules, see the instructions in the slurm-gcp README.

License

Copyright 2023 Google LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Requirements

Name Version
terraform >= 1.1
google >= 3.83

Providers

Name Version
google >= 3.83

Modules

Name Source Version
slurm_login_instance github.com/GoogleCloudPlatform/slurm-gcp.git//terraform/slurm_cluster/modules/slurm_login_instance 5.12.0
slurm_login_template github.com/GoogleCloudPlatform/slurm-gcp.git//terraform/slurm_cluster/modules/slurm_instance_template 5.12.0

Resources

Name Type
google_compute_default_service_account.default data source
google_compute_image.slurm data source

Inputs

Name Description Type Default Required
access_config Access configurations, i.e. IPs through which the VM instance can be reached from the Internet.
list(object({
nat_ip = string
network_tier = string
}))
[] no
additional_disks List of maps of disks.
list(object({
disk_name = string
device_name = string
disk_type = string
disk_size_gb = number
disk_labels = map(string)
auto_delete = bool
boot = bool
}))
[] no
allow_automatic_updates If false, disables automatic system package updates on the created instances. This feature is
only available on supported images (or images derived from them). For more details, see
https://cloud.google.com/compute/docs/instances/create-hpc-vm#disable_automatic_updates
bool true no
can_ip_forward Enable IP forwarding, for NAT instances for example. bool false no
controller_instance_id The server-assigned unique identifier of the controller instance. This value
must be supplied as an output of the controller module, typically via use.
string n/a yes
deployment_name Name of the deployment. string n/a yes
disable_login_public_ips If set to false, the login node will have a random public IP assigned to it. Ignored if access_config is set. bool true no
disable_smt Disables Simultaneous Multi-Threading (SMT) on instance. bool true no
disk_auto_delete Whether or not the boot disk should be auto-deleted. bool true no
disk_labels Labels specific to the boot disk. These will be merged with var.labels. map(string) {} no
disk_size_gb Boot disk size in GB. number 50 no
disk_type Boot disk type. string "pd-standard" no
enable_confidential_vm Enable the Confidential VM configuration. Note: the instance image must support this option. bool false no
enable_oslogin Enables Google Cloud os-login for user login and authentication for VMs.
See https://cloud.google.com/compute/docs/oslogin
bool true no
enable_reconfigure Enables automatic Slurm reconfiguration when the Slurm configuration changes (e.g.
slurm.conf.tpl, partition details).

NOTE: Requires Google Pub/Sub API.
bool false no
enable_shielded_vm Enable the Shielded VM configuration. Note: the instance image must support this option. bool false no
gpu DEPRECATED: use var.guest_accelerator
object({
type = string
count = number
})
null no
guest_accelerator List of the type and count of accelerator cards attached to the instance.
list(object({
type = string,
count = number
}))
[] no
instance_image Defines the image that will be used in the Slurm login node VM instances.

Expected Fields:
name: The name of the image. Mutually exclusive with family.
family: The image family to use. Mutually exclusive with name.
project: The project where the image is hosted.

For more information on creating custom images that comply with Slurm on GCP
see the "Slurm on GCP Custom Images" section in docs/vm-images.md.
map(string)
{
"family": "slurm-gcp-5-12-hpc-centos-7",
"project": "schedmd-slurm-public"
}
no
instance_image_custom A flag that designates that the user is aware that they are requesting
to use a custom and potentially incompatible image for this Slurm on
GCP module.

If the field is set to false, only the compatible families and project
names will be accepted. The deployment will fail with any other image
family or name. If set to true, no checks will be done.

See: https://goo.gle/hpc-slurm-images
bool false no
instance_template Self link to a custom instance template. If set, other VM definition
variables such as machine_type and instance_image will be ignored in favor
of the provided instance template.

For more information on creating custom images for the instance template
that comply with Slurm on GCP see the "Slurm on GCP Custom Images" section
in docs/vm-images.md.
string null no
labels Labels, provided as a map. map(string) {} no
machine_type Machine type to create. string "n2-standard-2" no
metadata Metadata, provided as a map. map(string) {} no
min_cpu_platform Specifies a minimum CPU platform. Applicable values are the friendly names of
CPU platforms, such as Intel Haswell or Intel Skylake. See the complete list:
https://cloud.google.com/compute/docs/instances/specify-min-cpu-platform
string null no
network_ip DEPRECATED: Use the static_ips variable to assign an internal static IP address. string null no
network_self_link Network to deploy to. Either network_self_link or subnetwork_self_link must be specified. string null no
num_instances Number of instances to create. This value is ignored if static_ips is provided. number 1 no
on_host_maintenance Instance availability policy. string "MIGRATE" no
preemptible Allow the instance to be preempted. bool false no
project_id Project ID to create resources in. string n/a yes
pubsub_topic The cluster pubsub topic created by the controller when enable_reconfigure=true. string null no
region Region where the instances should be created.
Note: region will be ignored if it can be extracted from subnetwork.
string null no
service_account Service account to attach to the login instance. If not set, the
default compute service account for the given project will be used with the
"https://www.googleapis.com/auth/cloud-platform" scope.
object({
email = string
scopes = set(string)
})
null no
shielded_instance_config Shielded VM configuration for the instance. Note: not used unless
enable_shielded_vm is 'true'.
- enable_integrity_monitoring : Compare the most recent boot measurements to the
integrity policy baseline and return a pair of pass/fail results depending on
whether they match or not.
- enable_secure_boot : Verify the digital signature of all boot components, and
halt the boot process if signature verification fails.
- enable_vtpm : Use a virtualized trusted platform module, which is a
specialized computer chip you can use to encrypt objects like keys and
certificates.
object({
enable_integrity_monitoring = bool
enable_secure_boot = bool
enable_vtpm = bool
})
{
"enable_integrity_monitoring": true,
"enable_secure_boot": true,
"enable_vtpm": true
}
no
slurm_cluster_name Cluster name, used for resource naming and Slurm accounting. If not provided, it defaults to the first 8 characters of the deployment name (with any invalid characters removed). string null no
source_image DEPRECATED: Use instance_image instead. string null no
source_image_family DEPRECATED: Use instance_image instead. string null no
source_image_project DEPRECATED: Use instance_image instead. string null no
startup_script Startup script that will be used by the login node VM. string "" no
static_ips List of static IPs for VM instances. list(string) [] no
subnetwork_project The project that subnetwork belongs to. string null no
subnetwork_self_link Subnet to deploy to. Either network_self_link or subnetwork_self_link must be specified. string null no
tags Network tag list. list(string) [] no
zone Zone where the instances should be created. If not specified, instances will be
spread across available zones in the region.
string null no
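
The following blueprint fragment is a sketch of how several of the optional inputs above might be combined: a public IP on the login node, a larger boot disk, and a dedicated service account. The service account email is a hypothetical placeholder, and the disk values are illustrative assumptions rather than recommendations.

```yaml
- id: slurm_login
  source: community/modules/scheduler/schedmd-slurm-gcp-v5-login
  use:
  - network1
  - slurm_controller
  settings:
    machine_type: n2-standard-4
    # Assign a random public IP (ignored if access_config is set).
    disable_login_public_ips: false
    # Illustrative boot disk settings; the defaults are 50 GB and pd-standard.
    disk_size_gb: 100
    disk_type: pd-balanced
    service_account:
      # Hypothetical account: replace with one that exists in your project.
      email: login-sa@my-project.iam.gserviceaccount.com
      scopes:
      - https://www.googleapis.com/auth/cloud-platform
```

If service_account is omitted, the default compute service account for the project is used with the cloud-platform scope, as noted in the table.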

Outputs

No outputs.