Merge pull request #9 from truefoundry/node-pool
Adding default node pool
dunefro authored Mar 11, 2024
2 parents 3c90bac + aec8f4c commit 9b1df91
Showing 5 changed files with 145 additions and 29 deletions.
12 changes: 7 additions & 5 deletions README.md
@@ -7,26 +7,26 @@ Truefoundry Azure Cluster Module
 | Name | Version |
 |------|---------|
 | <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >= 1.4 |
-| <a name="requirement_azurerm"></a> [azurerm](#requirement\_azurerm) | 3.89.0 |
+| <a name="requirement_azurerm"></a> [azurerm](#requirement\_azurerm) | 3.94.0 |
 
 ## Providers
 
 | Name | Version |
 |------|---------|
-| <a name="provider_azurerm"></a> [azurerm](#provider\_azurerm) | 3.89.0 |
+| <a name="provider_azurerm"></a> [azurerm](#provider\_azurerm) | 3.94.0 |
 
 ## Modules
 
 | Name | Source | Version |
 |------|--------|---------|
-| <a name="module_aks"></a> [aks](#module\_aks) | Azure/aks/azurerm | 7.5.0 |
+| <a name="module_aks"></a> [aks](#module\_aks) | Azure/aks/azurerm | 8.0.0 |
 
 ## Resources
 
 | Name | Type |
 |------|------|
-| [azurerm_role_assignment.network_contributor_cluster](https://registry.terraform.io/providers/hashicorp/azurerm/3.89.0/docs/resources/role_assignment) | resource |
-| [azurerm_user_assigned_identity.cluster](https://registry.terraform.io/providers/hashicorp/azurerm/3.89.0/docs/resources/user_assigned_identity) | resource |
+| [azurerm_role_assignment.network_contributor_cluster](https://registry.terraform.io/providers/hashicorp/azurerm/3.94.0/docs/resources/role_assignment) | resource |
+| [azurerm_user_assigned_identity.cluster](https://registry.terraform.io/providers/hashicorp/azurerm/3.94.0/docs/resources/user_assigned_identity) | resource |
 
 ## Inputs
 
@@ -42,10 +42,12 @@ Truefoundry Azure Cluster Module
 | <a name="input_enable_file_driver"></a> [enable\_file\_driver](#input\_enable\_file\_driver) | Enable file storage provider | `bool` | `true` | no |
 | <a name="input_enable_snapshot_controller"></a> [enable\_snapshot\_controller](#input\_enable\_snapshot\_controller) | Enable snapshot controller | `bool` | `true` | no |
 | <a name="input_enable_storage_profile"></a> [enable\_storage\_profile](#input\_enable\_storage\_profile) | Enable storage profile for the cluster. If disabled `enable_blob_driver`, `enable_file_driver`, `enable_disk_driver` and `enable_snapshot_controller` will have no impact | `bool` | `true` | no |
+| <a name="input_initial_node_pool_max_surge"></a> [initial\_node\_pool\_max\_surge](#input\_initial\_node\_pool\_max\_surge) | Max surge in percentage for the initial node pool | `string` | `"10"` | no |
 | <a name="input_intial_node_pool_instance_type"></a> [intial\_node\_pool\_instance\_type](#input\_intial\_node\_pool\_instance\_type) | Instance size of the initial node pool | `string` | `"Standard_D2s_v5"` | no |
 | <a name="input_intial_node_pool_spot_instance_type"></a> [intial\_node\_pool\_spot\_instance\_type](#input\_intial\_node\_pool\_spot\_instance\_type) | Instance size of the initial spot node pool | `string` | `"Standard_D4s_v5"` | no |
 | <a name="input_kubernetes_version"></a> [kubernetes\_version](#input\_kubernetes\_version) | Version of the kubernetes engine | `string` | `"1.28"` | no |
 | <a name="input_location"></a> [location](#input\_location) | Location of the resource group | `string` | n/a | yes |
+| <a name="input_max_pods_per_node"></a> [max\_pods\_per\_node](#input\_max\_pods\_per\_node) | Max pods per node | `number` | `32` | no |
 | <a name="input_name"></a> [name](#input\_name) | Name of the cluster | `string` | n/a | yes |
 | <a name="input_network_plugin"></a> [network\_plugin](#input\_network\_plugin) | Network plugin to use for cluster | `string` | `"kubenet"` | no |
 | <a name="input_oidc_issuer_enabled"></a> [oidc\_issuer\_enabled](#input\_oidc\_issuer\_enabled) | Enable OIDC for the cluster | `bool` | `true` | no |
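For orientation, here is a minimal sketch of how the two new inputs surface to a consumer of this module. The source address and every value below are illustrative assumptions, not part of this commit:

```hcl
# Hypothetical usage sketch; the source address and all values are assumptions.
module "truefoundry_cluster" {
  source = "truefoundry/truefoundry-cluster/azure" # assumed address, not from this PR

  name                = "tfy-cluster"
  location            = "eastus"
  resource_group_name = "tfy-rg"
  subnet_id           = azurerm_subnet.cluster.id # assumed pre-existing subnet
  control_plane       = true

  # Inputs added in this PR, shown with their defaults:
  max_pods_per_node           = 32
  initial_node_pool_max_surge = "10" # percent
}
```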
16 changes: 9 additions & 7 deletions aks.tf
@@ -13,7 +13,7 @@ resource "azurerm_role_assignment" "network_contributor_cluster" {
 
 module "aks" {
   source              = "Azure/aks/azurerm"
-  version             = "7.5.0"
+  version             = "8.0.0"
   resource_group_name = var.resource_group_name
   cluster_name        = var.name
   location            = var.location
@@ -26,12 +26,14 @@ module "aks" {
   agents_labels = {
     "truefoundry" : "essential"
   }
-  agents_count     = local.intial_node_pool_min_count
-  agents_max_count = local.intial_node_pool_max_count
-  agents_min_count = local.intial_node_pool_min_count
-  agents_pool_name = "initial"
-  agents_size      = var.intial_node_pool_instance_type
-  agents_tags      = local.tags
+  agents_count          = local.intial_node_pool_min_count
+  agents_max_count      = local.intial_node_pool_max_count
+  agents_min_count      = local.intial_node_pool_min_count
+  agents_pool_name      = "initial"
+  agents_size           = var.intial_node_pool_instance_type
+  agents_max_pods       = var.max_pods_per_node
+  agents_pool_max_surge = var.initial_node_pool_max_surge
+  agents_tags           = local.tags
 
   orchestrator_version = coalesce(var.orchestrator_version, var.kubernetes_version)
 
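As a reading aid (not part of the diff), here is how the new `agents_*` wiring resolves under the defaults; the counts come from the locals in locals.tf below, and the surge figure follows AKS's rule of rounding a percentage surge up to a whole node:

```hcl
# Assuming control_plane = true and default inputs (values traced from this PR):
#   agents_count          = 2     # local.intial_node_pool_min_count
#   agents_min_count      = 2
#   agents_max_count      = 3     # local.intial_node_pool_max_count
#   agents_max_pods       = 32    # var.max_pods_per_node default
#   agents_pool_max_surge = "10"  # 10% of a 3-node pool -> ceil(0.3) = 1 surge node
```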
130 changes: 114 additions & 16 deletions locals.tf
@@ -4,34 +4,132 @@ locals {
     "terraform-module" = "terraform-azure-truefoundry-cluster"
     "terraform"        = "true"
     "cluster-name"     = var.name
+    "truefoundry"      = "managed"
     },
     var.tags
   )
   intial_node_pool_min_count = var.control_plane ? 2 : 1
   intial_node_pool_max_count = var.control_plane ? 3 : 2
-  node_pools = {
-    spot = {
-      name            = "spotpool"
-      node_count      = 1
-      max_count       = 20
-      min_count       = 1
-      os_disk_size_gb = 100
-      priority        = "Spot"
-      vm_size         = var.intial_node_pool_spot_instance_type
+
+  # mandatory to pass, otherwise the node pool will be recreated
+  cpupools = [
+    {
+      "name"    = "cpu"
+      "vm_size" = "Standard_D4ds_v5"
+    },
+    {
+      "name"    = "cpu2x"
+      "vm_size" = "Standard_D8ds_v5"
+    }
+  ]
+  gpupools = [
+    {
+      name    = "a100"
+      vm_size = "Standard_NC24ads_A100_v4"
+    },
+    {
+      name    = "a100x2"
+      vm_size = "Standard_NC48ads_A100_v4"
+    },
+    {
+      name    = "a100x4"
+      vm_size = "Standard_NC96ads_A100_v4"
+    },
+    {
+      name    = "a10"
+      vm_size = "Standard_NV6ads_A10_v5"
+    },
+    {
+      name    = "a10x2"
+      vm_size = "Standard_NV12ads_A10_v5"
+    },
+    {
+      name    = "a10x3"
+      vm_size = "Standard_NV18ads_A10_v5"
+    },
+    {
+      name    = "a10x6"
+      vm_size = "Standard_NV36ads_A10_v5"
+    },
+    {
+      name    = "t4"
+      vm_size = "Standard_NC4as_T4_v3"
+    },
+    {
+      name    = "t4x2"
+      vm_size = "Standard_NC8as_T4_v3"
+    },
+    {
+      name    = "t4x4"
+      vm_size = "Standard_NC16as_T4_v3"
+    },
+    {
+      name    = "t4x16"
+      vm_size = "Standard_NC64as_T4_v3"
+    }
+  ]
+  node_pools = merge({ for k, v in local.cpupools : "${v["name"]}sp" => {
+    name                    = "${v["name"]}sp"
+    node_count              = 0
+    max_count               = 20
+    min_count               = 0
+    os_disk_size_gb         = 100
+    priority                = "Spot"
+    vm_size                 = v["vm_size"]
+    enable_auto_scaling     = true
+    custom_ca_trust_enabled = false
+    enable_host_encryption  = true
+    enable_node_public_ip   = false
+    eviction_policy         = "Delete"
+    orchestrator_version    = var.kubernetes_version
+    node_taints = [
+      "kubernetes.azure.com/scalesetpriority=spot:NoSchedule"
+    ]
+    tags           = local.tags
+    zones          = []
+    vnet_subnet_id = var.subnet_id
+    max_pods       = var.max_pods_per_node
+    } },
+    { for k, v in local.gpupools : "${v["name"]}sp" => {
+      name            = "${v["name"]}sp"
+      node_count      = 0
+      max_count       = 20
+      min_count       = 0
+      os_disk_size_gb = 100
+      priority        = "Spot"
+      vm_size         = v["vm_size"]
       enable_auto_scaling     = true
       custom_ca_trust_enabled = false
-      enable_host_encryption  = false
+      enable_host_encryption  = true
       enable_node_public_ip   = false
       eviction_policy         = "Delete"
       orchestrator_version    = var.kubernetes_version
       node_taints = [
-        "kubernetes.azure.com/scalesetpriority=spot:NoSchedule"
+        "kubernetes.azure.com/scalesetpriority=spot:NoSchedule",
+        "nvidia.com/gpu=Present:NoSchedule"
       ]
       tags           = local.tags
       zones          = []
       vnet_subnet_id = var.subnet_id
-    }
-  }
-}
+      max_pods       = var.max_pods_per_node
+    } },
+    { for k, v in local.gpupools : "${v["name"]}" => {
+      name            = "${v["name"]}"
+      node_count      = 0
+      max_count       = 20
+      min_count       = 0
+      os_disk_size_gb = 100
+      priority        = "Regular"
+      vm_size         = v["vm_size"]
+      enable_auto_scaling     = true
+      custom_ca_trust_enabled = false
+      enable_host_encryption  = true
+      enable_node_public_ip   = false
+      orchestrator_version    = var.kubernetes_version
+      node_taints = [
+        "nvidia.com/gpu=Present:NoSchedule"
+      ]
+      tags           = local.tags
+      zones          = []
+      vnet_subnet_id = var.subnet_id
+      max_pods       = var.max_pods_per_node
+    } })
+}
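To make the generated map concrete (a reading aid, not part of the commit): each CPU size yields one spot pool, and each GPU size yields both a spot pool (suffix `sp`) and an on-demand pool, keyed as follows:

```hcl
# Keys produced by the merge(...) above, derived from the cpupools/gpupools lists:
#   "cpusp", "cpu2xsp"                    # CPU sizes, Spot only
#   "t4sp", "t4", "a100sp", "a100", ...   # each GPU size: Spot + Regular
# For example, node_pools["t4"] expands to (abridged):
#   {
#     name        = "t4"
#     priority    = "Regular"
#     vm_size     = "Standard_NC4as_T4_v3"
#     min_count   = 0
#     max_count   = 20
#     node_taints = ["nvidia.com/gpu=Present:NoSchedule"]
#   }
```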
14 changes: 14 additions & 0 deletions variables.tf
@@ -42,6 +42,12 @@ variable "intial_node_pool_spot_instance_type" {
   type = string
 }
 
+variable "initial_node_pool_max_surge" {
+  description = "Max surge in percentage for the initial node pool"
+  type        = string
+  default     = "10"
+}
+
 variable "workload_identity_enabled" {
   description = "Enable workload identity in the cluster"
   default     = true
@@ -83,11 +89,19 @@ variable "disk_driver_version" {
   type    = string
   default = "v1"
 }
+
 variable "enable_snapshot_controller" {
   description = "Enable snapshot controller"
   type        = bool
   default     = true
 }
+
+variable "max_pods_per_node" {
+  description = "Max pods per node"
+  type        = number
+  default     = 32
+}
+
 ################################################################################
 # Network
 ################################################################################
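Since `initial_node_pool_max_surge` is a free-form string, one possible hardening (an assumption, not something this PR does) would be a validation block:

```hcl
# Hypothetical variation, not in this PR: guard the percentage string.
variable "initial_node_pool_max_surge" {
  description = "Max surge in percentage for the initial node pool"
  type        = string
  default     = "10"

  validation {
    condition     = can(regex("^[0-9]+%?$", var.initial_node_pool_max_surge))
    error_message = "initial_node_pool_max_surge must be a whole number, optionally suffixed with %."
  }
}
```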
2 changes: 1 addition & 1 deletion versions.tf
@@ -4,7 +4,7 @@ terraform {
   required_providers {
     azurerm = {
       source  = "hashicorp/azurerm"
-      version = "3.89.0"
+      version = "3.94.0"
     }
   }
 }
