
Google Kubernetes Engine sandbox

Warning This is a research prototype. Think before you deploy 😈

This Terraform configuration deploys a sandbox for experimenting with GKE Autopilot private clusters.

Because it is meant for exploration and demos, some parts are configured differently from what you'd expect to see in a production system. The most prominent deviations are:

  • A lot of telemetry is collected: logging and monitoring levels are set well above their default values (see the sketch after this list).
  • All Google Cloud resources for the cluster are deployed directly from this Terraform module with no extra dependencies.
  • The latest versions of Terraform and the Terraform Google provider are used.
  • Some resources are deployed using the google-beta provider.
  • Input validation is done on a "best-effort" basis.
  • No backwards compatibility should be expected.
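
For illustration, here is a minimal sketch of what the elevated telemetry settings might look like on a google_container_cluster resource. The resource name and the component lists are assumptions made for the example, not the module's actual configuration.

# Illustrative sketch only: elevated logging and monitoring on an Autopilot cluster.
# The resource name and the component lists are assumptions, not this module's code.
resource "google_container_cluster" "sandbox" {
  name             = "sandbox" # hypothetical name
  location         = var.region
  enable_autopilot = true

  # Collect logs from workloads as well as system components.
  logging_config {
    enable_components = ["SYSTEM_COMPONENTS", "WORKLOADS"]
  }

  # Collect control plane metrics on top of the default system metrics,
  # and keep Google Cloud Managed Service for Prometheus enabled.
  monitoring_config {
    enable_components = ["SYSTEM_COMPONENTS", "APISERVER", "SCHEDULER", "CONTROLLER_MANAGER"]
    managed_prometheus {
      enabled = true
    }
  }
}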

You have been warned! It's good fun, though, so feel free to fork the repo and play around with GKE; it's pretty cool tech, in my opinion.

Useful resources

GKE best practices and other related resources.

Architecture

Although this deployment is meant for proof-of-concept and experimental work, it implements many of Google's cluster security recommendations.

  • It is a private cluster, so the cluster nodes do not have public IP addresses.
  • Cloud NAT is configured to allow the cluster nodes and pods to access the Internet, so container registries located outside Google Cloud can be used (see the network sketch after this list).
  • The cluster nodes use a user-managed least privilege service account.
  • The cluster is subscribed to the Rapid release channel.
  • VPC Flow Logs are enabled by default on the cluster subnetwork.
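
The networking side of the list above can be pictured with a short Terraform sketch. Resource names and CIDR values below are illustrative assumptions, not the module's actual definitions.

# Illustrative sketch: a subnetwork with VPC Flow Logs plus Cloud NAT for
# egress from private nodes. Names and ranges are made up for the example.
resource "google_compute_network" "vpc" {
  name                    = "sandbox-vpc" # hypothetical
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "cluster" {
  name          = "cluster-subnet" # hypothetical
  region        = var.region
  network       = google_compute_network.vpc.id
  ip_cidr_range = "10.128.0.0/20" # illustrative

  # VPC Flow Logs on the cluster subnetwork.
  log_config {
    aggregation_interval = "INTERVAL_5_SEC"
    flow_sampling        = 0.5
    metadata             = "INCLUDE_ALL_METADATA"
  }
}

# Cloud NAT gives the private nodes and pods a route to the Internet.
resource "google_compute_router" "nat" {
  name    = "nat-router" # hypothetical
  region  = var.region
  network = google_compute_network.vpc.id
}

resource "google_compute_router_nat" "nat" {
  name                               = "nat-gateway" # hypothetical
  router                             = google_compute_router.nat.name
  region                             = var.region
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}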

Some other aspects that had to be configured explicitly when this sandbox deployed Standard GKE clusters are now pre-configured by GKE Autopilot, but it's still useful to remember what they are.

Requirements

Permissions required to deploy

Given that this is a research prototype, I am not that fussy about scoping every admin role that is needed to deploy this module. The roles/owner IAM basic role on the project would work. The roles/editor IAM basic role might work but I have not tested it.

If you fancy doing it the hard way (and there is a time and a place for such adventures, indeed), I hope this starting list of roles will help; an illustrative grant sketch follows the list:

  • Kubernetes Engine Admin (roles/container.admin)
  • Service Account Admin (roles/iam.serviceAccountAdmin)
  • Compute Admin (roles/compute.admin)
  • Service Usage Admin (roles/serviceusage.serviceUsageAdmin)
  • Monitoring Admin (roles/monitoring.admin)
  • Private Logs Viewer (roles/logging.privateLogViewer)
  • Moar?!
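
If you go down that route, the grants could be expressed in Terraform along the following lines. This is purely illustrative: the member address is a placeholder and the role list is just the starting list from above.

# Purely illustrative: grant the starting list of roles to a deployer identity.
# The member address below is a placeholder, not a real account.
locals {
  deployer = "user:deployer@example.com" # hypothetical identity
  deployer_roles = [
    "roles/container.admin",
    "roles/iam.serviceAccountAdmin",
    "roles/compute.admin",
    "roles/serviceusage.serviceUsageAdmin",
    "roles/monitoring.admin",
    "roles/logging.privateLogViewer",
  ]
}

resource "google_project_iam_member" "deployer" {
  for_each = toset(local.deployer_roles)
  project  = var.project
  role     = each.value
  member   = local.deployer
}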

Quick start

Clone the repo and you are good to go! You can provide the input variables' values as command-line parameters to the Terraform CLI:

terraform init && terraform apply -var="project=infernal-horse" -var="region=europe-west4"
  • You must set the Google Cloud project ID and Google Cloud region.
  • You may set authorized_networks to enable access to the cluster's endpoint from a public IP address. You would still have to authenticate.

Note The default value for authorized_networks does not allow any public access to the cluster endpoint.

To avoid having to provide the input variable values on the command line, you can create a variable definitions file, such as env.auto.tfvars, and define the values there.

project = "<PROJECT_ID>"
region  = "<REGION>"

authorized_networks = [
  {
    cidr_block   = "1.2.3.4/32"
    display_name = "my-ip-address"
  },
]

Note that you'd have to provide your own values for the variables 😉

Note To find your public IP, you can run the following command

dig -4 TXT +short o-o.myaddr.l.google.com @ns1.google.com

Now you can run Terraform (init ... plan ... apply) to deploy.

Happy hacking! :shipit:

Input variables

This module accepts the following input variables; an illustrative tfvars file covering all of them follows the list.

  • project is the Google Cloud project ID.
  • region is the Google Cloud region for all deployed resources.
  • (Optional) enable_flow_log controls VPC Flow Logs on the cluster subnetwork.
  • (Optional) node_cidr_range is the IP address range for the cluster nodes.
  • (Optional) pod_cidr_range is the IP address range for the pods.
  • (Optional) service_cidr_range is the IP address range for the services.
  • (Optional) authorized_networks is the list of CIDR blocks allowed to access the cluster's control plane.
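
As an illustration, an env.auto.tfvars covering every input might look like the snippet below. All values are made up for the example; the assumption that enable_flow_log is a boolean and the ranges are CIDR strings follows from the variable names, not from checked code.

# Illustrative env.auto.tfvars; every value here is an example placeholder.
project            = "my-project-id"
region             = "europe-west4"
enable_flow_log    = true            # assumed boolean
node_cidr_range    = "10.128.0.0/20" # assumed CIDR string
pod_cidr_range     = "10.4.0.0/14"   # assumed CIDR string
service_cidr_range = "10.8.0.0/20"   # assumed CIDR string

authorized_networks = [
  {
    cidr_block   = "203.0.113.10/32"
    display_name = "workstation"
  },
]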

Example workload

Warning This section is grossly out-of-date!

Once the infrastructure is provisioned with Terraform, you can deploy the example workload.

Code structure

Warning This section is grossly out-of-date!

This module runs in two stages, using two (aliased) instances of the Terraform Google provider.

The first stage, named the seed, is self-contained in 010-seed.tf. It runs with user credentials via ADC and sets up the foundation for the deployment that follows. The required services are enabled at this stage, and a least privilege IAM service account is provisioned and configured. At the end of the seed stage, a second instance of Terraform Google provider is initialised with the service account's credentials.

The following stage deploys the cluster resources using service account impersonation.
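
The idea can be sketched as follows. This is a hedged outline, not the module's actual wiring: the service account name is a placeholder, and the module may pass the credentials differently (for example, via a short-lived access token rather than the provider's impersonate_service_account argument).

# Hedged sketch of the two-stage layout; names are placeholders.

# Stage 1 ("seed"): the default provider runs with the operator's
# Application Default Credentials and provisions the deployer service account.
provider "google" {
  project = var.project
  region  = var.region
}

resource "google_service_account" "deployer" {
  account_id   = "gke-deployer" # hypothetical account id
  display_name = "Least-privilege deployment service account"
}

# Stage 2: an aliased provider instance impersonates that service account.
provider "google" {
  alias                       = "impersonated"
  project                     = var.project
  region                      = var.region
  impersonate_service_account = "gke-deployer@${var.project}.iam.gserviceaccount.com"
}

# Later-stage resources opt in to the impersonated provider.
resource "google_compute_network" "example" {
  provider                = google.impersonated
  name                    = "example-vpc" # hypothetical
  auto_create_subnetworks = false
}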

This deployment architecture serves three aims:

  • Short feedback loop. Everything is contained in a single Terraform module, so it is simple to deploy and update.
  • Deployment with a least privilege service account. This reduces the risk of hitting a permission error on deployment into "production", which is usually done by a locked-down service account, compared to deployment into a "development" environment, which was done with a user's Google account identity that typically has very broad permissions on the project (Owner or Editor). Inspiration.
  • The module can be used with "long-life" Google Cloud projects that are "repurposed" from one experiment to another. The explicit declaration of dependencies, where necessary, allows Terraform to destroy the resources in the right order when requested.
# 000-versions: Terraform and provider versions and configuration
# 010-seed: configure the project and provision a least privilege service account for deploying the cluster
# 030-cluster-node-sa: provision and configure a least privilege service account for cluster nodes
# 040-network: create a VPC, a subnet, and configure network and firewall logs.
# 050-nat: resources that provide NAT functionality to cluster nodes with private IP addresses.
# 060-cluster: create a GKE cluster (Standard)

Future work

Just some ideas for future explorations.