Skip to content

DataBiosphere/terra-aws-resource-discovery

Repository files navigation

terra-aws-resource-discovery

Terra AWS Support Resource Discovery Client Library

Table of Contents

Introduction

In order for Terra services to manage and provide access to Controlled Resources in Amazon Web Services (AWS), there must exist several AWS cloud resources used for this purpose. These resources will be referred to as Support Resources.

Support resources may be Global or Regional:

  • Global Support Resources are resources that do not exist in an AWS Region. These are most commonly (but not limited to) IAM resources (IAM Roles, IAM Policies).
  • Regional Support Resources are resources that exist in a specific AWS Region, such as S3 storage buckets and KMS Keys.

This library provides the ability to discover all of the Support Resources needed by Terra services and present in a given AWS environment, so that Terra services can make use of these Support Resources to create and provide access to Terra Controlled Resources in AWS.

Support Resource Organization

Environments

In Terra, an Environment corresponds to a single AWS Account. All Support Resources and Controlled Resources exist in a single Environment.

The Environment class provides getters for all Global Support Resources usable by Terra Services.

Landing Zones

In Terra, a Landing Zone corresponds to the nexus of a Terra Environment and an AWS Region. Each Regional Support Resource exists within a Landing Zone, as do all Terra Controlled Resources.

The LandingZone class provides getters for all Regional Support Resources usable by Terra Services in a given AWS Region in a Terra AWS Environment.

Instances of class LandingZone are obtained by calling method getLandingZone() on an Environment instance, passing the corresponding AWS Region for the Landing Zone within the Environment.

The following diagram illustrates the relationship between Environments, Regions, and Terra Controlled Resources in AWS:

The area in the dark purple dashed box represents the resources (Support and Controlled) that make up a Terra AWS Landing Zone.

Discovery Implementation

Support Resource Deployment

Deployment of Support Resources in an AWS Environment is out of scope for this document. However, producers of these resources are required to provide discoverability of their Support Resources by using the conventions described below.

Configuration Schemas

We have chosen to use Apache Avro for specifying the schema for Support Resource discovery, for several reasons:

  • Strong schema evolution support.
  • Support for many programming languages, supporting both Java service development and infrastructure-deployment-side testing/validation in other languages.
  • Human readable JSON-based Schema IDL and JSON data files.

This repository (terra-aws-resource-discovery) will be the single source of truth for Resource Discovery Schemas.

Two configuration schemas are specified in this repository:

  • Environment.avsc - the schema used to describe all Global Support Resources available in a Terra AWS Environment.
  • LandingZone.avsc - the schema used to describe all Regional Support Resources available in a Landing Zone within a Terra AWS Environment.

Schema Evolution and Versioning

Avro always requires two versions of schema when deserializing:

  • The Writer's Schema (the current schema version used by the writer at the time the data was written)
  • The Reader's Schema (the current schema version used by the reader of the data)

Thus, when a configuration is written to storage, it will include both the version of the schema used to write the data (as retrieved from the artifacts of this library, published to GitHub with each release) as well as the data itself.

We will maintain Forward Transitive Compatibility as described in this document.

Specifically, these are the rules we will follow:

  • Any change to either Avro Schema file must incur at least a minor version bump.
  • Fields may be added to the schema without breaking forward transitive compatibility.
  • Optional fields may be deleted without breaking forward transitive compatibility.
  • Producers of configuration artifacts (backend infrastructure deployment) must be updated before consumers (Terra services).
  • Breaking changes must be a major version bump, and should be avoided. We will not support forward compatibility between major versions.

Configuration Storage Layout

The terra-aws-resource-discovery provides discovery of all Support Resources in a single Environment through interface EnvironmentDiscovery. Three implementations of this interface are provided:

  • Class S3EnvironmentDiscovery discovers Support Resources by reading them from an S3 bucket that the caller has access to.
  • Class FilesystemEnvironmentDiscovery discovers Support Resources by reading them from directories within an accessible file system path.
  • Class CachedEnvironmentDiscovery is used in conjunction with one of the two above classes to cache discovery results between calls to discoverEnvironment(), in order to reduce the number of calls to storage API's.

Whether stored in an S3 Bucket or a local file system directory, the following layout is expected by the discovery library (in this example, this is major version 1 of the library, and we are discovering an Environment with two Landing Zones in AWS regions eu-central-1 and us-east-1:

v1
├── v1/environment
│   ├── v1/environment/config.json
└── v1/landingzones
    ├── v1/landingzones/eu-central-1
    │   └── v1/landingzones/eu-central-1/config.json
    └── v1/landingzones/us-east-1
        └── v1/landingzones/us-east-1/config.json

Each config.json file contains the following JSON content:

{
  "schema": "<base64-encoded avro schema>",
  "payload": "<base64-encoded support resource data>"
}

File v1/environment/config.json uses schema Environment.avsc to describe the Global Support Resources in the Environment.

Files v1/landingzones/eu-central-1/config.json and v1/landingzones/eu-central-1/config.json use schema LandingZone.avsc to describe the Regional Support Resources in the Environment's Landing Zones in regions eu-central-1 and us-east-1 respectively.

Library Development Notes

Dependency Locking

We use Gradle's dependency locking to ensure that builds use the same transitive dependencies, so they're reproducible. This means that adding or updating a dependency requires telling Gradle to save the change. Execute the below command when any dependency versions are updated.

./gradlew dependencies --write-locks

Static Test Data

Class EnvironmentDiscoveryTestBase serves as a test fixture consuming static test data written in folder src/test/resources/test_discovery_data to allow for testing of Avro parsing and schema validation.

In order to update the static test schema files as schemas evolve (as well as create new test data) test authors can make use of the following scripts in the tools directory:

  • decode-test-data.sh creates an out-of-tree directory (mirroring the structure of the src/test/resources/test_discovery_data directory tree), but with the payload parsed into un-encoded JSON. Changes can be made in this out-of-tree location to the plaintext JSON and written back to the source tree using encode-test-data.sh script.
  • encode-test-data.sh can be used to write the payload changes made to an out-of-tree directory created with decode-test-data.sh back to the in-tree test configuration files in encoded form (along with the current in-tree schema versions).
  • parse-schema.sh parses the Base64-encoded Avro schema from a config.json Configuration file/object and prints it as plain JSON.
  • parse-payload.sh parses the Base64-encoded payload from a config.json Configuration file/object and prints it as plain JSON.
  • print-config.sh takes an Avro schema and payload data file (both in plain JSON) and Base64 encodes them into a configuration file format, printing the output to STDOUT

Updating Multiple Test Case Config Files

# Decode all test payloads into empty directory ~/DiscoveryTestData
$ ./tools/decode-test-data.sh src/test/resources/test_discovery_data /DiscoveryTestData

# The following unencoded JSON files mirror those in the src/test/resources/test_discovery_data
# directory.  These payload files can be edited in-place to update the test data payloads.
$ find /DiscoveryTestData/ -type file
/Users/jczerk/DiscoveryTestData//add_field_before_schema_update/v0/environment/payload.json
/DiscoveryTestData//notebook_lifecycle_mismatch/v0/environment/payload.json
/DiscoveryTestData//notebook_lifecycle_mismatch/v0/landingzones/us-east-1/payload.json
/DiscoveryTestData//no_landing_zones/v0/environment/payload.json
/DiscoveryTestData//validation/v0/environment/payload.json
/DiscoveryTestData//validation/v0/landingzones/us-west-1/payload.json
/DiscoveryTestData//validation/v0/landingzones/us-east-1/payload.json
/DiscoveryTestData//validation/v0/landingzones/fake-region/payload.json

# Now use the encode-test-data.sh script to encode the updated payloads (along with any
# in-tree schema updates from src/main/avro) into the in-tree test configuration files.
$ ./tools/encode-test-data.sh src/main/avro/ ~/DiscoveryTestData/ src/test/resources/test_discovery_data/

Updating a Single Existing Test Case Config File

# Make any changes to the Avro schema, in this case src/main/avro/Environment.avsc

# Choose the file that you wish to update
TEST_FILE="src/test/resources/test_discovery_data/validation/v0/environment/config.json"

# Parse the payload from the existing test data file and write it to a scratch file for editing
./tools/parse-payload.sh ${TEST_FILE} > /tmp/scratch.json

# Make any changes to the test payload to the scratch file directly

# Now write the updated schema and test data back to the original file
./tools/print-config.sh src/main/avro/Environment.avsc /tmp/scratch.json > ${TEST_FILE}

Writing a New Test Case Config File

# Write your test case payload to a new file somewhere outside of the terra-aws-resource-discovery
# filesystem tree (optionally making any required schema changes in src/main/avro)
NEW_TEST_DATA=/tmp/new_test.json

# Identify the new test data case location
NEW_TEST_CONFIG=src/test/resources/test_discovery_data/new_test_data/v0/environment/config.json

# Now write the schema and new test data to the new config file
./tools/print-config.sh src/main/avro/Environment.avsc ${NEW_TEST_DATA} > ${NEW_TEST_CONFIG}