Terra AWS Support Resource Discovery Client Library
Terra services depend on a number of AWS cloud resources in order to manage and provide access to Controlled Resources in Amazon Web Services (AWS). These resources are referred to as Support Resources.
Support resources may be Global or Regional:
- Global Support Resources are resources that do not exist within an AWS Region. These are most commonly (but not exclusively) IAM resources, such as IAM Roles and IAM Policies.
- Regional Support Resources are resources that exist in a specific AWS Region, such as S3 storage buckets and KMS Keys.
This library provides the ability to discover all of the Support Resources needed by Terra services that are present in a given AWS Environment, so that Terra services can use these Support Resources to create and provide access to Terra Controlled Resources in AWS.
In Terra, an Environment corresponds to a single AWS Account. All Support Resources and Controlled Resources exist in a single Environment.
The Environment class provides getters for all Global Support Resources usable by Terra services.
In Terra, a Landing Zone corresponds to the nexus of a Terra Environment and an AWS Region. Each Regional Support Resource exists within a Landing Zone, as do all Terra Controlled Resources.
The LandingZone class provides getters for all Regional Support Resources usable by Terra services in a given AWS Region in a Terra AWS Environment. Instances of class LandingZone are obtained by calling method getLandingZone() on an Environment instance, passing the AWS Region corresponding to the Landing Zone within the Environment.
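For example, here is a minimal sketch of looking up a Landing Zone, assuming an Environment instance has already been discovered (see EnvironmentDiscovery below); the Optional return type and the AWS SDK Region parameter type are assumptions of this sketch, not a statement of the exact API:

import java.util.Optional;
import software.amazon.awssdk.regions.Region;

// 'environment' would be obtained from an EnvironmentDiscovery implementation,
// described below. getLandingZone() is the documented lookup method; the
// Optional return type is an assumption of this sketch.
static void lookupLandingZone(Environment environment) {
    Optional<LandingZone> landingZone = environment.getLandingZone(Region.EU_CENTRAL_1);
    landingZone.ifPresent(lz -> {
        // Regional Support Resources (S3 buckets, KMS keys, ...) are exposed
        // through getters on the LandingZone instance.
    });
}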
The following diagram illustrates the relationship between Environments, Regions, and Terra Controlled Resources in AWS:
The area in the dark purple dashed box represents the resources (Support and Controlled) that make up a Terra AWS Landing Zone.
Deployment of Support Resources in an AWS Environment is out of scope for this document. However, producers of these resources are required to provide discoverability of their Support Resources by using the conventions described below.
We have chosen to use Apache Avro for specifying the schema for Support Resource discovery, for several reasons:
- Strong schema evolution support.
- Support for many programming languages, supporting both Java service development and infrastructure-deployment-side testing/validation in other languages.
- Human-readable, JSON-based schema IDL and JSON data files.
This repository (terra-aws-resource-discovery) is the single source of truth for Resource Discovery Schemas.
Two configuration schemas are specified in this repository:
- Environment.avsc - the schema used to describe all Global Support Resources available in a Terra AWS Environment.
- LandingZone.avsc - the schema used to describe all Regional Support Resources available in a Landing Zone within a Terra AWS Environment.
Avro always requires two schemas when deserializing data:
- The Writer's Schema: the schema version used by the writer at the time the data was written.
- The Reader's Schema: the schema version in use by the reader of the data.
Thus, when a configuration is written to storage, it will include both the version of the schema used to write the data (as retrieved from the artifacts of this library, published to GitHub with each release) as well as the data itself.
We will maintain Forward Transitive Compatibility as described in this document.
Specifically, these are the rules we will follow:
- Any change to either Avro Schema file must incur at least a minor version bump.
- Fields may be added to the schema without breaking forward transitive compatibility.
- Optional fields may be deleted without breaking forward transitive compatibility.
- Producers of configuration artifacts (backend infrastructure deployment) must be updated before consumers (Terra services).
- Breaking changes require a major version bump and should be avoided. We will not support forward compatibility across major versions.
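As a sanity check for the rules above, Avro's built-in compatibility checker can verify that data written with a newer schema remains readable with an older one. A minimal sketch, where the file locations are illustrative:

import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;

// Forward compatibility: data written with the new (writer's) schema must
// still be readable by consumers holding the old (reader's) schema.
static boolean isForwardCompatible(File oldSchemaFile, File newSchemaFile) throws IOException {
    Schema oldReader = new Schema.Parser().parse(oldSchemaFile);
    Schema newWriter = new Schema.Parser().parse(newSchemaFile);
    return SchemaCompatibility.checkReaderWriterCompatibility(oldReader, newWriter).getType()
        == SchemaCompatibility.SchemaCompatibilityType.COMPATIBLE;
}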
The terra-aws-resource-discovery library provides discovery of all Support Resources in a single Environment through the interface EnvironmentDiscovery.
Three implementations of this interface are provided:
- Class S3EnvironmentDiscovery discovers Support Resources by reading them from an S3 bucket that the caller has access to.
- Class FilesystemEnvironmentDiscovery discovers Support Resources by reading them from directories within an accessible file system path.
- Class CachedEnvironmentDiscovery is used in conjunction with one of the two classes above to cache discovery results between calls to discoverEnvironment(), in order to reduce the number of calls to storage APIs.
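A usage sketch, assuming a discovery bucket named my-terra-discovery-bucket and constructor shapes along these lines (the exact constructor signatures may differ from the released API):

// The bucket name and constructor arguments here are illustrative assumptions.
EnvironmentDiscovery s3Discovery = new S3EnvironmentDiscovery("my-terra-discovery-bucket");

// Wrap the S3-backed discovery in CachedEnvironmentDiscovery so that repeated
// discoverEnvironment() calls do not hit the S3 API every time.
EnvironmentDiscovery discovery = new CachedEnvironmentDiscovery(s3Discovery);

Environment environment = discovery.discoverEnvironment();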
Whether stored in an S3 bucket or a local file system directory, the following layout is expected by the discovery library (in this example, this is major version 1 of the library, and we are discovering an Environment with two Landing Zones in AWS regions eu-central-1 and us-east-1):
v1
├── v1/environment
│   └── v1/environment/config.json
└── v1/landingzones
    ├── v1/landingzones/eu-central-1
    │   └── v1/landingzones/eu-central-1/config.json
    └── v1/landingzones/us-east-1
        └── v1/landingzones/us-east-1/config.json
Each config.json file contains the following JSON content:
{
"schema": "<base64-encoded avro schema>",
"payload": "<base64-encoded support resource data>"
}
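To illustrate how this envelope maps onto Avro's reader/writer schema model, here is a minimal decoding sketch. Extracting the two fields from the JSON envelope is elided, the JSON decoder reflects the repository's use of JSON data files, and the library classes above perform all of this for callers:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;

// Decode one config.json envelope: 'schemaB64' and 'payloadB64' are the two
// fields shown above; 'readerSchema' is the schema version bundled with this library.
static GenericRecord decodeConfig(String schemaB64, String payloadB64, Schema readerSchema)
        throws IOException {
    // The writer's schema travels with the data, Base64-encoded.
    Schema writerSchema = new Schema.Parser()
        .parse(new String(Base64.getDecoder().decode(schemaB64), StandardCharsets.UTF_8));
    String payloadJson = new String(Base64.getDecoder().decode(payloadB64), StandardCharsets.UTF_8);

    // Avro resolves the writer's schema against the reader's schema, which is
    // what makes forward compatibility work.
    GenericDatumReader<GenericRecord> datumReader =
        new GenericDatumReader<>(writerSchema, readerSchema);
    return datumReader.read(null, DecoderFactory.get().jsonDecoder(writerSchema, payloadJson));
}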
File v1/environment/config.json uses schema Environment.avsc to describe the Global Support Resources in the Environment. Files v1/landingzones/eu-central-1/config.json and v1/landingzones/us-east-1/config.json use schema LandingZone.avsc to describe the Regional Support Resources in the Environment's Landing Zones in regions eu-central-1 and us-east-1 respectively.
We use Gradle's dependency locking to ensure that builds use the same transitive dependencies, making them reproducible. This means that adding or updating a dependency requires telling Gradle to save the change. Execute the command below whenever dependency versions are updated.
./gradlew dependencies --write-locks
Class EnvironmentDiscoveryTestBase serves as a test fixture consuming static test data written in folder src/test/resources/test_discovery_data to allow for testing of Avro parsing and schema validation.
To update the static test schema files as schemas evolve (as well as to create new test data), test authors can make use of the following scripts in the tools directory:
- decode-test-data.sh creates an out-of-tree directory (mirroring the structure of the src/test/resources/test_discovery_data directory tree), but with each payload parsed into un-encoded JSON. Changes can be made to the plaintext JSON in this out-of-tree location and written back to the source tree using the encode-test-data.sh script.
- encode-test-data.sh writes the payload changes made in an out-of-tree directory created with decode-test-data.sh back to the in-tree test configuration files in encoded form (along with the current in-tree schema versions).
- parse-schema.sh parses the Base64-encoded Avro schema from a config.json configuration file/object and prints it as plain JSON.
- parse-payload.sh parses the Base64-encoded payload from a config.json configuration file/object and prints it as plain JSON.
- print-config.sh takes an Avro schema and a payload data file (both in plain JSON), Base64-encodes them into the configuration file format, and prints the output to STDOUT.
# Decode all test payloads into empty directory ~/DiscoveryTestData
$ ./tools/decode-test-data.sh src/test/resources/test_discovery_data ~/DiscoveryTestData
# The following unencoded JSON files mirror those in the src/test/resources/test_discovery_data
# directory. These payload files can be edited in-place to update the test data payloads.
$ find ~/DiscoveryTestData/ -type f
/Users/jczerk/DiscoveryTestData//add_field_before_schema_update/v0/environment/payload.json
/Users/jczerk/DiscoveryTestData//notebook_lifecycle_mismatch/v0/environment/payload.json
/Users/jczerk/DiscoveryTestData//notebook_lifecycle_mismatch/v0/landingzones/us-east-1/payload.json
/Users/jczerk/DiscoveryTestData//no_landing_zones/v0/environment/payload.json
/Users/jczerk/DiscoveryTestData//validation/v0/environment/payload.json
/Users/jczerk/DiscoveryTestData//validation/v0/landingzones/us-west-1/payload.json
/Users/jczerk/DiscoveryTestData//validation/v0/landingzones/us-east-1/payload.json
/Users/jczerk/DiscoveryTestData//validation/v0/landingzones/fake-region/payload.json
# Now use the encode-test-data.sh script to encode the updated payloads (along with any
# in-tree schema updates from src/main/avro) into the in-tree test configuration files.
$ ./tools/encode-test-data.sh src/main/avro/ ~/DiscoveryTestData/ src/test/resources/test_discovery_data/
# Make any changes to the Avro schema, in this case src/main/avro/Environment.avsc
# Choose the file that you wish to update
TEST_FILE="src/test/resources/test_discovery_data/validation/v0/environment/config.json"
# Parse the payload from the existing test data file and write it to a scratch file for editing
./tools/parse-payload.sh ${TEST_FILE} > /tmp/scratch.json
# Make any changes to the test payload to the scratch file directly
# Now write the updated schema and test data back to the original file
./tools/print-config.sh src/main/avro/Environment.avsc /tmp/scratch.json > ${TEST_FILE}
# Write your test case payload to a new file somewhere outside of the terra-aws-resource-discovery
# filesystem tree (optionally making any required schema changes in src/main/avro)
NEW_TEST_DATA=/tmp/new_test.json
# Identify the new test data case location
NEW_TEST_CONFIG=src/test/resources/test_discovery_data/new_test_data/v0/environment/config.json
# Now write the schema and new test data to the new config file
./tools/print-config.sh src/main/avro/Environment.avsc ${NEW_TEST_DATA} > ${NEW_TEST_CONFIG}