Terraform module for terraform-aws-tamr-config

Datatamer/terraform-aws-tamr-config

Terraform-generated Tamr Config Module

This Terraform module automates populating the Tamr config variables whose values are generated as outputs of other AWS scale-out modules.

Examples

Minimal

The smallest complete, fully working example. It might require extra resources to run.
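As a rough sketch of what a minimal invocation could look like, the block below sets only the six required inputs and exposes the rendered config. The module source URL is inferred from the repository name, no version is pinned, and every value is a placeholder. It also assumes a template file exists at the default config_template_path (./tamr-config.yml).

```hcl
# Hypothetical minimal usage; all values are placeholders, not real endpoints.
variable "rds_pg_password" {
  type        = string
  description = "Master password for the RDS postgres instance."
}

module "tamr_config" {
  # Assumed source, derived from the repository name; pin a ref in real use.
  source = "git::https://github.com/Datatamer/terraform-aws-tamr-config.git"

  # Required inputs
  ephemeral_spark_configured = false
  es_domain_endpoint         = "example-es-domain.us-east-1.es.amazonaws.com"
  rds_pg_hostname            = "example-db.us-east-1.rds.amazonaws.com"
  rds_pg_password            = var.rds_pg_password
  spark_cluster_log_uri      = "s3://example-logs-bucket/spark/"
  tamr_data_bucket           = "example-tamr-data-bucket"
}

# The rendered config embeds the database password, so mark it sensitive.
output "tamr_config" {
  value     = module.tamr_config.rendered
  sensitive = true
}
```

In a real deployment these values would typically come from the outputs of the other scale-out modules rather than from literals.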

Resources Created

This module creates:

  • A template_file data source which renders the contents of a populated Tamr config.
  • If rendered_config_path is provided, a YAML file containing the populated Tamr config, written to that path.
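The two items above can be sketched roughly as follows. This is an illustration of the pattern, not the module's actual source: the resource names and the single template variable are assumptions, and the real module passes many more values into the template.

```hcl
variable "config_template_path" {
  type    = string
  default = "./tamr-config.yml"
}

variable "rendered_config_path" {
  type    = string
  default = ""
}

variable "tamr_data_bucket" {
  type = string
}

# Render the config template with values taken from the inputs.
data "template_file" "tamr_config" {
  template = file(var.config_template_path)

  vars = {
    # Hypothetical template variable name, for illustration only.
    tamr_data_bucket = var.tamr_data_bucket
  }
}

# Write the rendered config to disk only when a path was provided.
resource "local_file" "rendered_config" {
  count    = var.rendered_config_path == "" ? 0 : 1
  content  = data.template_file.tamr_config.rendered
  filename = var.rendered_config_path
}
```

Using count as an on/off switch keeps the file resource out of the plan entirely when rendered_config_path is left at its empty default.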

Requirements

| Name | Version |
|------|---------|
| terraform | >= 0.13 |

Providers

| Name | Version |
|------|---------|
| local | n/a |
| template | n/a |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| ephemeral_spark_configured | True if EMR was configured for ephemeral Spark clusters. | bool | n/a | yes |
| es_domain_endpoint | Endpoint of Elasticsearch domain. | string | n/a | yes |
| rds_pg_hostname | Hostname of RDS postgres instance. | string | n/a | yes |
| rds_pg_password | Master password for RDS postgres database instance. | string | n/a | yes |
| spark_cluster_log_uri | The path to the S3 location where logs for the Spark cluster are stored. | string | n/a | yes |
| tamr_data_bucket | Name of Tamr root directory bucket. | string | n/a | yes |
| additional_templated_variables | Mapping of additional Tamr variables (not included in template) to their values. If a variable name in this map defines the same key as an input variable, the value specified in this map takes precedence. | map(string) | {} | no |
| apps_dms_default_cloud_provider | Defines the default cloud service provider for DMS when APPS_DMS_ENABLED is set to true. | string | "s3" | no |
| apps_dms_enabled | Set to true to enable the Data Movement Service (DMS). | bool | true | no |
| config_template_path | Path to Tamr config template. | string | "./tamr-config.yml" | no |
| core_ebs_size | The core EBS volume size, in gibibytes (GiB). | string | "" | no |
| core_ebs_type | Type of volumes to attach to the core nodes. Valid options are gp2, io1, standard and st1. | string | "" | no |
| core_ebs_volumes_count | Number of volumes to attach to the core nodes. | string | "" | no |
| core_group_instance_count | Number of Amazon EC2 instances used to execute the job flow. | string | "" | no |
| core_instance_type | The EC2 instance type of the core nodes. | string | "" | no |
| emr_additional_core_sg_id | Security group ID of the EMR Additional Core Security Group. | string | "" | no |
| emr_additional_master_sg_id | Security group ID of the EMR Additional Master Security Group. | string | "" | no |
| emr_cluster_name_prefix | A prefix to add to the name of created EMR Spark clusters. | string | "tamr-emr-" | no |
| emr_instance_profile_name | Name of instance profile for EMR EC2 instances. | string | "" | no |
| emr_key_pair_name | Name of the Key Pair that will be attached to the EMR EC2 instances. | string | "" | no |
| emr_managed_core_sg_id | Security group ID of the EMR Managed Core Security Group. | string | "" | no |
| emr_managed_master_sg_id | Security group ID of the EMR Managed Master Security Group. | string | "" | no |
| emr_release_label | The release label for the Amazon EMR release. | string | "emr-5.29.0" | no |
| emr_root_volume_size | The size, in GiB, of the EBS root device volume of the Linux AMI that is used for each EMR EC2 instance. | string | "10" | no |
| emr_service_access_sg_id | Security group ID of EMR Service Access Security Group. | string | "" | no |
| emr_service_role_name | Name of IAM service role for EMR cluster. | string | "" | no |
| emr_subnet_id | ID of the subnet where the EMR cluster will be created. | string | "" | no |
| emr_tags | Map of tags to add to new resources in EMR. | map(string) | {} | no |
| emrfs_dynamodb_table_name | Name for the EMRFS DynamoDB table. | string | "" | no |
| es_enabled | Whether or not to enable Elasticsearch by setting the TAMR_ES_ENABLED flag. | bool | true | no |
| hbase_config_path | Path to HBase configuration in EMR root directory bucket. | string | "config/hbase/conf.dist/" | no |
| hbase_namespace | n/a | string | "tamr" | no |
| hbase_number_of_regions | Number of regions to create by default in HBase. | string | "1000" | no |
| hbase_number_of_salt_values | Number of distinct salt values to be used for prefixing row keys in HBase tables. Must be >= hbase_number_of_regions. | string | "1000" | no |
| hbase_storage_mode | Storage mode for HBase. Valid values: SHARED, DEDICATED. | string | "SHARED" | no |
| master_ebs_size | The master EBS volume size, in gibibytes (GiB). | string | "" | no |
| master_ebs_type | Type of volumes to attach to the master nodes. Valid options are gp2, io1, standard and st1. | string | "" | no |
| master_ebs_volumes_count | Number of volumes to attach to the master nodes. | string | "" | no |
| master_instance_type | The EC2 instance type of the master nodes. | string | "" | no |
| rds_pg_db_port | The RDS postgres database port. | number | 5432 | no |
| rds_pg_dbname | RDS postgres database name. | string | "doit" | no |
| rds_pg_username | Master username for RDS postgres database instance. | string | "tamr" | no |
| rendered_config_path | If provided, the populated Tamr config will be output to this path. Include a file name (e.g. /path/to/config.yml). NOTE: Any required parent directories will be created automatically, and any existing file with the given name will be overwritten. | string | "" | no |
| spark_driver_memory | n/a | string | "5G" | no |
| spark_emr_cluster_id | Spark cluster ID. Value will not be used if deployment is spinning up ephemeral Spark clusters. | string | "" | no |
| spark_executor_cores | n/a | number | 2 | no |
| spark_executor_instances | n/a | number | 2 | no |
| spark_executor_memory | n/a | string | "8G" | no |
| tamr_backup_emr_cluster_id | ID of the static EMR cluster to run s3distcp on when backing up to or restoring from S3. | string | "" | no |
| tamr_data_path | Path in root directory bucket (bucket provided for tamr_data_bucket input) to write data to. | string | "tamr/unify-data" | no |
| tamr_external_storage_providers | Filesystem connection information for external storage providers. | string | "" | no |
| tamr_file_based_hbase_backup_enabled | Whether to back up contents of HBase root directory to backup path. | bool | true | no |
| tamr_spark_config_override | A list of Spark config overrides. If not set, all jobs will run with the default Spark settings. Used for setting job-by-job Spark resource settings. | string | "" | no |
| tamr_spark_properties_override | JSON blob of Spark properties to override. If not set, will use a default set of properties that should work for most use cases. | string | "" | no |
| tamr_unify_backup_aws_role_based_access | Set to true if Tamr should use EC2 instance profile (role-based) credentials instead of static credentials. | bool | true | no |
| tamr_unify_backup_es | Defines whether or not to back up Elasticsearch. | bool | false | no |
| tamr_unify_backup_path | Identifies the path for storing backup files. | string | "tamr/backups" | no |
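The precedence rule described for additional_templated_variables behaves like Terraform's merge function, where a later map wins on duplicate keys. The standalone sketch below illustrates that semantic only. Whether the module implements it this way is an assumption, and the key names (including TAMR_EXAMPLE_SETTING) are hypothetical stand-ins.

```hcl
locals {
  # Stand-in for values the module would derive from its regular inputs.
  base_variables = {
    TAMR_ES_ENABLED = "true"
  }

  # Stand-in for a caller-supplied additional_templated_variables map.
  additional_templated_variables = {
    TAMR_ES_ENABLED      = "false" # same key as above, so this value wins
    TAMR_EXAMPLE_SETTING = "value" # hypothetical key that is not in the template
  }

  # merge() gives precedence to maps later in the argument list.
  effective_variables = merge(
    local.base_variables,
    local.additional_templated_variables,
  )
}

output "effective_variables" {
  value = local.effective_variables
}
```

Here effective_variables ends up with TAMR_ES_ENABLED set to "false" alongside the extra key, which matches the documented behavior that the map's value takes precedence.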

Outputs

| Name | Description |
|------|-------------|
| rendered | Rendered Tamr config |

References

This repo is based on:

Development

Generating Docs

Run make terraform/docs to generate the section of docs around terraform inputs, outputs and requirements.

Checkstyles

Run make lint. This runs terraform fmt along with a few other checks that detect whitespace issues. NOTE: This requires a working Docker installation on the machine running the checks.

Releasing new versions

  • Update the version in VERSION
  • Document the changes in CHANGELOG.md
  • Create a tag in GitHub for the commit associated with the version

License

Apache 2 Licensed. See LICENSE for full details.