refactor(glue-alpha): glue L2 alpha construct #30833

Open
wants to merge 72 commits into
base: main
72 commits
38bbbce
Propagate initia glue l2 jobl refactor from natalie-white-aws/aws-cdk…
natalie-white-aws Feb 23, 2024
5b9a16c
Merge branch 'aws:main' into main
natalie-white-aws Feb 23, 2024
2632f78
added Ray jobs
askarserikov Feb 29, 2024
c67cd81
Create raw CDK assets for Python Shell Jobs
chrisw-devops Mar 5, 2024
273d603
Fixed broken path.join
chrisw-devops Mar 5, 2024
f66e0e0
Updated Python Shell Jobs integration test output to match updated in…
chrisw-devops Mar 11, 2024
e844449
Updated Python Shell Job unit tests to validate default values
chrisw-devops Mar 11, 2024
8b367fa
PySpark Streaming job
mjanardhan Apr 9, 2024
d19f5cc
PySpark Streaming job
mjanardhan Apr 9, 2024
0809778
Scala Spark Streaming Job class
mjanardhan Apr 9, 2024
6118d80
PySpark and Scala Flex ETL jobs L2 constructs
pras-b Apr 12, 2024
6ed7021
Modifications to comments
pras-b Apr 12, 2024
087f411
Scala Spark Streaming Job class
mjanardhan Apr 15, 2024
23a76a9
Streaming Jobs - integration tests
mjanardhan Apr 15, 2024
ac4b457
Streaming Jobs
mjanardhan Apr 15, 2024
9d936dc
Streaming jobs CDK L2
mjanardhan Apr 15, 2024
b28dc55
Python Streaming Jobs - Integration test updates
mjanardhan Apr 15, 2024
8afbb59
Scala Streaming Jobs - Integration test updates
mjanardhan Apr 15, 2024
ffca357
Python Streaming Job updates
mjanardhan Apr 16, 2024
1da5519
Modifications to scala & pyspark flex etl jobs based on PR review
pras-b Apr 18, 2024
1762b46
Modified the job definition to add logging and metrics. Added unit an…
askarserikov Apr 22, 2024
a234b6b
Initial Commit for Workflow Triggers
dkovvuri Apr 30, 2024
3ff57ce
ETL Jobs and tests
mjanardhan May 28, 2024
f9675bc
ETL Jobs and tests
mjanardhan May 28, 2024
06c7d0a
ETL Jobs and tests
mjanardhan May 28, 2024
9457e35
ETL Jobs and tests
mjanardhan May 28, 2024
0c412cb
Update default python version
mjanardhan May 28, 2024
5413e85
ETL jobs & tests
mjanardhan May 28, 2024
121b21e
Merge pull request #6 from mjanardhan/etl-jobs
natalie-white-aws Jun 18, 2024
72818b0
Merge pull request #5 from dkovvuri/main
natalie-white-aws Jun 18, 2024
4d8a4f9
Merge pull request #4 from mjanardhan/ray-jobs
natalie-white-aws Jun 18, 2024
053ab32
Merge pull request #3 from mjanardhan/streaming-jobs
natalie-white-aws Jun 18, 2024
c9b0ae9
Merge branch 'main' into python-shell-jobs
natalie-white-aws Jun 18, 2024
2a26c5a
Merge pull request #1 from mjanardhan/python-shell-jobs
natalie-white-aws Jun 18, 2024
cfb13a5
Merge branch 'main' into prasbal-spark-flex
natalie-white-aws Jun 18, 2024
f976c88
Merge pull request #2 from mjanardhan/prasbal-spark-flex
natalie-white-aws Jun 18, 2024
5fd8569
Updates based on glue team feedback
mjanardhan Jun 25, 2024
c33b0b9
Updated tests and results
mjanardhan Jun 25, 2024
029c52a
Rename flex jobs per naming convention
mjanardhan Jun 25, 2024
75a4e6e
Fix s3 path specified in --spark-event-logs-path and update glue version
mjanardhan Jun 26, 2024
1cf6d29
Updated default glue version to v2
mjanardhan Jun 26, 2024
28f3b27
Merge pull request #7 from mjanardhan/glue_job_updates
natalie-white-aws Jun 26, 2024
d24c24a
Delete Job Legacy classes, change default WorkerType back to G_1X
natalie-white-aws Jun 27, 2024
d1f3dfc
Run tests and snapshot output
mjanardhan Jun 27, 2024
65237a1
Merge pull request #8 from mjanardhan/run_tests
mjanardhan Jun 27, 2024
593e877
Refactor coninuous logging default enabled plus unit tests in pyspark…
natalie-white-aws Jun 29, 2024
081c94b
Merge pull request #9 from mjanardhan/refactorLogging
natalie-white-aws Jun 29, 2024
7c629c2
Merging upstream CDK Core commits
natalie-white-aws Jul 11, 2024
a75ac69
Delete legacy Glue Job classes and tests post-merge
natalie-white-aws Jul 11, 2024
cea1cef
Merge pull request #10 from mjanardhan/pull-from-cdk-main
mjanardhan Jul 11, 2024
a17a4df
Updated snapshots for the jobs integ tests
mjanardhan Jul 11, 2024
b30055e
Merge pull request #11 from mjanardhan/update_snapshot
natalie-white-aws Jul 11, 2024
28c97fd
Final README update
natalie-white-aws Jul 11, 2024
316c670
Resolve README linter issues
natalie-white-aws Jul 12, 2024
6343ad2
Increase unit test coverage, especially for pyspark etl jobs and ray …
natalie-white-aws Jul 15, 2024
f4b2315
Increase unit test coverage for python shell and pyspark streaming jobs
natalie-white-aws Jul 15, 2024
7a74777
Added unit test cases to the scala jobs
pras-b Jul 18, 2024
e051e05
Merge pull request #13 from mjanardhan/scala-unit-test
natalie-white-aws Jul 18, 2024
b601916
fixed the unit test for scala etl and flex etl
pras-b Jul 18, 2024
cdba008
Fixing scala etl and pyspark streaming unit tests
natalie-white-aws Jul 18, 2024
df4b1d0
Merge pull request #14 from mjanardhan/scala-unit-test
mjanardhan Jul 23, 2024
3943206
WorkerType and numberofWorkers defaults are enforced when not set
mjanardhan Jul 23, 2024
c077b11
Fix tests
mjanardhan Jul 23, 2024
79339c4
Merge pull request #15 from mjanardhan/fix-ray-integ-tests
natalie-white-aws Jul 23, 2024
dbbaabc
Updated snapshots
mjanardhan Jul 24, 2024
cffa1b7
Merge pull request #16 from mjanardhan/integtests_snapshots
mjanardhan Jul 24, 2024
7077f98
Resolve glue alpha README compilation errors
natalie-white-aws Jul 29, 2024
f2b4a97
Merge branch 'glue-alpha-refacor' of https://github.com/mjanardhan/aw…
natalie-white-aws Jul 29, 2024
9f1edd6
Resolve additional README compilation issues
natalie-white-aws Jul 30, 2024
f4886a7
Remove code examples from README and reference unit tests for examples
natalie-white-aws Aug 6, 2024
1c31609
Merge branch 'main' into glue-alpha-refacor
natalie-white-aws Aug 7, 2024
84c6c55
Merge branch 'main' into glue-alpha-refacor
natalie-white-aws Oct 2, 2024
790 changes: 226 additions & 564 deletions packages/@aws-cdk/aws-glue-alpha/README.md

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions packages/@aws-cdk/aws-glue-alpha/awslint.json
@@ -51,15 +51,15 @@
 "docs-public-apis:@aws-cdk/aws-glue-alpha.ITable",
 "docs-public-apis:@aws-cdk/aws-glue-alpha.ITable.tableArn",
 "docs-public-apis:@aws-cdk/aws-glue-alpha.ITable.tableName",
-"props-default-doc:@aws-cdk/aws-glue-alpha.PythonRayExecutableProps.runtime",
-"props-default-doc:@aws-cdk/aws-glue-alpha.PythonShellExecutableProps.runtime",
-"props-default-doc:@aws-cdk/aws-glue-alpha.PythonSparkJobExecutableProps.runtime",
 "docs-public-apis:@aws-cdk/aws-glue-alpha.S3TableProps",
-"props-default-doc:@aws-cdk/aws-glue-alpha.ScalaJobExecutableProps.runtime",
 "docs-public-apis:@aws-cdk/aws-glue-alpha.TableAttributes",
 "docs-public-apis:@aws-cdk/aws-glue-alpha.TableAttributes.tableArn",
 "docs-public-apis:@aws-cdk/aws-glue-alpha.TableAttributes.tableName",
 "docs-public-apis:@aws-cdk/aws-glue-alpha.TableBaseProps",
-"docs-public-apis:@aws-cdk/aws-glue-alpha.TableProps"
+"docs-public-apis:@aws-cdk/aws-glue-alpha.TableProps",
+"docs-public-apis:@aws-cdk/aws-glue-alpha.PredicateLogical",
+"no-unused-type:@aws-cdk/aws-glue-alpha.ExecutionClass",
+"no-unused-type:@aws-cdk/aws-glue-alpha.JobLanguage",
+"no-unused-type:@aws-cdk/aws-glue-alpha.JobType"
 ]
 }
308 changes: 308 additions & 0 deletions packages/@aws-cdk/aws-glue-alpha/lib/constants.ts
@@ -0,0 +1,308 @@
/**
* The type of predefined worker that is allocated when a job runs.
*/
export enum WorkerType {
/**
* Standard Worker Type
* 4 vCPU, 16 GB of memory and a 50GB disk, and 2 executors per worker.
*/
STANDARD = 'Standard',

/**
* G.1X Worker Type
* 1 DPU (4 vCPU, 16 GB of memory, 64 GB disk), and provides 1 executor per worker. Suitable for memory-intensive jobs.
*/
G_1X = 'G.1X',

/**
* G.2X Worker Type
* 2 DPU (8 vCPU, 32 GB of memory, 128 GB disk), and provides 1 executor per worker. Suitable for memory-intensive jobs.
*/
G_2X = 'G.2X',

/**
* G.4X Worker Type
* 4 DPU (16 vCPU, 64 GB of memory, 256 GB disk), and provides 1 executor per worker.
* We recommend this worker type for jobs whose workloads contain your most demanding transforms,
* aggregations, joins, and queries. This worker type is available only for AWS Glue version 3.0 or later jobs.
*/
G_4X = 'G.4X',

/**
* G.8X Worker Type
* 8 DPU (32 vCPU, 128 GB of memory, 512 GB disk), and provides 1 executor per worker. We recommend this worker
* type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries.
* This worker type is available only for AWS Glue version 3.0 or later jobs.
*/
G_8X = 'G.8X',

/**
* G.025X Worker Type
* 0.25 DPU (2 vCPU, 4 GB of memory, 64 GB disk), and provides 1 executor per worker. Suitable for low volume streaming jobs.
*/
G_025X = 'G.025X',

/**
* Z.2X Worker Type
* 2 M-DPU (8 vCPU, 64 GB of memory, 128 GB disk). Used by AWS Glue for Ray jobs.
*/
Z_2X = 'Z.2X',
}

/**
* The number of workers of a defined workerType that are allocated when a job runs.
*
* @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-job.html
*/

/**
* Job states emitted by Glue to CloudWatch Events.
*
* @see https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types for more information.
*/
export enum JobState {
/**
* State indicating job run succeeded
*/
SUCCEEDED = 'SUCCEEDED',

/**
* State indicating job run failed
*/
FAILED = 'FAILED',

/**
* State indicating job run timed out
*/
TIMEOUT = 'TIMEOUT',

/**
* State indicating job is starting
*/
STARTING = 'STARTING',

/**
* State indicating job is running
*/
RUNNING = 'RUNNING',

/**
* State indicating job is stopping
*/
STOPPING = 'STOPPING',

/**
* State indicating job stopped
*/
STOPPED = 'STOPPED',
}

/**
* The Glue CloudWatch metric type.
*
* @see https://docs.aws.amazon.com/glue/latest/dg/monitoring-awsglue-with-cloudwatch-metrics.html
*/
export enum MetricType {
/**
* A value at a point in time.
*/
GAUGE = 'gauge',

/**
* An aggregate number.
*/
COUNT = 'count',
}

/**
* The execution class, which determines whether the job runs with standard or flexible execution.
*
* @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-job.html#aws-glue-api-jobs-job-Job
* @see https://docs.aws.amazon.com/glue/latest/dg/add-job.html
*/
export enum ExecutionClass {
/**
* The flexible execution class is appropriate for time-insensitive jobs whose start
* and completion times may vary.
*/
FLEX = 'FLEX',

/**
* The standard execution class is ideal for time-sensitive workloads that require fast job
* startup and dedicated resources.
*/
STANDARD = 'STANDARD',
}

/**
* AWS Glue version determines the versions of Apache Spark and Python that are available to the job.
*
* @see https://docs.aws.amazon.com/glue/latest/dg/add-job.html
*/
export enum GlueVersion {
/**
* Glue version using Spark 2.2.1 and Python 2.7
*/
V0_9 = '0.9',

/**
* Glue version using Spark 2.4.3, Python 2.7 and Python 3.6
*/
V1_0 = '1.0',

/**
* Glue version using Spark 2.4.3 and Python 3.7
*/
V2_0 = '2.0',

/**
* Glue version using Spark 3.1.1 and Python 3.7
*/
V3_0 = '3.0',

/**
* Glue version using Spark 3.3.0 and Python 3.10
*/
V4_0 = '4.0',

}

/**
* Runtime language of the Glue job
*/
export enum JobLanguage {
/**
* Scala
*/
SCALA = 'scala',

/**
* Python
*/
PYTHON = 'python',
}

/**
* Python version
*/
export enum PythonVersion {
/**
* Python 2 (the exact version depends on GlueVersion and JobCommand used)
*/
TWO = '2',

/**
* Python 3 (the exact version depends on GlueVersion and JobCommand used)
*/
THREE = '3',

/**
* Python 3.9 (the exact version depends on GlueVersion and JobCommand used)
*/
THREE_NINE = '3.9',

}

/**
* The AWS Glue runtime determines the runtime engine of the job.
*/
export enum Runtime {
/**
* Runtime for Glue for Ray 2.4.
*/
RAY_TWO_FOUR = 'Ray2.4',
}

/**
* The job type, expressed as the Glue job command name.
*/
export enum JobType {
/**
* Command for running a Glue Spark job.
*/
ETL = 'glueetl',

/**
* Command for running a Glue Spark streaming job.
*/
STREAMING = 'gluestreaming',

/**
* Command for running a Glue python shell job.
*/
PYTHON_SHELL = 'pythonshell',

/**
* Command for running a Glue Ray job.
*/
RAY = 'glueray',

}

/**
* The number of AWS Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory.
*/
export enum MaxCapacity {

/**
* DPU value of 1/16th
*/
DPU_1_16TH = 0.0625,

/**
* DPU value of 1
*/
DPU_1 = 1,
}

/**
* Represents the logical operator for combining multiple conditions in the Glue Trigger API.
*/
export enum PredicateLogical {
/**
* All conditions must be true for the predicate to be true.
*/
AND = 'AND',

/**
* At least one condition must be true for the predicate to be true.
*/
ANY = 'ANY',
}

/**
* Represents the logical operator for evaluating a single condition in the Glue Trigger API.
*/
export enum ConditionLogicalOperator {
/** The condition is true if the values are equal. */
EQUALS = 'EQUALS',
}

/**
* Represents the state of a crawler for a condition in the Glue Trigger API.
*/
export enum CrawlerState {
/** The crawler is currently running. */
RUNNING = 'RUNNING',

/** The crawler is in the process of being cancelled. */
CANCELLING = 'CANCELLING',

/** The crawler has been cancelled. */
CANCELLED = 'CANCELLED',

/** The crawler has completed its operation successfully. */
SUCCEEDED = 'SUCCEEDED',

/** The crawler has failed to complete its operation. */
FAILED = 'FAILED',

/** The crawler encountered an error during its operation. */
ERROR = 'ERROR',
}
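The trigger enums above describe how conditions combine: `AND` requires every condition to hold, `ANY` requires at least one, and `EQUALS` compares an observed state with an expected one. The following is a minimal sketch of those semantics, assuming a hypothetical `CrawlerCondition` shape and `evaluatePredicate` helper that are not part of this PR. The enums are copied locally so the snippet runs on its own. It models the doc comments only, not the Glue service's actual evaluation.

```typescript
// Local copies of the enums from constants.ts so the sketch is self-contained.
enum PredicateLogical {
  AND = 'AND',
  ANY = 'ANY',
}

enum ConditionLogicalOperator {
  EQUALS = 'EQUALS',
}

enum CrawlerState {
  RUNNING = 'RUNNING',
  CANCELLING = 'CANCELLING',
  CANCELLED = 'CANCELLED',
  SUCCEEDED = 'SUCCEEDED',
  FAILED = 'FAILED',
  ERROR = 'ERROR',
}

// Hypothetical condition shape, used for illustration only.
interface CrawlerCondition {
  logicalOperator: ConditionLogicalOperator;
  crawlerName: string;
  expectedState: CrawlerState;
}

// Hypothetical helper: combines per-condition results with the predicate operator.
function evaluatePredicate(
  logical: PredicateLogical,
  conditions: CrawlerCondition[],
  observed: Record<string, CrawlerState>,
): boolean {
  // EQUALS is the only operator in the enum, so each condition is an equality check.
  const results = conditions.map(
    (c) => observed[c.crawlerName] === c.expectedState,
  );
  return logical === PredicateLogical.AND
    ? results.every((r) => r)
    : results.some((r) => r);
}

// Example inputs: two crawlers, both expected to have succeeded.
const exampleConditions: CrawlerCondition[] = [
  {
    logicalOperator: ConditionLogicalOperator.EQUALS,
    crawlerName: 'crawler-a',
    expectedState: CrawlerState.SUCCEEDED,
  },
  {
    logicalOperator: ConditionLogicalOperator.EQUALS,
    crawlerName: 'crawler-b',
    expectedState: CrawlerState.SUCCEEDED,
  },
];
```

In this sketch an empty condition list evaluates to `true` under `AND` and `false` under `ANY`. That follows from `Array.prototype.every` and `some`, and is not a statement about how Glue treats empty predicates.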
15 changes: 13 additions & 2 deletions packages/@aws-cdk/aws-glue-alpha/lib/index.ts
@@ -6,11 +6,22 @@ export * from './data-format';
 export * from './data-quality-ruleset';
 export * from './database';
 export * from './external-table';
-export * from './job';
-export * from './job-executable';
 export * from './s3-table';
 export * from './schema';
 export * from './security-configuration';
 export * from './storage-parameter';
+export * from './constants';
+export * from './jobs/job';
+export * from './jobs/pyspark-etl-job';
+export * from './jobs/pyspark-flex-etl-job';
+export * from './jobs/pyspark-streaming-job';
+export * from './jobs/python-shell-job';
+export * from './jobs/ray-job';
+export * from './jobs/scala-spark-etl-job';
+export * from './jobs/scala-spark-flex-etl-job';
+export * from './jobs/scala-spark-streaming-job';
+export * from './jobs/spark-ui-utils';
 export * from './table-base';
 export * from './table-deprecated';
+export * from './triggers/workflow';
+export * from './triggers/trigger-options';
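Because `./constants` is exported from the package index, `WorkerType` and `GlueVersion` are available to consumers as plain string enums. The doc comments in `constants.ts` state that `G.4X` and `G.8X` are available only for AWS Glue version 3.0 or later. Below is a rough sketch of how that constraint could be checked, assuming a hypothetical `isWorkerTypeSupported` helper that is not part of this PR. It uses a local subset of the enums so it runs without the package installed.

```typescript
// Local subset of the enums from constants.ts so the sketch is self-contained.
enum WorkerType {
  G_1X = 'G.1X',
  G_2X = 'G.2X',
  G_4X = 'G.4X',
  G_8X = 'G.8X',
}

enum GlueVersion {
  V2_0 = '2.0',
  V3_0 = '3.0',
  V4_0 = '4.0',
}

// Hypothetical helper: encodes only the "Glue 3.0 or later" rule that the
// doc comments give for G.4X and G.8X. Other worker types are not restricted here.
function isWorkerTypeSupported(
  workerType: WorkerType,
  glueVersion: GlueVersion,
): boolean {
  const needsGlue3 =
    workerType === WorkerType.G_4X || workerType === WorkerType.G_8X;
  // The enum values are numeric strings such as '3.0', so parseFloat is sufficient.
  return !needsGlue3 || parseFloat(glueVersion) >= 3.0;
}
```

A check like this could run at synth time before a job is created. Whether the constructs in this PR perform such a check is not shown in the diff above.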