Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(step-functions): add item reader path #30836

Closed
wants to merge 7 commits into from

Conversation

ChakshuGupta13
Copy link
Contributor

@ChakshuGupta13 ChakshuGupta13 commented Jul 12, 2024

Issue # (if applicable)

Fixes #29409

Reason for this change

  • For DistributedMap state of StepFunctions, IItemReader currently only allows S3 bucket as input source to be declared statically in CDK.
  • In other words, current CDK implementation only caters to static use-case where we know either bucket or bucketName (from which we can create IBucket) and pass it to IItemReader.
  • Whereas via AWS Console, if we create DistributedMap manually, then we can also convey S3 source dynamically using State Input / JsonPath.
  • In other words, for dynamic use-case, we will neither have bucket nor bucketName i.e. we only know state input variable which will convey bucketName e.g. $.bucketName.
  • So, if we want to use IItemReader for dynamic use case also, then we will:
    • (1) need to make bucket: IBucket optional (which will be breaking - how? e.g. if some dev is currently accessing bucket via reader.bucket then dev now needs to add check for undefined)
    • (2) then add another optional fields to convey state input variable names (e.g. $.bucketName, $.key, $.prefix)
  • Therefore, to avoid introducing breaking change, we can follow *Path convention of StepFunctions and introduce IItemReaderPath for dynamic use-case.
  • As IItemReaderPath is being introduced to allow using dynamic S3 source for DistributedMap, hence, we will also need to convey about it and its usage example via README.
  • Following proposal discussed briefly with @GavinZZ:

IItemReader-Proposal 2

Description of changes

  • Add IItemReaderPath interface (and its pros interface)
  • Add S3ObjectsItemReaderPath as one of many concrete classes of IItemReaderPath - this class also helps with unit-testing and integration-testing.
  • Modify index to export new constructs
  • Modify DistributedMap (and its props) to allow itemReaderPath? (which will be mutually exclusive with itemReader? and itemsPath?) and utilise it for render
  • Add when and how IItemReaderPath shall be used in README.

Description of how you validated changes

  • Via new unit-tests for DistributedMap (via yarn build+test)
  • Via new integration test for S3ObjectsItemReaderPath (with snapshot created)
    • Via yarn build --directory test/aws-stepfunctions/test && yarn integ test/aws-stepfunctions/test/integ.item-reader-path-s3-object.js && yarn integ-runner --update-on-failed
    • Verified expected step function execution result during snapshot creation
  • Previewed README changes.

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions bot added effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p2 beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK labels Jul 12, 2024
@aws-cdk-automation aws-cdk-automation requested a review from a team July 12, 2024 06:34
Copy link
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pull request linter has failed. See the aws-cdk-automation comment below for failure reasons. If you believe this pull request should receive an exemption, please comment and provide a justification.

A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed add Clarification Request to a comment.

@aws-cdk-automation aws-cdk-automation dismissed their stale review July 12, 2024 06:56

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

@ChakshuGupta13 ChakshuGupta13 marked this pull request as ready for review July 12, 2024 08:13
@aws-cdk-automation aws-cdk-automation added the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label Jul 12, 2024
Copy link
Contributor

@GavinZZ GavinZZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good. Just some questions to help me understand.

Comment on lines 607 to 615
/**
* State input shall include fields which are given to `S3ObjectItemReaderPath`:
* {
* ...
* bucketName: 'my-bucket',
* prefix: 'objectNamePrefix',
* ...
* }
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment needed and correct? Shouldn't it bucketNamePath?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • About necessity of comment, yes, I thought this construct's usage is not very straightforward, hence, developers may need to do try-and-error with state input if this comment is not present, hence, to save time of developers by avoiding try-and-error, we can convey how state input shall look like.
  • bucketNamePath is a parameter of S3ObjectsItemReaderPath which contains key of state input field (here bucketName) which contains actual value ('my-bucket'), so, no, it shouldn't be bucketNamePath unless (bucketNamePath: '$. bucketNamePath' in S3ObjectsItemReaderPath), hence, yes, comment is correct. (Note: Same is verified via integration test implementation)
new sfn.S3ObjectsItemReaderPath({
    bucketNamePath: '$.bucketName', /** ~ bucketNamePath: '$.X' */
    ...
  }),

/**
 * State Input:
 * {
 *   bucketName: 'my-bucket', // ~ X: 'my-bucket'
 *   ...
 * }
 */

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good to know. I also deployed this template and you're right. I think this is an important piece of information. I would like to request you to format the README file a bit clearer. Can we split this part into subheaders, one for static bucket and one for dynamic. In the dynamic add an (complete) example with the stack definition and state input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is following content for README okay?

  • Sub-header at level 4 heading were too small to make a difference and also text content of each heading shall be some construct like DistributedMap, hence, didn't utilise further sub-headers instead explained usage sequentially.
  • Ask: If this format seem alright, then will revise PR and in upcoming PRs will add examples for remaining input source types as they are not currently present for dynamic use-case.

DistributedMap supports various input source types to determine an array to iterate over:

  • JSON array from the state input
    • By default, DistributedMap assumes whole state input is an JSON array and iterates over it:
    [
      "item1",
      "item2"
    ]
    const distributedMap = new sfn.DistributedMap(this, 'DistributedMap');
    distributedMap.itemProcessor(new sfn.Pass(this, 'Pass'));
    • When input source is present at a specific path in state input, then itemsPath can be utilised to configure the iterator source.
    {
      "distributedMapItemList": [
        "item1",
        "item2"
      ]
    }
    const distributedMap = new sfn.DistributedMap(this, 'DistributedMap', {
      itemsPath: '$.distributedMapItemList',
    });
    distributedMap.itemProcessor(new sfn.Pass(this, 'Pass'));
  • Objects in a S3 bucket and optional prefix
    • When DistributedMap is required to iterate over objects stored in a S3 bucket, then if required parameters like bucket are known while creating StateMachine (statically or at compile time), then an object of S3ObjectsItemReader (implementing IItemReader) can be passed to itemReader to configure the iterator source.
    my-bucket
    |
    +--item1
    |
    +--otherItem
    |
    +--item2
    |
    ...
    import * as s3 from 'aws-cdk-lib/aws-s3';
    
    const bucket = new s3.Bucket(this, 'Bucket', {
      bucketName: 'my-bucket',
    });
    
    const distributedMap = new sfn.DistributedMap(this, 'DistributedMap', {
      itemReader: new sfn.S3ObjectsItemReader({
        bucket,
        prefix: 'item',
      }),
    });
    distributedMap.itemProcessor(new sfn.Pass(this, 'Pass'));
    • But if required parameters like bucketName are only known while starting execution of StateMachine (dynamically or at run-time) via state input, then an object of S3ObjectsItemReaderPath (implementing IItemReaderPath) can be passed to itemReaderPath to configure the iterator source.
    {
      "bucketName": "my-bucket",
      "prefix": "item"
    }
    const distributedMap = new sfn.DistributedMap(this, 'DistributedMap', {
      itemReaderPath: new sfn.S3ObjectsItemReaderPath({
        bucketNamePath: '$.bucketName',
        prefixPath: '$.prefix',
      }),
    });
    distributedMap.itemProcessor(new sfn.Pass(this, 'Pass'));
    • Both itemReader and itemReaderPath are mutually exclusive. For example, if bucket is known at compile time but prefix is only known at run-time, then both cannot be used simultaneously.
  • JSON array in a JSON file stored in S3
    • When DistributedMap is required to iterate over a JSON array stored in a JSON file in a S3 bucket, then if required parameters like bucket are known while creating StateMachine (statically or at compile time), then an object of S3JsonItemReader (implementing IItemReader) can be passed to itemReader to configure the iterator source.
    my-bucket
    |
    +--input.json
    |
    ...
    [
      "item1",
      "item2"
    ]
    import * as s3 from 'aws-cdk-lib/aws-s3';
    
    const bucket = new s3.Bucket(this, 'Bucket', {
      bucketName: 'my-bucket',
    });
    
    const distributedMap = new sfn.DistributedMap(this, 'DistributedMap', {
      itemReader: new sfn.S3JsonItemReader({
        bucket,
        key: 'input.json',
      }),
    });
    distributedMap.itemProcessor(new sfn.Pass(this, 'Pass'));
  • CSV file stored in S3
  • S3 inventory manifest stored in S3

Map states in Distributed mode also support writing results of the iterator to an S3 bucket and optional prefix. Use a ResultWriter object provided via the optional resultWriter property to configure which S3 location iterator results will be written. The default behavior id resultWriter is omitted is to use the state output payload. However, if the iterator results are larger than the 256 kb limit for Step Functions payloads then the State Machine will fail.

import * as s3 from 'aws-cdk-lib/aws-s3';

// create a bucket
const bucket = new s3.Bucket(this, 'Bucket');

const distributedMap = new sfn.DistributedMap(this, 'Distributed Map State', {
  resultWriter: new sfn.ResultWriter({
    bucket: bucket,
    prefix: 'my-prefix',
  })
});
distributedMap.itemProcessor(new sfn.Pass(this, 'Pass State'));

...

@GavinZZ
Copy link
Contributor

GavinZZ commented Jul 24, 2024

I think the code changes are good to go. I would like to request you to update the README clearer with subheaders and full examples. I'm happy to approve once it's updated.

*
* @default - No itemReaderPath
*/
readonly itemReaderPath?: IItemReaderPath;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am finding this approach very confusing to customers. Also we are duplicating every thing in 2 versions of almost the same classes, which means that in future for any updates to these classes, we are duplicating the work. I see that this may be some source of errors.

The main difference I see between the ItemReader and ItemReaderPath is for the first one, we automatically add the policy to access the S3 bucket, and the other one is we could not add this policy. We need to think about a better design approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one of the options is to deprecate the current ItemReader class, and add a new one, but I also do not like the idea of keep deprecating existing properties for new properties and ask customers to update their code.

Copy link
Contributor Author

@ChakshuGupta13 ChakshuGupta13 Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following is intentional, with IItemReaderPath, during compile time, we will not have information about bucket, hence, we can't add any permission during compile time, therefore, this action needs to be taken separately by caller who starts StateMachine execution with bucketName. Even, if we consider adding * policy statement by default here, then it will violate minimum access to resource guideline.

The main difference I see between the ItemReader and ItemReaderPath is for the first one, we automatically add the policy to access the S3 bucket, and the other one is we could not add this policy.

Following is precisely issue with current codebase that it is not extensible to allow changes and most of such options are already considered to decent extent (refer PR description) - and among those, this approach was least troublesome.

We need to think about a better design approach.
one of the options is to deprecate the current ItemReader class, and add a new one, but I also do not like the idea of keep deprecating existing properties for new properties and ask customers to update their code.

If you have any other alternatives, please feel free to share - happy to ponder over and utilise them. This is exactly why this PR is raised with minimum number of constructs i.e. to gain consensus.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar duplicate parameters are also utilised in different places (reference) - although they have simple types, not full-fledged classes (or interfaces)

  /**
   * Error code used to represent this failure
   *
   * @default - No error code
   */
  readonly error?: string;

  /**
   * JsonPath expression to select part of the state to be the error to this state.
   *
   * You can also use an intrinsic function that returns a string to specify this property.
   * The allowed functions include States.Format, States.JsonToString, States.ArrayGetItem, States.Base64Encode, States.Base64Decode, States.Hash, and States.UUID.
   *
   * @default - No error path
   */
  readonly errorPath?: string;

I am finding this approach very confusing to customers. Also we are duplicating every thing in 2 versions of almost the same classes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original proposal required breaking changes but avoided code duplication, hence also asked in ticket:
IItemReader-Proposal

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moelasmar thanks for a second review and provide the feedback. I do agree that a number of resources are duplicate between IItemReader and IIteamReaderPath. This is not the best design but as mentioned earlier, we can't directly modify IItemReader due to breaking changes. Specifically IItemReader takes a required parameter bucket: IBucket and this doesn't work for dynamic parameter handling.

I agree this is not the best practice but there are many places in CDK that has similar issues. Would like you get your thought on the next step. One thing I can think of is to create a base interface that defines the shared parameters and IItemReader and IItemReaderPath to extend the interface and has its own definitions from there.

Copy link
Contributor Author

@ChakshuGupta13 ChakshuGupta13 Jul 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to think about base interface as we discussed briefly (re:Slack & WorkLog) but then got stuck as to how will that be better useful? We will still need different implementations (both for static and dynamic) for each reader type (object, JSON, CSV, manifest etc.)

One thing I can think of is to create a base interface that defines the shared parameters and IItemReader and IItemReaderPath to extend the interface and has its own definitions from there.

So, can you please share if you foresee any benefit with base class (and if possible, can you please explain "how" like an example - that will help clarify few thoughts for me).


Secondly, I was thinking about examples like following:

 /**
   * Error code used to represent this failure
   *
   * @default - No error code
   */
  readonly error?: string;

  /**
   * JsonPath expression to select part of the state to be the error to this state.
   *
   * You can also use an intrinsic function that returns a string to specify this property.
   * The allowed functions include States.Format, States.JsonToString, States.ArrayGetItem, States.Base64Encode, States.Base64Decode, States.Hash, and States.UUID.
   *
   * @default - No error path
   */
  readonly errorPath?: string;

In such example, we always have two parameters (one for static and one for dynamic), so, now if IItemReader and its implementations have X number of static parameters (e.g. bucket, prefix, key etc.), then to have dynamic alternatives, we will have same number of parameters (e.g. bucketNamePath, prefixPath, keyPath etc.).

Also:

  • render method for both use-cases will differ (e.g. for dynamic Parameters will be suffixed with .$)
  • As mentioned earlier, for dynamic use-case, we cannot have policyStatements defined statically since required information is unavailable at that time, hence, that will also differ.

Therefore, only thing common between static and dynamic alternatives that can be de-duplicated is resource which is initialised to an Arn in constructor, hence, by making base class (or trying to generalise these classes), we can only avoid duplication of resource at max (since, we need do parameters like bucketNamePath, prefixPath, keyPath etc.)


Ideal solution would be to have classes like following:

export interface IItemReader {
  /** bucketName and bucketNamePath mutually exclusive*/
  readonly bucketName?: string;
  readonly bucketNamePath?: string;

  readonly resource: string;
  readonly maxItems?: number;
  render(): any;
}

export class S3ObjectsItemReader implements IItemReader {
  readonly bucketName?: string;
  readonly bucketNamePath?: string;

  /** prefix and prefixPath mutually exclusive*/
  readonly prefix?: string;
  readonly prefixPath?: string;

  readonly resource: string;
  readonly maxItems?: number;

  constructor(props...) {
    this.bucketName = props.bucketName;
    this.bucketNamePath = props.bucketNamePath;
    this.prefix = props.prefix;
    this.prefixPath = props.prefixPath;
    this.maxItems = props.maxItems;
    this.resource = Arn.format({
      region: '',
      account: '',
      partition: Aws.PARTITION,
      service: 'states',
      resource: 's3',
      resourceName: 'listObjectsV2',
      arnFormat: ArnFormat.COLON_RESOURCE_NAME,
    });
  }

  public render(): any {
    return FieldUtils.renderObject({
      Resource: this.resource,
      ...(this.maxItems && { ReaderConfig: { MaxItems: this.maxItems } }),
      Parameters: {
        ... (this.bucketName && { Bucket: this.bucketName } ),
        ... (this.bucketNamePath && { 'Bucket.$': this.bucketNamePath } ),
        ...(this.prefix && { Prefix: this.prefix }),
        ...(this.prefixPath && { 'Prefix.$': this.prefixPath }),
      },
    });
  }
}

This way we can even have partial static parameters and rest dynamic - example:

const distributedMap = new sfn.DistributedMap(this, 'DistributedMap', {
    itemReaderPath: new sfn.S3ObjectsItemReader({
      bucketName: 'my-bucket',
      prefixPath: '$.prefix',
    }),
  });
  distributedMap.itemProcessor(new sfn.Pass(this, 'Pass'));

As much as I like this freedom to choose which parameter needs to be dynamic, this will need breaking changes.

EDIT: Forgot that validations (i.e. mutually exclusiveness above) can only be done for States - not state-less (node-less) constructs like implementations of IItemReader, so, above solution may not be feasible also (unless these are made actual Construct which poses different problems altogether like assigning unique logical IDs etc.) - leaving us with two different classes only.

Copy link
Contributor Author

@ChakshuGupta13 ChakshuGupta13 Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moelasmar @GavinZZ
my colleague discovered following way with which we can provide dynamic parameters (except for Bucket because it is currently of type IBucket instead of string):

const myBucket = new Bucket(stack, 'MyBucket', ...);
const distributedMap = new sfn.DistributedMap(this, 'DistributedMap', {
    itemReader: new sfn.S3ObjectsItemReader({
      bucket: myBucket,
      prefix: JsonPath.stringAt('$.prefix'),
    }),
  });
  distributedMap.itemProcessor(new sfn.Pass(this, 'Pass'));

This code snippet produces following cdk.out template:

...
"Parameters\":{\"Bucket\":\"",
       {
        "Ref": "MyBucket..."
       },
       "\",\"Prefix.$\":\"$.prefix\"}}}}}"
      ]
...

Not sure why this works but since it works, this PR may not be needed for the said half issue (to be able to pass dynamic file name) but for other half issue i.e. how to also allow Bucket to be dynamic? For that, the ideal would be breaking change where bucket: IBucket is changed to bucket: string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, but we still need to fix the issue where it only takes IBucket. I don't find much values actually to move the common parameters into a base interface and have IItemReader and IItemReaderPath to extend it.

@moelasmar would you mind take another look and provide some feedback if you have any ideas on a non-breaking change but don't have to duplicate the same properties.

@@ -0,0 +1,118 @@
import { Arn, ArnFormat, Aws } from '../../../../core';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why you did not do similar thing for S3ManifestItemReader, S3CsvItemReader, and S3JsonItemReader

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They will be taken in follow up PRs for following reasons:

  • Confirmation is first needed whether these type of changes are acceptable.
  • It is not good to raise one huge PR with all changes when that huge PR can be easily broken down into smaller chunks.

});
map.itemProcessor(new stepfunctions.Pass(stack, 'Pass State'));

//THEN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also validate that there is no policy automatically added for this bucket.

Copy link
Contributor Author

@ChakshuGupta13 ChakshuGupta13 Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? There is no policyStatement parameter with IItemReaderPath unlike IItemReader.

By this logic, we will start checking for absence of everything everywhere just because we know that something similar existed somewhere else.

Second, no real bucket exists for this test - we are just passing state input path which will convey bucketName dynamically, hence, checking whether permissions are added will not be possible anyway.

@GavinZZ
Copy link
Contributor

GavinZZ commented Aug 8, 2024

Pinging @moelasmar for a second review for any blocking feedbacks.

Copy link
Contributor

@moelasmar moelasmar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChakshuGupta13 .. sorry for the late reply. My recommendation is to update IItemReader interface in packages/aws-cdk-lib/aws-stepfunctions/lib/states/distributed-map/item-reader.ts to make the bucket property as optional, and you can add the new property bucketNamePath as another optional property, and you can add the required validations to make sure that either one of them is used.

As based on this link, changing a property from required to optional is the type of accepted weakening and it is not a breaking change

@aws-cdk-automation aws-cdk-automation removed the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label Sep 8, 2024
@ChakshuGupta13
Copy link
Contributor Author

@ChakshuGupta13 .. sorry for the late reply. My recommendation is to update IItemReader interface in packages/aws-cdk-lib/aws-stepfunctions/lib/states/distributed-map/item-reader.ts to make the bucket property as optional, and you can add the new property bucketNamePath as another optional property, and you can add the required validations to make sure that either one of them is used.

As based on this link, changing a property from required to optional is the type of accepted weakening and it is not a breaking change

Thanks for checking @moelasmar.
First, definitely can try with having two parameters bucket and bucketPath (both being mutually exclusive to each other) as per suggestion.

Before moving forward, I would also like to bring your attention to following comment:

@moelasmar @GavinZZ my colleague discovered following way with which we can provide dynamic parameters (except for Bucket because it is currently of type IBucket instead of string):

const myBucket = new Bucket(stack, 'MyBucket', ...);
const distributedMap = new sfn.DistributedMap(this, 'DistributedMap', {
    itemReader: new sfn.S3ObjectsItemReader({
      bucket: myBucket,
      prefix: JsonPath.stringAt('$.prefix'),
    }),
  });
  distributedMap.itemProcessor(new sfn.Pass(this, 'Pass'));

This code snippet produces following cdk.out template:

...
"Parameters\":{\"Bucket\":\"",
       {
        "Ref": "MyBucket..."
       },
       "\",\"Prefix.$\":\"$.prefix\"}}}}}"
      ]
...

Not sure why this works but since it works, this PR may not be needed for the said half issue (to be able to pass dynamic file name) but for other half issue i.e. how to also allow Bucket to be dynamic? For that, the ideal would be breaking change where bucket: IBucket is changed to bucket: string.

  • From this comment, we can observe that if we can just change type of bucket from IBucket to string, then, we may be able to directly pass JsonPath to it for dynamic use-case and literal string value for static use-case.
  • Only problem is that this will be a breaking change - but in long-term, this shall be the state of constructs.
  • So, if we currently introduce another parameter bucketPath, then during major version change of CDK, we may need to revisit this code section to simplify.
  • Hence, is it better if this breaking change can be made with the help of feature flag until next major version change of CDK (at which point, we can remove feature flag and keep the changes).

@moelasmar
Copy link
Contributor

thanks @ChakshuGupta13 for your reply.

I agree on adding a new property bucketPath to be of string type, but I do not see a value from adding a feature flag, as the final result from customer POV will be that there is 2 properties that customer can use, and customer will find themselves to also enable/disable this flag to use one of these properties which will not be a good experience.

@aws-cdk-automation
Copy link
Collaborator

This PR has been in the CHANGES REQUESTED state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week.

Chakshu Gupta added 6 commits September 30, 2024 13:56
**Why are these changes required?**
- For `DistributedMap` state of StepFunctions, `IItemReader` currently only allows S3 bucket as input source to be declared statically in CDK.
- In other words, current CDK implementation only caters to static use-case where we know either `bucket` or `bucketName` (from which we can create `IBucket`) and pass it to `IItemReader`.
- Whereas via AWS Console, if we create `DistributedMap` manually, then we can also convey S3 source dynamically using State Input / JsonPath.
- In other words, for dynamic use-case, we will neither have `bucket` nor `bucketName` i.e. we only know state input variable which will convey `bucketName` e.g. `$.bucketName`.
- So, if we want to use `IItemReader` for dynamic use case also, then we will:
  - (1) need to make `bucket: IBucket` optional (which will be breaking - how? e.g. if some dev is currently accessing `bucket` via `reader.bucket` then dev now needs to add check for `undefined`)
  - (2) then add another optional fields to convey state input variable names (e.g. $.bucketName $.key $.prefix)
- Therefore, to avoid introducing breaking change, we can follow `*Path` convention of StepFunctions and introduce `IItemReaderPath` for dynamic use-case. (closes #29409)

**What changes are being made?**
- Add `IItemReaderPath` interface (and its pros interface)
- Add `S3ObjectsItemReaderPath` as one of many concrete classes of `IItemReaderPath` - this class also helps with unit-testing and integration-testing.
- Modify `index` to export new constructs
- Modify `DistributedMap` (and its props) to allow `itemReaderPath?` (which will be mutually exclusive with `itemReader?` and `itemsPath?`) and utilise it for render

**How are these changes tested?**
- Via new unit-tests for `DistributedMap` (via `yarn build+test`)
- Via new integration test for `S3ObjectsItemReaderPath` (with snapshot created)
  - Via `yarn build --directory test/aws-stepfunctions/test && yarn integ test/aws-stepfunctions/test/integ.item-reader-path-s3-object.js && yarn integ-runner --update-on-failed`
  - Verified expected step function execution result during snapshot creation
- `IItemReaderPath` is being introduced to allow using dynamic S3 source for `DistributedMap`, hence, we will need to convey about it and its usage example via README.
- Add when and how `IItemReaderPath` shall be used.
- Previewed README changes.
- In the example of `S3ObjectsItemReaderPath`, need to use `prefixPath` instead of `prefix`.
Following errors are detected by CodeBuild logs, hence, changes resolve these.
```
aws-cdk-lib.aws_stepfunctions-README-L589.ts:34:1 - error TS2304: Cannot find name 'distributedMap'.

34 distributedMap.itemProcessor(new sfn.Pass(this, 'Pass State'));
   ~~~~~~~~~~~~~~
aws-cdk-lib.aws_stepfunctions-README-L589.ts:46:3 - error TS2739: Type 'S3ObjectsItemReaderPath' is missing the following properties from type 'IItemReader': bucket, providePolicyStatements

46   itemReader: new sfn.S3ObjectsItemReaderPath({
     ~~~~~~~~~~
aws-cdk-lib.aws_stepfunctions-README-L589.ts:48:5 - error TS2353: Object literal may only specify known properties, and 'prefix' does not exist in type 'S3ObjectsItemReaderPathProps'.

48     prefix: '$.prefix',
       ~~~~~~
aws-cdk-lib.aws_stepfunctions-README-L589.ts:55:1 - error TS2304: Cannot find name 'distributedMap'.

55 distributedMap.itemProcessor(new sfn.Pass(this, 'Pass State'));
   ~~~~~~~~~~~~~~
[jsii-rosetta] [WARN] 4 diagnostics from assemblies with 'strict' mode on (and 44 more non-strict diagnostics)
```
…nation

**Reason for this change**
- current structure for distributed map's input source content does not convey properly how to configure input source as per requirements (also it lacks examples).

**Description of changes**
- re-structure distributed map's input source explanation to convey how to configure input source as per requirements with examples.
- nit: also addressed unrelated issue with README

**Description of how you validated changes**
- `yarn build+extract`
**Why**
- the first line of comment is used as the description shown in the properties table, hence, repeating the property name does not convey much

**What**
- replace first line of comment from property name to description of property

**Tested by**
- `yarn build`
@mergify mergify bot dismissed moelasmar’s stale review September 30, 2024 08:27

Pull request has been modified.

@aws-cdk-automation aws-cdk-automation added the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label Sep 30, 2024
@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: fdf3490
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@ChakshuGupta13
Copy link
Contributor Author

@GavinZZ @moelasmar Created separate PR #31619 since strategy is changed. Can you please review it?
Will close this PR later.

Copy link

github-actions bot commented Oct 3, 2024

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 3, 2024
@ChakshuGupta13 ChakshuGupta13 deleted the issue_29409 branch October 14, 2024 09:37
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p2 pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

state machine : Dynamic passing of bucket and key to distributed map
4 participants