Orchestrate Batch jobs without onward call #13

Open
dpwrussell opened this issue Aug 3, 2018 · 0 comments
dpwrussell commented Aug 3, 2018

Ideally, none of the Docker images would require any knowledge of Minerva or AWS, so that they remain completely generic and can run standalone without modification. However, this may be an overly purist approach.

A moderate approach, where the images have local and AWS modes of operation, might make sense. The AWS mode would add capabilities such as writing outputs to S3.
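As a rough sketch of what that split could look like (the `write_output` helper and the `aws_mode` flag are hypothetical, not existing Minerva code):

```python
import os
import shutil

import boto3


def write_output(local_path, dest, aws_mode=False):
    """Copy a result file to its destination.

    In AWS mode, `dest` is an s3://bucket/prefix/ URL; in local mode it
    is a plain directory path, keeping the image usable standalone.
    """
    if aws_mode:
        bucket, _, prefix = dest[len("s3://"):].partition("/")
        key = "/".join(
            p for p in (prefix.strip("/"), os.path.basename(local_path)) if p
        )
        boto3.client("s3").upload_file(local_path, bucket, key)
    else:
        shutil.copy(local_path, dest)
```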

The major question is how to orchestrate the steps in the batch import pipeline. If the scan phase identifies a fileset to process, how should the extraction phase that follows be initiated? The options are:

  • Write them to a file and, upon completion of the scan, process the file in a Lambda and launch many jobs. Very clean and easily composable into different workflows, but it increases overall import latency.
  • Launch the step function for extract directly (see the sketch after this list). Low latency, but more difficult to compose into different workflows, and it requires the Docker image to depend directly on the interfaces to the next steps in the pipeline.
  • Add items to an SQS queue. More complex than launching the step function directly, but SQS is purpose-built for exactly this type of operation. Again, this is more difficult to compose into different workflows and requires the Docker image to depend directly on the interfaces to the next steps in the pipeline.
  • Some kind of opportunistic hybrid approach?
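For concreteness, here is a minimal sketch of the second option using boto3. The `EXTRACT_STATE_MACHINE_ARN` environment variable, the `launch_extract` helper, and the input shape are assumptions for illustration only:

```python
import json
import os

import boto3

sfn = boto3.client("stepfunctions")


def launch_extract(fileset):
    # The state machine ARN must be injected into the container, which is
    # exactly the coupling to downstream interfaces that this option adds.
    sfn.start_execution(
        stateMachineArn=os.environ["EXTRACT_STATE_MACHINE_ARN"],
        input=json.dumps({"fileset": fileset}),
    )
```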

Writing the payloads between steps in the pipeline to S3 may be the best solution (see the sketch after this list), because:

  • It allows a more sophisticated configuration of the job, rather than relying only on command-line parameters and environment variables, which is a bit awkward.
  • Payload size limits for SQS, Step Functions, and Lambda are quite low.
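A rough sketch of that pattern; `stash_payload`, the bucket layout, and the shape of the returned reference are all hypothetical:

```python
import json
import uuid

import boto3

s3 = boto3.client("s3")


def stash_payload(payload, bucket):
    # Write the full payload to S3 and return a small pointer, so only
    # the pointer ever passes through SQS/Step Functions/Lambda.
    key = "payloads/{}.json".format(uuid.uuid4())
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(payload).encode())
    return {"payload_bucket": bucket, "payload_key": key}
```

The downstream step would then fetch and parse the object at `payload_key` itself, keeping every inter-step message well under the service limits.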