Orchestrate Batch jobs without onward call #13

Open
dpwrussell opened this issue Aug 3, 2018 · 0 comments
dpwrussell commented Aug 3, 2018

Ideally, none of the Docker images would require any knowledge of Minerva or AWS, so that they remain completely generic and can run standalone without modification. However, this may be an overly purist approach.

A moderate approach, where the images have local and AWS modes of operation, might make sense. The AWS mode would add capabilities such as writing outputs to S3.
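As a rough sketch of what that split could look like (the `write_output` helper and the `aws_mode` flag are hypothetical, not existing Minerva code):

```python
import os
import shutil

import boto3


def write_output(local_path, dest, aws_mode=False):
    """Copy a result file to its destination.

    In AWS mode, `dest` is an s3://bucket/prefix/ URL; in local mode it
    is a plain directory path, keeping the image usable standalone.
    """
    if aws_mode:
        bucket, _, prefix = dest[len("s3://"):].partition("/")
        key = "/".join(
            p for p in (prefix.strip("/"), os.path.basename(local_path)) if p
        )
        boto3.client("s3").upload_file(local_path, bucket, key)
    else:
        shutil.copy(local_path, dest)
```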

The major question is how to orchestrate the steps in the batch import pipeline. If the scan phase identifies a fileset to process, how should the extraction phase that follows be initiated? The options are:

  • Write them to a file and, upon completion of the scan, process the file in a Lambda and launch many jobs. Very clean and easily composable into different workflows, but it increases overall import latency.
  • Launch the step function for extract directly (see the sketch after this list). Low latency, but more difficult to compose into different workflows, and it requires the Docker image to depend directly on the interfaces to the next steps in the pipeline.
  • Add items to an SQS queue. More complex than launching the step function directly, but SQS is purpose-built for exactly this type of operation. Again, this is more difficult to compose into different workflows and requires the Docker image to depend directly on the interfaces to the next steps in the pipeline.
  • Some kind of opportunistic hybrid approach?
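For concreteness, here is a minimal sketch of the second option using boto3. The `EXTRACT_STATE_MACHINE_ARN` environment variable, the `launch_extract` helper, and the input shape are assumptions for illustration only:

```python
import json
import os

import boto3

sfn = boto3.client("stepfunctions")


def launch_extract(fileset):
    # The state machine ARN must be injected into the container, which is
    # exactly the coupling to downstream interfaces that this option adds.
    sfn.start_execution(
        stateMachineArn=os.environ["EXTRACT_STATE_MACHINE_ARN"],
        input=json.dumps({"fileset": fileset}),
    )
```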

Writing the payloads between steps in the pipeline to S3 may be the best solution (see the sketch after this list), because:

  • It allows a more sophisticated configuration of the job, rather than relying only on command-line parameters and environment variables, which is a bit awkward.
  • Payload size limits for SQS, Step Functions, and Lambda are quite low.
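A rough sketch of that pattern; `stash_payload`, the bucket layout, and the shape of the returned reference are all hypothetical:

```python
import json
import uuid

import boto3

s3 = boto3.client("s3")


def stash_payload(payload, bucket):
    # Write the full payload to S3 and return a small pointer, so only
    # the pointer ever passes through SQS/Step Functions/Lambda.
    key = "payloads/{}.json".format(uuid.uuid4())
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(payload).encode())
    return {"payload_bucket": bucket, "payload_key": key}
```

The downstream step would then fetch and parse the object at `payload_key` itself, keeping every inter-step message well under the service limits.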