Ideally, none of the Docker images would require any knowledge of Minerva or AWS: they could then be completely generic and run standalone without modification. However, this may be an overly purist approach.
A moderate approach, where the images have local and AWS modes of operation, might make sense. The AWS mode would add capabilities such as writing outputs to S3, as sketched below.
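A minimal sketch of such a dual-mode image, assuming a Python entrypoint and boto3; the function name, output directory, and the `S3_BUCKET` environment variable are illustrative, not part of any existing image:

```python
# Dual-mode output writer: local mode writes to a directory; AWS mode
# additionally uploads the file to S3. All names here are hypothetical.
import os
from typing import Optional

import boto3

def write_output(data: bytes, filename: str,
                 output_dir: str = "/out",
                 s3_bucket: Optional[str] = None) -> None:
    """Write an output file locally, and mirror it to S3 when a bucket is set."""
    path = os.path.join(output_dir, filename)
    with open(path, "wb") as f:
        f.write(data)
    if s3_bucket:  # AWS mode: enabled purely by configuration
        boto3.client("s3").upload_file(path, s3_bucket, filename)

if __name__ == "__main__":
    # Local mode when S3_BUCKET is unset; AWS mode otherwise.
    write_output(b"tile data", "tile_0_0.png",
                 s3_bucket=os.environ.get("S3_BUCKET"))
```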
The major question is how to orchestrate the steps in the batch import pipeline. If the scan phase identifies a fileset to process, how should we initiate the extraction phase that follows? Options are:
- Write identified filesets to a file; upon completion of the scan, process the file in a Lambda and launch many extract jobs. Very clean and easily composable into different workflows, but adds to overall import latency.
- Launch the extract step function directly. Low latency, but harder to compose into different workflows, and it requires the Docker image to depend directly on the interfaces of the next steps in the pipeline.
- Add items to an SQS queue. More complex than launching the step function directly, but SQS is purpose-built for this kind of operation. Again, it is harder to compose into different workflows and requires the Docker image to depend directly on the interfaces of the next steps in the pipeline.
- Some kind of opportunistic hybrid approach? (A sketch of the direct-launch and queue options follows this list.)
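For comparison, here is a hedged sketch of the direct-launch and SQS options using standard boto3 calls (`start_execution` and `send_message`); the state machine ARN, queue URL, and message shape are hypothetical placeholders:

```python
import json
import boto3

# A fileset identified by the scan phase (shape is illustrative only).
fileset = {"fileset_uid": "abc123", "files": ["image_0.tif", "image_1.tif"]}

# Option 2: launch the extract step function directly. Low latency, but the
# scan image now depends on the extract step's input interface.
sfn = boto3.client("stepfunctions")
sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:extract",
    input=json.dumps(fileset),
)

# Option 3: enqueue the fileset on SQS and let a consumer start extraction.
sqs = boto3.client("sqs")
sqs.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/extract-queue",
    MessageBody=json.dumps(fileset),
)
```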
Writing the payloads between pipeline steps to S3 may be the best solution, as sketched after this list, because:

- It allows a more sophisticated configuration of the job than relying solely on command-line parameters and environment variables, which is a bit awkward.
- Payload size limits for SQS, Step Functions, and Lambda are quite low.
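A minimal sketch of the S3 payload pattern, assuming boto3; the bucket name and key scheme are hypothetical. The full job configuration lives in S3, and only a small pointer travels through SQS/Step Functions/Lambda, which sidesteps their payload limits (e.g. 256 KB per SQS message):

```python
import json
import uuid
import boto3

s3 = boto3.client("s3")

def put_payload(payload: dict, bucket: str = "minerva-pipeline-payloads") -> dict:
    """Store a step's full payload in S3; return a small pointer message."""
    key = f"payloads/{uuid.uuid4()}.json"
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(payload).encode())
    return {"payload_bucket": bucket, "payload_key": key}

def get_payload(pointer: dict) -> dict:
    """In the next step, resolve the pointer back into the full payload."""
    obj = s3.get_object(Bucket=pointer["payload_bucket"], Key=pointer["payload_key"])
    return json.loads(obj["Body"].read())
```

The pointer dict is what would be passed as the Step Functions input or SQS message body, so each step stays well under the service payload limits regardless of job size.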