Support for generating and inserting results from object storage such as Amazon S3 #27

Open
azimov opened this issue Apr 5, 2023 · 0 comments

azimov (Collaborator) commented Apr 5, 2023

Currently, this package directly supports uploading files only from a directory structure.
This is limiting for many projects, because it may be significantly faster to produce results asynchronously and export them to simple object stores such as Amazon S3.

Furthermore, many of the tasks that execute are likely to be database-intensive rather than CPU-intensive. Requiring EC2 nodes or other services that write to disk is likely an expensive solution when results can easily be unloaded from databases into object stores asynchronously.

Proposals:

  • Define interfaces for importing files from S3 buckets, Google Cloud Storage, etc.
  • Support a load-table solution where results can be imported into load tables in databases in a thread-safe manner:
      • Upload CSV objects, then copy them to the main table one at a time so that race conditions don't lock up tables
  • Support creating manifests that can be transferred. For example, results are generated by some analytics package, and a JSON file is created listing the bucket/object store and file reference as well as the result model spec
  • Support a simple table back end (in lieu of a message queue/broker) that stores and logs the state of the results insert
  • Make a simple Plumber API that lets you initiate an upload from a given manifest (with hashed entries to prevent multiple requests for identical uploads)
  • Cleanup/garbage collection step: delete objects from object stores once inserts have succeeded
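
To make the manifest, hashing, and state-table proposals more concrete, here is a rough sketch of how they might fit together. It is written in Python with SQLite purely for illustration; it is not this package's API. The manifest fields (`bucket`, `objects`, `result_model_spec`), the `upload_state` table, and all function names are hypothetical. The idea is that a manifest is serialized canonically, hashed, and the hash is used as the primary key of the state table, so a repeated request for an identical upload is rejected by the database itself.

```python
import hashlib
import json
import sqlite3


def manifest_hash(manifest: dict) -> str:
    """Return a SHA-256 hex digest of the manifest that ignores key order."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def init_state_table(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) table that tracks the state of each upload."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS upload_state (
            manifest_hash TEXT PRIMARY KEY,
            manifest_json TEXT NOT NULL,
            status        TEXT NOT NULL
        )
        """
    )


def register_upload(conn: sqlite3.Connection, manifest: dict) -> bool:
    """Record a requested upload as 'pending'.

    Returns False when an identical manifest has already been registered,
    because the primary key on the hash rejects the duplicate row.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO upload_state VALUES (?, ?, 'pending')",
                (manifest_hash(manifest), json.dumps(manifest, sort_keys=True)),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def set_status(conn: sqlite3.Connection, manifest: dict, status: str) -> None:
    """Update the logged state of an upload, e.g. to 'inserted' or 'failed'."""
    with conn:
        conn.execute(
            "UPDATE upload_state SET status = ? WHERE manifest_hash = ?",
            (status, manifest_hash(manifest)),
        )


# Example manifest; every value here is a placeholder.
example_manifest = {
    "bucket": "example-results-bucket",
    "objects": ["run-1/cohort_counts.csv"],
    "result_model_spec": "example_spec.json",
}

conn = sqlite3.connect(":memory:")
init_state_table(conn)
accepted = register_upload(conn, example_manifest)   # first request
duplicate = register_upload(conn, example_manifest)  # identical request
set_status(conn, example_manifest, "inserted")
```

A cleanup step could then select rows whose status is `inserted` and delete the corresponding objects from the bucket. An API endpoint (Plumber, in the proposal above) would only need to call something like `register_upload` and return early when it reports a duplicate.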

Potential Issues:

  • Storage of keys for buckets