Implement custom storage for orgs #2093

Open

tw4l wants to merge 64 commits into main from issue-578-custom-storage

Conversation
Conversation

tw4l (Member) commented Sep 30, 2024

Fixes #578

Adds

  • API endpoints for adding and deleting custom storages on organizations (see the sketch after this list)
  • API endpoints for updating primary and/or replica storage for an org
  • API endpoint to check the progress of a background job (currently, only bucket copy jobs are supported)
  • Automated hooks to copy an organization's files from the previous S3 bucket to the new one and to update the files in the database when primary storage is changed
  • Automated hooks to replicate content from primary storage to a new replica location and to update the files in the database when a replica location is set on an org
  • New pylint disable comments on many of the backend modules so that linting passes
  • Admin documentation for adding, removing, and configuring custom storage locations on an organization
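
For a concrete sense of the workflow, here is a rough sketch of how these endpoints might be called. The paths, payload fields, and hostname are assumptions for illustration, not necessarily the PR's exact API:

```bash
# Hypothetical endpoint paths and request fields -- illustration only.

# Add a custom storage to an org
curl -X POST "https://btrix.example.com/api/orgs/$ORG_ID/custom-storage" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-storage", "bucket": "my-bucket",
       "access_key": "...", "secret_key": "...",
       "endpoint_url": "https://s3.example.com/"}'

# Point the org's primary storage at the new location
# (this is what kicks off the background bucket copy job)
curl -X POST "https://btrix.example.com/api/orgs/$ORG_ID/storage" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"storage": {"name": "my-storage", "custom": true}}'

# Check on the progress of the background copy job
curl "https://btrix.example.com/api/orgs/$ORG_ID/background-jobs/$JOB_ID" \
  -H "Authorization: Bearer $TOKEN"
```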

Notes

Currently, no delete operations happen for a bucket previously used as a primary or replica location once it is unset. Files are copied to the new bucket to ensure there are no usability issues moving forward in the app, but they are not automatically deleted from the source after the copy job. We could add that, but I wonder if it's safer, especially in the early days of testing, to perform that cleanup manually as desired.

Once we're comfortable, we can change the rclone command in the copy_job.yaml background job template from copy to move if we want it to automatically clean up files from the source location on completion. Since the same template is used both for copying files from an old primary storage to a new one and for replicating from primary storage to a new replica location, we'd want to make sure the latter still uses copy so as not to delete files from the primary storage location.
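
For reference, the difference at the rclone level, as a minimal sketch (remote and bucket names here are illustrative, not the template's actual arguments):

```bash
# Illustrative remotes and paths only -- not the exact copy_job.yaml arguments.
rclone copy oldstorage:source-bucket newstorage:dest-bucket   # source files are kept
rclone move oldstorage:source-bucket newstorage:dest-bucket   # source files deleted after transfer
```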

TODO

  • Documentation
  • Look into a progress indicator for copy jobs

tw4l force-pushed the issue-578-custom-storage branch from df5b6e9 to 10ab1b6 on October 1, 2024 15:24
tw4l force-pushed the issue-578-custom-storage branch 9 times, most recently from f226271 to eb065b6 on October 17, 2024 15:40
tw4l marked this pull request as ready for review on October 17, 2024 16:34
tw4l requested a review from ikreymer on October 17, 2024 16:34
tw4l added 28 commits on December 3, 2024 16:48

Previously, files in a default bucket were prefixed with the oid, but files in custom storages were not. This commit removes that distinction to aid in copying files between buckets, removing the need for unnecessary filepath manipulation.

The CopyBucketJob now only copies an organization's directory rather than the entire bucket, to prevent accidentally copying another organization's data.
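
To illustrate what that scoping buys at the rclone level (the bucket names and oid prefix variable are assumptions, not the job's exact arguments):

```bash
# Copying the whole bucket would also pull in other orgs' data:
rclone copy old:shared-bucket new:dest-bucket

# Scoping the copy to the org's oid-prefixed directory copies only that org:
rclone copy old:shared-bucket/$ORG_ID new:dest-bucket/$ORG_ID
```
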
Creating a bucket during the verification stage for adding custom storages, if it didn't already exist, was useful for testing but is an anti-pattern for production, so we remove it here.
tw4l force-pushed the issue-578-custom-storage branch from c5e88e3 to a867411 on December 3, 2024 21:54
Successfully merging this pull request may close these issues: Custom S3 Buckets for Orgs (#578)