Efficiency improvement targeting Collection and Upload #632

Open
yuanzhou opened this issue Mar 5, 2024 · 1 comment
Assignees
Labels
P Pitt dev team

Comments

@yuanzhou
Member

yuanzhou commented Mar 5, 2024

The index procedures for Collection and Upload are very different from those of other entity types, though similar to each other.

Collection.datasets and Upload.datasets are both generated by an on_read_trigger. This can be time-consuming when a collection has many datasets. For instance, 3ae4ddfc175d768af5526a010bfe95aa has 211 datasets, and the GET request takes 8 seconds to generate a 3.6 MB payload.

Collection:

  • Rename Collection.dataset_uuids (currently used by the POST and PUT methods) to Collection.member_uuids. The Ingest Portal will need to use this new field for Generic UI to create Collections ingest-ui#1377. Also update the trigger method to use this new field.
  • Remove Collection.datasets, and add a new on_read_trigger and on_index_trigger (used by the specialized/documents/<id> endpoint) to the same Collection.member_uuids field, so that the GET call returns only a list of uuids. This requires updating the neo4j query and the corresponding search-api code. Also mark the field as indexed: true.
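A minimal sketch of the idea behind the Collection change, to show why returning only uuids shrinks the payload. Everything here is hypothetical: FAKE_GRAPH is an in-memory stand-in for the neo4j query, and the two function names are illustrative, not existing trigger methods.

```python
import json

# Hypothetical in-memory stand-in for the graph: collection uuid -> member datasets.
FAKE_GRAPH = {
    "collection-1": [
        {"uuid": f"dataset-{i}", "title": f"Dataset {i}", "description": "x" * 500}
        for i in range(211)
    ]
}


def get_collection_datasets(collection_uuid):
    """Simplified current behaviour: return every full dataset record."""
    return FAKE_GRAPH.get(collection_uuid, [])


def get_collection_member_uuids(collection_uuid):
    """Proposed behaviour: return only the uuids of the member datasets."""
    return [d["uuid"] for d in FAKE_GRAPH.get(collection_uuid, [])]


# Compare the serialized size of the two responses.
full = json.dumps(get_collection_datasets("collection-1"))
slim = json.dumps(get_collection_member_uuids("collection-1"))
print(len(full), len(slim))
```

In the real service the saving would come from the neo4j query returning only the uuid property rather than whole nodes; the sketch only models the shape of the response.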

Upload:

  • Do NOT change or rename any of the existing fields.
  • Add a new field Upload.dataset_uuids with an on_read_trigger and an on_index_trigger that return only a list of uuids. Also mark it as indexed: true. For now, we'll use Upload.dataset_uuids in place of Upload.dataset_uuids_to_link and Upload.datasets in the search index procedure. Other teams will need to make this switch LATER.
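A rough sketch of how the Upload side could look, assuming the index document is built from the entity dict. The names LINKED, fetch_dataset_uuids, and build_upload_index_doc are made up for illustration, and the lookup is an in-memory placeholder for the real query. The existing fields on the entity are left untouched; only the index document swaps them for dataset_uuids.

```python
# Hypothetical stand-in for the graph lookup: upload uuid -> linked dataset uuids.
LINKED = {"upload-1": ["dataset-a", "dataset-b"]}

# Existing fields that the index procedure would stop using, per the plan above.
SKIPPED_IN_INDEX = {"datasets", "dataset_uuids_to_link"}


def fetch_dataset_uuids(upload_uuid):
    """Return only the uuids of the datasets linked to an upload."""
    return list(LINKED.get(upload_uuid, []))


def build_upload_index_doc(upload):
    """Build the index document without modifying the upload entity itself."""
    doc = {k: v for k, v in upload.items() if k not in SKIPPED_IN_INDEX}
    doc["dataset_uuids"] = fetch_dataset_uuids(upload["uuid"])
    return doc


upload = {
    "uuid": "upload-1",
    "title": "Example upload",
    "datasets": [{"uuid": "dataset-a"}, {"uuid": "dataset-b"}],
}
print(build_upload_index_doc(upload))
```

Because the entity dict is copied rather than mutated, callers that still read Upload.datasets keep working until they make the switch.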
@yuanzhou yuanzhou added the P Pitt dev team label Mar 5, 2024
@yuanzhou yuanzhou changed the title Additional efficiency improvement targeting Collection and Upload Efficiency improvement targeting Collection and Upload Apr 15, 2024
@yuanzhou
Member Author

yuanzhou commented May 6, 2024

Move to backlog

@shirey shirey added this to Pitt HIVE Jun 7, 2024
@shirey shirey moved this to Backlog in Pitt HIVE Jun 7, 2024
Projects
Status: Backlog
Development

No branches or pull requests

2 participants